The CDN Manifesto

I wasn’t able to attend this year’s Velocity conference, so I’m catching up now by watching the videos that are available online.

A lot of people ask me: “Why did you want to work at Fastly?” It’s an innocent but complex question. My answer usually varies based on the audience. The explanation I give non-technical people is typically: “Because I want to make the internet faster.”

However, Fastly is doing much more than just making websites faster. They are much more than a CDN; they are an extension of your application or website. No longer is the CDN a black box that you simply place between your origin and your audience.

A friend asked me the other day: “Give me the 10-second elevator pitch on why Fastly is better than any other CDN”. I thought for a split second and answered with:

“There are three main differentiating factors that set Fastly apart from the rest of the CDN field:

  • Real Time Log Delivery
  • Instant purging / invalidation
  • Full programmatic API interface

Other CDNs don’t have all three capabilities.”

These three capabilities are very powerful for web developers. They allow you to fully control your content and gain visibility into what your users are doing.
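
To make the API and purging points concrete, here is a minimal sketch in Python of invalidating a single URL. Fastly supports purging an individual URL by sending it the non-standard HTTP PURGE method; the URL below is a placeholder, and the third-party requests library is assumed:

# Minimal sketch: instantly purge one URL from a Fastly-fronted site.
# Assumes the third-party requests library; the URL is a placeholder.
import requests

# Fastly accepts the non-standard PURGE method on the URL to invalidate.
resp = requests.request("PURGE", "https://www.example.com/some/path")
print(resp.status_code, resp.text)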

An excellent talk by my co-worker, Hooman Beheshti, touches on these very points. His entertaining and informative talk at Velocity is a manifesto for what every CDN should be going forward.

Don’t let the fact that this is a sponsored talk turn you off. It’s not a sales pitch at all.

Getting Lost in 302s

Web properties that have been around for a while probably have a lot of old links, dead ends, and redirects. There is a fear amongst content owners that users are not going to be able to find their site if a URL changes.

“What about everyone’s bookmarks?!” cries the content owner. Bookmarks are something from the ’90s web (Web 1.0, if you will); nobody uses them anymore.

This was a challenge I was up against at my previous job. It wasn’t until I illustrated the complexity and unscalability of keeping every URL around forever that change happened.

The mobile landscape changes quickly. This shaped the URL structure and was the main cause of the many redirects on the CBC’s mobile website. Over the course of a week, after digging through Apache configurations, Akamai config files, and “meta refresh” HTML files, the following flow chart was born.

[Flow chart: the CBC mobile site’s redirect paths]

Thankfully there were no redirect loops! However, there were some pretty serious issues. For example:

cbc.ca/mobile -> m.cbc.ca -> cbc.ca/m -> cbc.ca/m/rich

Yes, you would be redirected three times before reaching the final URL. Not ideal, especially on a mobile device!
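
If you want to see a chain like this for yourself, a few lines of Python make each hop visible. A small sketch using the requests library (the chain above is historical, so the URL may no longer redirect this way):

# Sketch: print each redirect hop a URL takes before its final destination.
import requests

resp = requests.get("http://www.cbc.ca/mobile", allow_redirects=True, timeout=10)
for i, hop in enumerate(resp.history, 1):  # resp.history holds each redirect
    print(f"hop {i}: {hop.status_code} {hop.url}")
print(f"final: {resp.status_code} {resp.url}")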

A lot of these redirects have since been removed. However, it wasn’t until this diagram was presented to the web developers and management that everyone realized the gravity of the situation.

It’s true what they say: a picture is worth a thousand words. In this case, a Visio diagram improved web performance!

IPv6 and Web Performance

After reading the first 60 or so pages of Ilya’s excellent book, my mind started racing: how does IPv6 (v6) affect packet size, round-trip times, and overall web performance versus IPv4 (v4)?

I decided to set up a simple test between my Windows desktop and my personal webserver hosted at Linode. Both have native IPv6 connectivity; no tunneling.

The setup:

  • Client: Windows 7 & Chrome 35.0.1916.114
  • Server: CentOS 6.5 Linux (kernel: 2.13.7) & Apache 2.2.15

I wanted to keep things as simple as possible to better understand the low-level effects of IPv6 on a typical HTTP transaction. As such, I tried to keep the conditions the same for both v6 and v4 requests. My test URLs used the hostnames ipv4.blakecrosby.com and ipv6.blakecrosby.com.

Each hostname is the same number of characters, and the fetched object is exactly the same. To ensure that only v4 or only v6 packets were being sent, I disabled the other protocol in Windows before each test.

You can follow along with the packet streams on CloudShark if you like.

Routing

My v6 packets take a different route than my v4 packets. This results in lower latency for v4 traffic than for v6 traffic (82.1 ms vs. 87.4 ms respectively, averaged over twenty packets).

Windows doesn’t have mtr, so I used my MacBook Pro on the same network instead. [Edit: Thanks to @jpaulellis for pointing me to winmtr.net]

v6 routing:

Blakes-mbp:~ bcrosby$ sudo /usr/local/sbin/mtr -n -c 20 -r ipv6.blakecrosby.com
HOST: Blakes-mbp                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2601:9:8480:1196:76d0:2bf  0.0%    20    1.1   1.2   1.0   1.9   0.2
  2.|-- ???                       100.0    20    0.0   0.0   0.0   0.0   0.0
  3.|-- 2001:558:82:213b::1        0.0%    20    9.7  18.6   8.9 184.2  39.0
  4.|-- 2001:558:80:14a::1         0.0%    20   12.7  12.2  10.2  14.3   1.1
  5.|-- 2001:558:80:cf::2          0.0%    20   15.7  12.9  10.4  20.6   2.5
  6.|-- 2001:558:0:f6cb::1         0.0%    20   14.4  14.4  11.8  23.3   2.3
  7.|-- 2001:558:0:f5e8::2         0.0%    20   18.1  17.1  15.4  18.2   0.8
  8.|-- 2001:559::502              0.0%    20   20.3  17.0  14.7  21.8   1.9
  9.|-- 2001:590::4516:8f75        0.0%    20   14.2  15.7  13.9  40.4   5.8
 10.|-- 2001:590::4516:8fa6        0.0%    20   14.7  15.1  14.4  16.3   0.4
 11.|-- 2001:590::4516:8e00        0.0%    20   33.0  45.6  31.5 185.6  40.2
 12.|-- 2001:590::4516:8e3b        0.0%    20   34.6  36.4  31.0  66.4   9.6
 13.|-- 2001:590::4516:8e65        0.0%    20   64.5  66.8  64.3  99.9   7.8
 14.|-- 2001:590::4516:8e4b        0.0%    20   85.5  85.0  82.0 109.0   5.9
 15.|-- 2001:590::451f:22b2        0.0%    20   84.5  87.1  82.7  95.8   4.1
 16.|-- 2001:518:1001:1::2         0.0%    20   93.5  88.0  82.9  94.5   4.5
 17.|-- 2001:518:2800:3::2         0.0%    20   84.2  84.2  82.8  86.5   1.0
 18.|-- 2600:3c03::f03c:91ff:fe6e  5.0%    20   83.0  87.4  82.8 121.1   9.7

v4 routing:

Blakes-mbp:~ bcrosby$ sudo /usr/local/sbin/mtr -n -c 20 -r ipv4.blakecrosby.com
HOST: Blakes-mbp                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                0.0%    20    0.7   1.3   0.7   2.5   0.6
  2.|-- 50.185.86.1                0.0%    20    9.2   9.3   8.3  10.5   0.8
  3.|-- 162.151.1.229              0.0%    20   10.8  10.0   8.7  14.2   1.5
  4.|-- 68.85.154.74               0.0%    20   12.2  13.5  10.3  38.0   6.0
  5.|-- 68.85.155.14               0.0%    20   11.3  12.7   9.7  20.7   2.6
  6.|-- 68.86.90.157               0.0%    20   14.2  14.1  11.0  25.9   3.1
  7.|-- 68.86.89.122              55.0%    20   38.6  57.6  36.8 203.4  54.8
  8.|-- 68.86.86.205               0.0%    20   39.9  43.9  38.2 102.2  13.8
  9.|-- 68.86.86.181               0.0%    20   64.9  64.2  61.9  69.4   1.8
 10.|-- 68.86.88.149               0.0%    20   90.9  83.2  79.9  90.9   2.4
 11.|-- 173.167.58.26              0.0%    20   80.6  81.8  80.5  88.4   1.9
 12.|-- 209.123.10.118             0.0%    20   84.7  85.9  81.7  93.6   4.2
 13.|-- 207.99.53.42               0.0%    20   82.4  82.8  81.0  91.6   2.3
 14.|-- 97.107.139.208             5.0%    20   83.0  82.1  80.9  84.0   0.9

DNS Requests

The main difference between looking up a v4 address and a v6 address is the record type: v4 addresses use an “A” record, while v6 addresses use an “AAAA” record. v6 addresses are also much larger at 16 bytes (versus 4 for v4), so the response will always be bigger than a standard v4 response.

In this particular test, both the v4 and v6 responses fit into a single packet, so the number of round trips is the same. The client resolves DNS through my router using UDP, so there is no TCP handshake overhead.
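
You can reproduce the record-type difference with nothing but the standard library. A quick sketch (the hostnames are the ones from my test setup):

# Sketch: resolve the same object’s host over v4 (A) and v6 (AAAA records).
import socket

for host, family, label in (("ipv4.blakecrosby.com", socket.AF_INET, "A"),
                            ("ipv6.blakecrosby.com", socket.AF_INET6, "AAAA")):
    try:
        for info in socket.getaddrinfo(host, 80, family, socket.SOCK_STREAM):
            print(label, info[4][0])  # info[4] is the (address, port, ...) tuple
    except socket.gaierror as err:
        print(label, "lookup failed:", err)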

 

Version    # of Packets    Total Size    RTT
v4         2               176 bytes     0.092ms
v6         2               228 bytes     0.089ms

The v6 exchange is ~30% larger than the v4 one; the 52-byte difference lines up with the two larger IP headers (20 extra bytes each) plus the 12 extra bytes of AAAA record data (a 16-byte address versus 4). The round-trip time, however, is the same (under my test conditions).

TCP Handshake

All v6 packets will be larger due to the increased size of the IP header: v6 headers are 40 bytes, while v4 headers are only 20 bytes.
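
That 20-byte difference is easy to verify for yourself. A quick sketch using the third-party scapy library (an assumption on my part; it simply serializes an empty header of each version):

# Sketch: compare minimal on-the-wire IP header sizes with scapy.
from scapy.all import IP, IPv6

print(len(IP()))    # 20 -> minimal IPv4 header (no options)
print(len(IPv6()))  # 40 -> fixed-size IPv6 header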

One thing I did notice was that the MSS was different in the initial SYN packet from the client to the server: the v4 MSS was set to 1260 bytes, while the v6 one was set to 1460 bytes.

The v4 SYN/ACK response from the server reset the MSS to 1460, and the corresponding v6 SYN/ACK reset it to 1420.

 

Version    # of Packets    Total Size    RTT
v4         3               186 bytes     0.081ms
v6         3               246 bytes     0.085ms

The 60-byte difference is exactly what the header sizes predict: 20 extra bytes on each of the three handshake packets.

HTTP GET

There is no difference between v4 and v6 when fetching the object. HTTP header and body sizes are exactly the same. 

Conclusion

IPv6 is a new version of the Internet Protocol; it doesn’t change the way TCP or HTTP behaves. Your packets (although a little larger with v6) are routed the same way. This means that it’s latency, not available bandwidth, that affects v6 performance, just as with v4.

Overall you will be pushing more bits over the wire; however, the number of round trips needed to complete a simple HTTP request is the same with v6 as with v4.

 

Version    # of Packets    Total Size    RTT
v4         12              2236 bytes    0.337ms
v6         15              2848 bytes    0.342ms

What’s this? Three extra packets in the v6 conversation? Yes. For some reason the server decided to change the MSS from 1460 to 1420 before returning the HTTP response; after the response was sent, the MSS was changed back from 1420 to 1460. I had no idea why, but this accounts for the three extra packets. [Edit: My coworker pointed out that this additional TCP handshake was Chrome proactively setting up a new TCP session. The browser does this expecting to download more data when you click on a link or perform another action.]

v6 will eventually be the de facto IP version used on the internet. The good news is that advances in web performance, such as front-end optimization and HTTP/2, won’t have to change when v6 becomes ubiquitous.

v6 should result in better web performance overall, mainly because:

  • v6 routers don’t perform packet fragmentation.
  • Routers don’t need to compute checksums on v6 packets (as they do for v4).
  • Routers aren’t required to track the time packets spend in queues.

The above points may be moot with today’s fast routers and specialized CPUs. However, when it comes to web performance, every little bit counts.

Who’s using a CDN?

The HTTP Archive is a great resource for keeping track of trends in the way websites are built. It has shown the steady decline of Flash on websites over the years, for example.

I decided to use the dataset to track which are the most popular CDNs. Below are my findings using the May 15, 2014 run.

[Chart: number of sites served by each CDN]

HTTP Archive has recorded a total of thirty different CDNs. The top five used are: Cloudflare, Google, Akamai, ChinaNet, and Edgecast. Keep in mind that only 10% of all sites tracked by HTTP Archive are using a CDN at all.

The first question that came to mind was: “Google is a CDN?” The answer: yes. These would be sites hosted on Google Sites or on Google’s own properties (like YouTube).

Both Cloudflare and Google are free, so it’s no surprise that they are the two most popular CDNs.

One thing to note about the data: the HTTP Archive only tests where the front-page HTML is hosted, so it isn’t a definitive way of knowing whether a particular site uses a CDN. For example, the HTML could be hosted at origin while all of the images are served by a CDN.

The HTTP Archive also keeps track of the Alexa rank of each site it tests, so we can use that to determine which CDN powers the most popular pages.

[Chart: CDN usage broken down by Alexa rank]

Google takes the cake for hosting the top 100 popular sites (a little hard to see in the above graph). Akamai takes a commanding lead in hosting the remaining top 5,500 sites, followed by Cloudflare.

A breakdown of the most popular site hosted by each CDN:

CDN              Alexa Rank    URL
Google           1             www.google.com
ChinaNetCenter   29            www.163.com
Akamai           32            www.ask.com
Incapsula        50            www.neobux.com
CDNetworks       63            www.ifeng.com
Cloudflare       129           www.canadaalltax.com
Fastly           144           www.wikihow.com
Edgecast         148           www.w3schools.com
ChinaCache       230           www.china.com.cn
Level 3          257           www.twitch.tv

Keep in mind these results cover the hosting of the front-page HTML file only. A lot of sites use a multi-CDN approach, spreading requests over more than one CDN.

So what about sites that decided to not use a CDN? You might be surprised at some of the results:

Alexa Rank    Website
2             www.facebook.com
4             www.yahoo.com
5             www.baidu.com
6             www.wikipedia.org
7             www.qq.com
8             www.taobao.com
9             www.twitter.com
10            www.amazon.com
11            www.linkedin.com
12            www.live.com

Some of these sites use CDNs to host site assets (like Facebook and Twitter).

What’s the quickest and easiest way to see if a particular site is hosted by a CDN? Look at how its hostname resolves. Doing a dig on www.cbc.ca returns:

;; ANSWER SECTION:
www.cbc.ca.		39086	IN	CNAME	www.cbc.ca.edgesuite.net.
www.cbc.ca.edgesuite.net. 17565	IN	CNAME	a1849.gc.akamai.net.
a1849.gc.akamai.net.	18	IN	A	184.51.102.89
a1849.gc.akamai.net.	18	IN	A	184.51.102.56

As the edgesuite.net and akamai.net CNAMEs show, www.cbc.ca uses Akamai.
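
This check is easy to script as well. A rough sketch using the third-party dnspython package (the suffix list is illustrative, not exhaustive):

# Sketch: follow a hostname’s CNAME chain and match it against known CDN domains.
# Assumes dnspython >= 2.0 (pip install dnspython).
import dns.resolver

CDN_SUFFIXES = {
    "edgesuite.net": "Akamai",
    "akamai.net": "Akamai",
    "fastly.net": "Fastly",
    "cloudflare.net": "Cloudflare",
}

name = str(dns.resolver.canonical_name("www.cbc.ca")).rstrip(".")
for suffix, cdn in CDN_SUFFIXES.items():
    if name.endswith(suffix):
        print(f"www.cbc.ca appears to be served by {cdn} ({name})")
        break
else:
    print(f"no known CDN suffix in {name}")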

This is just the tip of the iceberg. I encourage you to take a look at the data yourself; you can access it for free using Google BigQuery.

O’Reilly Velocity Wrapup

Barbara and I had the pleasure of speaking at the O’Reilly Velocity conference in Santa Clara, California last week.

This was one of the best conferences I’ve attended. It was great to see so many smart people sharing ideas in one place.

Office Hours

Answering questions in the one-on-one “Office Hours” session

One of the best features of this conference was what the organizers called “Office Hours”. It gave attendees the chance to talk to the speakers one-on-one about anything and to pick their brains about ideas they may have. Barbara and I also took advantage of this time to get to know the attendees better.

Blake Crosby and Barbara Bermes

Presenting

A copy of the slides is available on SlideShare.

I’m planning on proposing another talk for next year’s event. However, I think I’ll target the East Coast this time, in New York.

FITC Web Performance and Optimization

A colleague and I presented a 50-minute talk yesterday at my first weekend event.

I counted approximately 70 people in attendance before we got to talking. The talk was well received, and we spent another 20 minutes or so afterwards chatting with individuals who wanted more information.

The talk covered how the CBC takes web performance seriously and the tools we use to improve our websites’ performance on both the front and back ends. Slides are available in PDF format.

Barbara and I will be giving this talk again, this time at the O’Reilly Conference in June.

What Your CDN Won’t Tell You

Julian and I had the honour of having our paper accepted by USENIX. In fact, Julian is presenting it at this year’s LISA conference (I’m unable to attend due to a schedule conflict).

Our years working together at the CBC have taught us a lot about running a news website, specifically around dealing with our CDN (Akamai).

How do you manage that fine line between having fresh content appear on the site quickly (what News wants) and protecting the origin from the load of a breaking news event (what the SysAdmins want)?

This paper answers that question and gives you a glimpse at how we do things at CBC.

You can read the paper here.