We rendered the top 1 million pages on the web, tracking every conceivable performance metric, logging every error, noting every requested URL. To our knowledge this produces the first dataset that connects performance, errors, and library use on the web. In this article we analyze what the data can tell us about creating high performance web sites.
Can you do better than our analysis? We published the dataset to Kaggle, so you can crunch the numbers yourself.
Gathering the data was a matter of writing a bit of code to use Puppeteer to script Chrome, firing up 200 EC2 instances, rendering a million web pages over the weekend, and praying that we'd actually understood how AWS pricing works.
HTTP 2 is now more common than HTTP 1.1, but HTTP 3 is still rare. (Note: We're counting anything using the QUIC protocol as HTTP 3, even if Chrome sometimes reports this as HTTP 2 + QUIC.) These numbers are for the root document; for linked resources, the protocol breakdown looks a bit different.
For linked resources, HTTP 3 is about 100x more prevalent. How can this be true? Because all the sites are linking the same stuff:
There's a handful of scripts that are linked on a large portion of web sites. This means we can expect these resources to be in cache, right? Not any more: since Chrome 86, resources requested from different top-level sites no longer share a cache. Firefox is planning to implement the same partitioning, and Safari has been splitting its cache this way for years.
Given this dataset of web pages and their load time metrics, it would be nice to learn something about what it is that makes web pages slow. We'll investigate the dominteractive metric, which is the time it takes before the document becomes interactive to the user. The simplest thing we could do is just to look at the correlation of each metric with dominteractive.
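As a sketch of what that first step looks like, here is the per-metric correlation computed with pandas. The column names and the synthetic data are invented for illustration; the real Kaggle dataset has its own schema and a million rows.

```python
import numpy as np
import pandas as pd

# Fabricate a small stand-in for the dataset. The relationships here
# (more requests and bytes -> slower, HTTP/2 -> faster) are assumptions
# baked in so the correlations have something to find.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "num_requests": rng.poisson(70, n),          # requests per page
    "transfer_bytes": rng.exponential(1.5e6, n), # bytes transferred
    "uses_http2": rng.integers(0, 2, n),         # 0/1 protocol flag
})
df["dominteractive"] = (
    16 * df["num_requests"]
    + 1e-3 * df["transfer_bytes"]
    - 477 * df["uses_http2"]
    + rng.normal(0, 500, n)                      # unexplained noise
)

# Pearson correlation of every metric against dominteractive.
corrs = df.corr()["dominteractive"].drop("dominteractive").sort_values()
print(corrs)
```

On the real data this is where the limitation bites: the metrics correlate with each other as well as with dominteractive, so a raw correlation table can't isolate individual contributions.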
Correlations of metrics with dominteractive
Essentially every metric is positively correlated with dominteractive, except the 0-1 variable indicating HTTP2 or greater. Many of these metrics are also positively correlated with each other. We need a more sophisticated approach to get at individual factors contributing to a high time-to-interactive.
Some of the metrics are timings, measured in milliseconds. We can look at their box-plot to get an idea of where browsers are spending their time.
Box-plot of timing metrics. The orange line is the median, the box goes from the 25th to the 75th percentile.
One way to get at the individual factors contributing to a high time-to-interactive is to do a linear regression, where we predict dominteractive from other metrics. That means we assign a weight to each metric and model the dominteractive time of a page as the weighted sum of the other metrics, plus some constant. An optimization algorithm sets the weights so as to minimize the prediction error over the whole dataset. The size of the weights found by the regression tells us something about how much each metric contributes to the slowness of the page.
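The set-up can be sketched in a few lines of numpy. The features, "true" weights, and noise level below are all invented; the point is only to show the least-squares machinery that produces the coefficients discussed next.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
# Feature matrix: requests per page, redirects for the main document,
# and a 0/1 flag for HTTP/2-or-higher delivery.
X = np.column_stack([
    rng.poisson(70, n),
    rng.poisson(0.5, n),
    rng.integers(0, 2, n),
])
true_w = np.array([16.0, 354.0, -477.0])  # ms per unit, made up
intercept = 2000.0
y = X @ true_w + intercept + rng.normal(0, 300, n)  # dominteractive, ms

# Append a constant column and minimize squared prediction error.
A = np.column_stack([X, np.ones(n)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # recovered weights approach [16, 354, -477, 2000]
```

With enough examples the optimizer recovers the planted weights, which is what lets us read the fitted coefficients as milliseconds of slowness attributed to each feature.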
We'll exclude timing metrics from the regression. If we spend 500ms establishing a connection, that adds 500ms to dominteractive, but this is not a particularly interesting insight. Timing metrics are fundamentally outcomes. We want to learn what causes them.
The numbers in parentheses are the regression coefficients learned by the optimization algorithm. You can interpret them as having units of milliseconds. While the exact numbers should be taken with a grain of salt (see note below), it is interesting to see the scale assigned to each feature. For example, the model predicts a 354ms slow-down for every redirect needed to deliver the main document. Whenever the main HTML document is delivered via HTTP 2 or higher, the model predicts a 477ms lower time-to-interactive. For each request triggered by the document, it predicts an additional 16ms.
Here's a fun plot of dominteractive split by the HTTP protocol version used to deliver the root HTML page.
Box-plot of dominteractive split by HTTP protocol version of the first request. The orange line is the median, the box goes from the 25th to the 75th percentile. The percentages in parentheses are the fraction of requests made with this protocol.
There's a tiny number of sites still delivered over HTTP 0.9 and 1.0, and these sites happen to be fast. It seems we can't disentangle the fact that protocols have gotten faster from the tendency of programmers to happily consume any speed-up by delivering more stuff to the browser.
This is for the protocol version used to deliver the root HTML page. What if we look at the effect of the protocol for resources linked in that document? If we do a regression on number of requests by protocol version, we get the following.
If we were to believe this, we would conclude that moving requested resources from HTTP 1.1 to 2 gives a 1.8x speed-up, while going from HTTP 2 to 3 causes a 0.6x slow-down. Is it really true that HTTP 3 is a slower protocol? No: a more likely explanation is that HTTP 3 is rare, and that the few resources that are being sent over HTTP 3 (e.g. Google Analytics) are things that have a larger-than-average effect on dominteractive.
Let's predict time-to-interactive from the number of bytes transferred, split by the type of data being transferred.
Here's a similar regression, this time looking at the number of requests per request initiator type.
Here the requests are split up by what initiated them. Clearly, not all requests are made equal. Requests triggered by the link element (i.e. CSS, favicons), requests triggered by CSS (i.e. fonts, more CSS), and requests triggered by scripts and iframes slow things down considerably. Requests made over XHR and fetch are predictive of a faster-than-baseline dominteractive time (likely because these requests are almost always async). CSS and scripts are often loaded in a render-blocking way, so it is no surprise to find them associated with slower time-to-interactive. Video is comparatively cheap.
We haven't uncovered any new optimization tricks here, but the analysis does give an idea of the scale of the impact one can expect from various optimizations. The following claims seem to have good empirical backing:
Judging by this top 10, our browsers are mostly running analytics, ads, and code to be compatible with old browsers. Somehow 8% of web sites define a setImmediate/clearImmediate polyfill for a feature that isn't on track to be implemented by any browser.
We'll again run a linear regression, predicting dominteractive from the presence of libraries. The input to the regression is a vector X, with X.length == number of libraries, where X[i] == 1.0 if library i is present, X[i] == 0.0 if it is not. Of course, we know that dominteractive is not actually determined by the presence or absence of certain libraries. However, modeling each library as having an additive contribution to slowness, and regressing over hundreds of thousands of examples still leaves us with interesting findings.
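The encoding can be sketched like this. The library names, presence rates, and effect sizes are invented; the real regression runs over hundreds of distinct libraries and hundreds of thousands of pages.

```python
import numpy as np

libraries = ["jquery", "lazysizes", "analytics-x"]  # hypothetical list
true_effect = np.array([982.0, -300.0, 450.0])      # ms, made up

rng = np.random.default_rng(7)
n = 20000
# Each row is one page: X[i][j] == 1.0 if library j is present on page i.
X = (rng.random((n, len(libraries))) < 0.4).astype(float)
baseline = 3000.0
y = baseline + X @ true_effect + rng.normal(0, 800, n)  # dominteractive, ms

# Least squares with an intercept column for the no-libraries baseline.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
for name, c in zip(libraries, coef):
    print(f"{name}: {c:+.0f} ms")
```

The intercept plays the role of the baseline page with no libraries, so each coefficient reads as the predicted additive cost (or saving) of one library's presence.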
Top 5 libraries, time-to-interactive
Bottom 5 libraries, time-to-interactive
The negative coefficients here mean that the model predicts a lower time-to-interactive when those libraries are present than it does when no libraries are present. Of course, it doesn't mean that adding those libraries will make your site faster, it only means that the sites with those libraries happen to be faster than some baseline that the model has established. The results here may be as much sociological as they are technical. For example, libraries for lazy-loading predict low time-to-interactive. This may be just as much because pages with these libraries are made by programmers who spent time optimizing for fast page loads as it is directly caused by lazy-loading. We can't untangle these factors with this set-up.
We can repeat the exercise above, but this time predicting onloadtime: the time it takes for the window's "load" event to fire, i.e. for all resources on the page to load. We do a linear regression in the same way as before.
Top 5 libraries, onload time
Bottom 5 libraries, onload time
Top 5 libraries, low jsusedheapsize
Bottom 5 libraries, high jsusedheapsize
Internet commentators are fond of saying that correlation does not equal causation, and indeed we can't get at causality directly with these models. Great caution should be exercised when interpreting the coefficients, particularly because a lot of confounding factors may be involved. However, there's certainly enough there to make you go "hmm". The fact that the model associates a 982ms slower time-to-interactive with the presence of jQuery, and that half of the sites load this script, should give us some concern. If you're optimizing your own site, cross-referencing its list of dependencies with the ranks and coefficients here should give you a decent idea of which dependency removals can get you the most bang for your buck.