Saturday, February 18, 2012

Site performance - analyze and improve

Looking back at the Drupal site I built two years ago for my current employer, it's starting to look a little old and worn. Sure, we're running Varnish, we have purge rules, we have memcached for the backend and we've shut off basically all CPU hogs. But what about the visitor's side? Apart from getting a speedy response from our cache servers, there has to be more we can do. Well, load times, page rendering speed and total data size seem like good targets for improvement, since they also come with great metrics (via Firebug, YSlow and New Relic).

Record, improve, measure, profit? 

We start by collecting metrics on the site usage so we can set goals for the improvements. We use a mix-and-match approach, utilizing tools like Yahoo's YSlow, Pingdom's Page Test, New Relic and Firebug. Mind you, we had to disable a few rules in YSlow since they really aren't applicable to us: the CDN rules (hey, we don't have a CDN), ETags (we have them, but they're Apache defaults and YSlow hates them), Expires headers (yeah, we don't want to set them 3 years into the future) and DNS Lookups (friggin' external services are really DNS-expensive).

The pages we're measuring typically show one full article, 3 banners (Flash/JPG/whatever) and up to 150 teasers, each consisting of an image and up to 200 characters of text. That's an insane number of teasers.

Initial metrics
Total load time: 8 seconds
Waiting for external services: 3-12 seconds
Data size: 2.5 MB
DOM elements: 1377
YSlow score: D (67)


Changes

1) GZIP GZIP GZIP
Apparently, somewhere in the past we had to turn mod_deflate off and never turned it back on. Bad robot! We turned it back on and are now gzipping CSS, JS, HTML and anything text-based, which brought the total data size down to about 2.1 megs. Still a shitload of data though.
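For reference, the Apache side of this is basically a line per content type. This is a rough sketch of what re-enabling it looks like (assuming Apache 2.x with mod_deflate loaded; the exact list of MIME types is up to you):

# Compress the text-based content types (requires mod_deflate)
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
AddOutputFilterByType DEFLATE application/javascript application/x-javascript application/json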

2) Lazy-loading of images
Instead of loading 150+ images from the get-go, we only load what's visible in the visitor's browser. Anything else is loaded when it scrolls into view. This was the major improvement, bringing us down to about 1.5 megs. Still not slim enough!
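To give an idea of the pattern (just a sketch using something like Mika Tuupola's jQuery Lazy Load plugin, not necessarily what we ended up with; the selector and paths are placeholders): the real image URL sits in a data attribute and gets swapped in as the teaser scrolls into view.

<img class="lazy" src="/sites/default/files/placeholder.gif"
     data-original="/sites/default/files/teaser-42.jpg" width="120" height="80" alt="" />

jQuery(function ($) {
  // Swap in the real image once a teaser scrolls into (or near) the viewport.
  $("img.lazy").lazyload({
    threshold: 200  // start loading 200px before the image becomes visible
  });
});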

3) Reduction of DOM elements
Over the last two years the site has been extended, fixed and extended again, resulting in a patchwork of DIVs and CSS classes. Some parts were rendered through Semantic Views and other parts through their own .tpl files. We went through the site and made sure every part is rendered from its own .tpl so we have full control over the output. Then we got started reducing and simplifying the files, swapping out wrapper DIVs and extraneous containers. This brought us down to 1044 DOM elements and 1.1 MB in size.
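To illustrate the kind of slimming involved (a sketch, not our actual templates; the file name just follows the standard Views naming convention): a row template reduced to bare field output instead of the default markup that wraps every field in its own DIV or SPAN with a stack of classes.

<?php
// views-view-fields--frontpage.tpl.php (illustrative name): a stripped-down
// Views row template that prints each field's rendered content without
// any wrapper markup.
foreach ($fields as $id => $field): ?>
  <?php print $field->content; ?>
<?php endforeach; ?>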

4) Minimizing external services
We used to have a Google+ button on the site. It typically took anywhere between 3 and 10 seconds to load all of its resources. Need I say it's gone? We also used to have the Facebook activity stream with Faces enabled. It required between 50 and 70 HTTP requests to load all of its resources and took up to 8 seconds. No more. We also used to load the ever-ubiquitous jQuery from Google's AJAX CDN to be sure we always had the latest version on the site. I guess we can live with it lagging a version or two if that's the price we have to pay for a faster site, so we now serve it from our own servers instead.
We still have a Twitter button and a few Facebook boxes on the page, but we'll have to live with them until we develop a better method to load them (preferably they'd only be loaded when the user hovers over the boxes).
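The rough idea for that better method, sketched in a few lines of jQuery (the container selector is hypothetical, and the Facebook SDK would get the same treatment):

jQuery(function ($) {
  // Don't fetch the third-party widget scripts until someone shows interest.
  $(".share-buttons").one("mouseenter", function () {
    $.getScript("https://platform.twitter.com/widgets.js");
    // ...load the Facebook SDK here the same way.
  });
});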


Result

Final metrics
Total load time: 2 seconds (-75%)
Waiting for external services: 2-3 seconds
Data size: 1.1 MB (-56%)
DOM elements: 1044 (-24%)
YSlow score: B (89)

So here we are. The new site is a lot slimmer, nearly ready for Beach 2012. What's left to consider is trimming the master CSS, adding the responsive parts to it and trimming away some of the extraneous CSS classes Drupal so loves to add. All in all I think the result is quite good, but surely there are some more optimizations left to do. Or at least I hope so; this performance hunting is addictive!


Other possible optimizations

A few points that could be worth researching:
  • Progressive loading of all content, not just images
  • Dividing the posts into categories, thereby reducing the number of posts on any page
  • Minifying *everything*
  • Writing our own sharing functions that don't require hefty scripts from external sources 
  • Spreading the downloads over multiple domains to reduce browser blocking

Tools used

  • Yahoo YSlow
  • Pingdom Page Test
  • New Relic
  • Firebug

Tuesday, February 14, 2012

VCL: random URL part

Random is random, usually. Unless it's being fetched every 20 minutes and included as part of a site with heavy traffic. That's a lesson I learned the hard way by relying on an ad network's RSS aggregator that included one of our RSS feeds on its pages. The feed is randomized on our end: a list of nodes straight out of Views via some custom theming, grouped in threes from a larger set.

What I noticed was that over 24 hours, what should have been an even distribution turned out to be heavily weighted towards one specific set. The ad network's RSS aggregation ran on a schedule, and as bad luck would have it, Views liked to display the same set every time the aggregator came by. Probably just bad luck, but to the customer it looked like one set of nodes was favored over the others.

Since I run Varnish for caching, I started thinking about whether I could solve this in another way. I thought about faux round-robin directors and complicated solutions involving DNS aliases and .htaccess rewrites.

Then I stumbled upon this blog post by Chris Davies, who is generally speaking one of the most knowledgeable techies I know of. So: enter inline C in my VCL.

sub vcl_recv {
    if (req.url == "/randomizeme") {
        C{
            /* Pick a random slot, 1 through 4, and stash it in a request
               header, which doubles as a variable we can read back in VCL. */
            char buff[5];
            sprintf(buff, "%d", rand() % 4 + 1);
            VRT_SetHdr(sp, HDR_REQ, "\017X-Distribution:", buff, vrt_magic_string_end);
        }C
        /* Rewrite the URL to /randomizeme/1 .. /randomizeme/4 */
        set req.url = "/randomizeme/" req.http.X-Distribution;
    }
}

It works like this:
- Varnish picks up the request on the URL /randomizeme
- Varnish executes the inline C
- Varnish sets a special header with the randomized integer (the header works as a variable)
- Varnish then rewrites the URL to /randomizeme/1 through /randomizeme/4
- Varnish finally fetches that URL from the backend, which is rigged to display different content depending on which "slot" is chosen.
- Varnish delivers the randomized content to the visitor/aggregator/whatever

A little warning: if you modify the header name "X-Distribution", the "\017" must be updated too. It's the length of the header name in octal, and judging by the count it includes the trailing colon: "X-Distribution:" is 15 characters, which is 017 in octal. Get it wrong and you'll be trial-and-erroring while watching that log for segfaults!
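If you don't trust your octal arithmetic, a throwaway C snippet (purely illustrative, nothing to do with the VCL itself) prints the length-prefixed name for you:

/* Print a header name with a leading octal length byte, matching the
   "\017X-Distribution:" form used in the inline C above. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *name = "X-Distribution:";
    printf("\"\\%03o%s\"\n", (unsigned)strlen(name), name); /* "\017X-Distribution:" */
    return 0;
}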