CSS Aggregation + Page Caching + Load Balancing = Boom

A few weeks ago we launched the new redesign of NWsource.com. It was a big launch, with a whole pile of new content types, blocks and views. One of the things we did with this launch was turn on CSS aggregation. We have over a dozen individual CSS files being imported, so getting them shoved into one HTTP request was a big win.

Another thing we turned on as a part of this launch was page caching. After doing some load testing we discovered that this was cutting the time it took to return the full HTML for a page in half. This seemed like another big win. We were pretty confident that this was going to work out well. It was thoroughly QA'd, we had done a significant amount of load testing using our Avalanche appliance, and our launch plan was solid. Everything went up smoothly and we thought we had it made in the shade.

Over the next few days, we started getting sporadic reports of pages loading without any CSS on them. I'd go over to the person's desk and ask them to reload the page and it would be back to normal. It was vexing me.

This turned out to be one of those situations where a bunch of things come together to make your life hell. First, let's look at how CSS aggregation works. When a page is loaded with CSS aggregation turned on, an md5 hash of all the CSS types is created. This hash is used as the filename for the new aggregated file. Drupal then checks to see if a file with this name exists. If so, it does nothing. If not, it pulls all the CSS together and writes it to /files/css/<hash>.css.
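As a rough sketch of that check-then-write logic (in Python rather than Drupal's PHP, and not the actual core code), something like the following captures the idea. The function name aggregate_css, its parameters, and the choice to hash the list of stylesheet names are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def aggregate_css(css_sources: dict[str, str], css_dir: Path) -> tuple[str, bool]:
    """Return (filename, was_written) for the aggregated stylesheet.

    css_sources maps a stylesheet name to its contents (hypothetical input).
    """
    # Hash the ordered list of stylesheet names, so the same set of files
    # always maps to the same aggregate filename.
    key = "\n".join(css_sources)
    filename = hashlib.md5(key.encode("utf-8")).hexdigest() + ".css"
    target = css_dir / filename

    # If the aggregate already exists on this server's disk, do nothing.
    if target.exists():
        return filename, False

    # Otherwise pull all the CSS together and write it out.
    css_dir.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(css_sources.values()), encoding="utf-8")
    return filename, True


if __name__ == "__main__":
    files_dir = Path(tempfile.mkdtemp()) / "files" / "css"
    sheets = {"layout.css": "body { margin: 0 }", "type.css": "p { line-height: 1.4 }"}
    print(aggregate_css(sheets, files_dir))  # first call writes the file
    print(aggregate_css(sheets, files_dir))  # second call finds it and skips the write
```

The important detail is that the existence check looks only at the local disk of whichever server handles the request.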

The second part of the equation is page caching. When Drupal determines a page needs to be cached, its HTML is generated and saved to the {cache_page} table. The cache is cleared when nodes are saved, or cache_clear_all() is called. Our editorial staff is reasonably active during the day, so we actually see our cache cleared a few times an hour. When the cache is cleared, the aggregated CSS files are also removed.

The third part? We run NWsource on a set of three load-balanced servers, in a simple round-robin rotation. Each individual server has its own distinct storage.

So let's follow the process from the first page hit after the cache is cleared:

  1. A visitor hits the home page on server foo.
  2. The CSS is aggregated into <hash>.css and written to disk
  3. The HTML is generated with @import /files/css/<hash>.css
  4. This HTML is cached
  5. The next visitor hits the home page on server bar
  6. The cached HTML is returned
  7. The CSS file doesn't exist on this server
  8. Your beautifully designed page looks like it's being rendered on Mosaic in 1994 (except the background probably isn't grey).
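The sequence above can be reproduced with a small simulation. This is a toy model in Python, not Drupal code: the Server class, handle_request, and simulate are hypothetical names, the third server name "baz" is made up, and the page cache is modelled as a single dictionary on the assumption that all servers read the same {cache_page} table.

```python
from itertools import cycle


class Server:
    def __init__(self, name):
        self.name = name
        self.files = set()  # local disk, not shared with the other servers


def handle_request(server, page_cache, css_name="<hash>.css"):
    """Serve the home page; report whether its stylesheet exists on this server."""
    if "home" not in page_cache:
        # Cache miss: aggregate the CSS onto the local disk, then cache the HTML.
        server.files.add(css_name)
        page_cache["home"] = f"@import /files/css/{css_name}"
    # On a cache hit the stored HTML is returned and the local disk is never touched.
    return server.name, css_name in server.files


def simulate(hits, page_caching=True):
    servers = cycle([Server("foo"), Server("bar"), Server("baz")])  # round-robin
    page_cache = {}  # stands in for the shared {cache_page} table
    results = []
    for _ in range(hits):
        if not page_caching:
            page_cache.clear()  # every request regenerates the page
        results.append(handle_request(next(servers), page_cache))
    return results


if __name__ == "__main__":
    print(simulate(4))
    # [('foo', True), ('bar', False), ('baz', False), ('foo', True)]
    print(simulate(4, page_caching=False))
    # [('foo', True), ('bar', True), ('baz', True), ('foo', True)]
```

With page caching on, only the server that built the page has the stylesheet. With it off, every server builds its own copy on first use.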

Without page caching turned on, the CSS aggregation actually works great. It just recreates the CSS file on every server as needed.

In my case, this was exacerbated by the fact that I had a cron job running periodically that rsync'd the files directory around the servers from the box the editorial team uploads them to. So eventually even the missing CSS files got pushed to all the servers. This made debugging really difficult because the problem kept disappearing out from under me, and didn't exist on our QA server at all (because we only have one QA server).

So having finally figured this out, what is there to be done about it? The best choice is shared storage of /files (for instance, an NFS mount). We just don't have this set up, and while I might look into it down the road, I don't know the pros and cons well enough to set it up myself and be confident in it.

We could just increase the frequency of our /files cron, but we'd still be left with periods where the problem occurs.

What I ended up doing was (*sigh*) hacking core, adding a directive in common.inc to rsync the aggregated CSS files as they are created. This is really simple to do and it doesn't open us up to windows of being busted. I don't consider this a long-term solution, but it got the job done and it's been working well since I put it into place.
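The actual change lives in PHP inside common.inc and isn't reproduced here. As a minimal sketch of the idea, assuming hypothetical peer hostnames and a hypothetical remote directory, the commands to push a freshly written aggregate to the other servers could be built like this (Python, for illustration only):

```python
import shlex


def build_sync_commands(css_path, peers, remote_dir):
    """Build one rsync command per peer server for a newly written CSS file.

    peers and remote_dir are placeholders; real hostnames and paths will differ.
    """
    remote_dir = remote_dir.rstrip("/")
    # -a (archive mode) preserves permissions and timestamps on the copy.
    return [["rsync", "-a", css_path, f"{peer}:{remote_dir}/"] for peer in peers]


if __name__ == "__main__":
    commands = build_sync_commands("/files/css/<hash>.css", ["bar", "baz"], "/files/css")
    for cmd in commands:
        # Print rather than execute; each list could be handed to subprocess.run.
        print(shlex.join(cmd))
```

Running the sync right after the file is written, and before the HTML is cached, is what closes the window that a periodic cron leaves open.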

I have been spending some time wondering what a long-term functional solution would look like, something that could be written as a patch and submitted to core for inclusion. You would probably want it to be admin-configurable (to turn the syncing on and manage the servers you're syncing between). Wim Leers' CDN module contains a pluggable file transfer mechanism which could be used to move the files around. Something along these lines sounds like it could be workable. Or perhaps this is just one of those esoteric situations you need to deal with yourself.
