CSS Aggregation + Page Caching + Load Balancing = Boom

A few weeks ago we launched the new redesign of NWsource.com. It was a big launch, with a whole pile of new content types, blocks and views. One of the things we did with this launch was turn on CSS aggregation. We have over a dozen individual CSS files being imported, so getting them shoved into one HTTP request was a big win.

Another thing we turned on as a part of this launch was page caching. After doing some load testing, we discovered that this cut the time it took to return the full HTML for a page in half. This seemed like another big win. We were pretty confident that this was going to work out well. It was thoroughly QA'd, we had done a significant amount of load testing using our Avalanche appliance, and our launch plan was solid. Everything went up smoothly and we thought we had it made in the shade.

Over the next few days, we started getting sporadic reports of pages loading without any CSS on them. I'd go over to the person's desk and ask them to reload the page and it would be back to normal. It was vexing me.

This turned out to be one of those situations where a bunch of things come together to make your life hell. First, let's look at how CSS aggregation works. When a page is loaded with CSS aggregation turned on, an MD5 hash of the page's list of CSS files is created. This hash is used as the filename for the aggregated file. Drupal then checks whether a file with this name exists. If so, it does nothing. If not, it pulls all the CSS together and writes it to /files/css/<hash>.css.
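The check-then-write logic above can be sketched in a few lines. This is a minimal Python illustration, not Drupal's PHP: hashing a newline-joined list of stylesheet paths is a simplification of whatever exact structure Drupal hashes, `aggregate_css` is a hypothetical name, and the stylesheet paths in the demo are made up. The key property is that the filename depends only on *which* files are on the page, so the existence check is purely local to the disk it runs against.

```python
import hashlib
import os
import tempfile


def aggregate_css(stylesheets, files_dir):
    """Return the path of the aggregated CSS file, writing it only if missing.

    stylesheets: ordered list of (path, css_text) pairs; order matters for
    the cascade. files_dir: stand-in for this server's /files directory.
    """
    # The filename is a hash of which stylesheets are on the page, so the
    # same set of files always maps to the same aggregate.
    names = "\n".join(name for name, _ in stylesheets)
    key = hashlib.md5(names.encode("utf-8")).hexdigest()
    path = os.path.join(files_dir, "css", key + ".css")
    if os.path.exists(path):
        return path  # Already aggregated on this disk: nothing to do.
    # Missing: concatenate everything (in order) and write it out.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(css for _, css in stylesheets))
    return path


if __name__ == "__main__":
    sheets = [("modules/system/system.css", "body { margin: 0; }"),
              ("themes/nw/style.css", "h1 { color: navy; }")]
    with tempfile.TemporaryDirectory() as files_dir:
        first = aggregate_css(sheets, files_dir)
        second = aggregate_css(sheets, files_dir)
        print(first == second)  # True: the second call reuses the file
```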

The second part of the equation is page caching. When Drupal determines a page needs to be cached, its HTML is generated and saved to the {cache_page} table. The cache is cleared when nodes are saved, or cache_clear_all() is called. Our editorial staff is reasonably active during the day, so we actually see our cache cleared a few times an hour. When the cache is cleared, the aggregated CSS files are also removed.
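The important asymmetry is that the page cache lives in the database, which every web server shares, while the aggregated CSS lives on a disk. Here is a toy model of that, under two assumptions: the {cache_page} table is modelled as a dict, and a flush deletes aggregated CSS only from the /files directory of the server that performs it, since that directory is local to each box. `PageCache` and `clear_all` are hypothetical names, not Drupal's API.

```python
import glob
import os
import tempfile


class PageCache:
    """Toy stand-in for the {cache_page} table.

    It lives in the database, so every web server behind the load
    balancer sees the same entries.
    """

    def __init__(self):
        self.pages = {}

    def get(self, url):
        return self.pages.get(url)

    def set(self, url, html):
        self.pages[url] = html

    def clear_all(self, files_dir):
        """Flush cached pages and delete aggregated CSS under files_dir/css.

        files_dir is the local /files of whichever server does the flush;
        returns how many CSS files were removed.
        """
        self.pages.clear()
        removed = 0
        for path in glob.glob(os.path.join(files_dir, "css", "*.css")):
            os.remove(path)
            removed += 1
        return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as files_dir:
        os.makedirs(os.path.join(files_dir, "css"))
        open(os.path.join(files_dir, "css", "abc.css"), "w").close()
        cache = PageCache()
        cache.set("/", "<html>...</html>")
        print(cache.clear_all(files_dir), cache.get("/"))  # 1 None
```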

The third part? We run NWsource on a set of three load-balanced servers, in a simple round-robin rotation. Each individual server has its own distinct storage.

So let's follow the process from the first page hit after the cache is cleared:

  1. A visitor hits the home page on server foo.
  2. The CSS is aggregated into <hash>.css and written to disk
  3. The HTML is generated with @import /files/css/<hash>.css
  4. This HTML is cached
  5. The next visitor hits the home page on server bar
  6. The cached HTML is returned
  7. The CSS file doesn't exist on this server
  8. Your beautifully designed page looks like it's being rendered on Mosaic in 1994 (except the background probably isn't grey).

Without page caching turned on, the CSS aggregation actually works great. It just recreates the CSS file on every server as needed.
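The sequence above, and the no-cache contrast, can be reproduced with a small simulation. This is a sketch under stated assumptions: three servers in strict round-robin, each with a private set of files, a page cache shared by all of them, and the simplification that a browser's CSS request lands on the same server that returned the HTML. The server names come from the steps above; `handle` and `simulate` are hypothetical helpers.

```python
import hashlib
from itertools import cycle


class Server:
    """One web head. `local_files` models its private /files directory."""

    def __init__(self, name):
        self.name = name
        self.local_files = set()


def aggregate_path(stylesheets):
    key = hashlib.md5("\n".join(stylesheets).encode("utf-8")).hexdigest()
    return "/files/css/%s.css" % key


def handle(server, url, page_cache, stylesheets, use_page_cache):
    """Serve `url` from `server`; return True if the page's CSS is found."""
    path = aggregate_path(stylesheets)
    html = page_cache.get(url) if use_page_cache else None
    if html is None:
        # Building the page aggregates the CSS, on THIS server's disk only.
        server.local_files.add(path)
        html = '<style type="text/css">@import "%s";</style>' % path
        if use_page_cache:
            page_cache[url] = html  # shared by all servers (it is in the DB)
    # Simplification: the browser's CSS request lands on the same server.
    return path in server.local_files


def simulate(use_page_cache, requests=6):
    servers = [Server("foo"), Server("bar"), Server("baz")]
    rotation = cycle(servers)  # simple round-robin
    page_cache = {}
    sheets = ["modules/system/system.css", "themes/nw/style.css"]
    results = []
    for _ in range(requests):
        server = next(rotation)
        ok = handle(server, "/", page_cache, sheets, use_page_cache)
        results.append((server.name, ok))
    return results


print(simulate(use_page_cache=True))
# [('foo', True), ('bar', False), ('baz', False), ('foo', True), ('bar', False), ('baz', False)]
print(simulate(use_page_cache=False))
# every entry is True: each server builds its own copy on demand
```

Only the server that happened to build the cached page ever has the file; everyone else is served HTML pointing at a stylesheet they never wrote.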

In my case, this was exacerbated by the fact that I had a cron job running periodically that rsync'd the files directory around the servers from the box the editorial team uploads them to. So eventually even the missing CSS files got pushed to all the servers. This made debugging really difficult because the problem kept disappearing out from under me, and didn't exist on our QA server at all (because we only have one QA server).

So having finally figured this out, what is there to be done about it? The best choice is shared storage of /files (for instance, an NFS mount). We just don't have this set up, and while I might look into it down the road, I don't know the pros and cons well enough to set it up myself and be confident in it.

We could just increase the frequency of our /files cron job, but we're still going to be left with periods where the problem occurs.

What I ended up doing was (*sigh*) hacking core: I added a call in common.inc that rsyncs each aggregated CSS file to the other servers as soon as it is created. This is really simple to do, and it doesn't leave us any windows where pages are busted. I don't consider this a long-term solution, but it got the job done and it's been working well since I put it into place.
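The actual common.inc change isn't shown here, so this is only a Python sketch of the idea: write the aggregate locally, then immediately push it to every peer with rsync. It assumes passwordless ssh between the web heads, and that `peers` (hostnames) and `remote_dir` (each peer's files/css path) describe your environment; both names, and the helpers themselves, are hypothetical. The `run` parameter defaults to `subprocess.run` but can be swapped out so the logic can be exercised without rsync installed.

```python
import os
import subprocess


def rsync_command(local_path, peer, remote_dir):
    """Build the rsync argv that copies one aggregated CSS file to a peer.

    -a preserves permissions and timestamps; the trailing slash makes
    remote_dir the destination directory rather than a file name.
    """
    return ["rsync", "-a", local_path, "%s:%s/" % (peer, remote_dir.rstrip("/"))]


def write_aggregate_and_sync(path, css, peers, remote_dir, run=subprocess.run):
    """Write the aggregate locally, then push it to every other web server."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(css)
    for peer in peers:
        # check=True raises CalledProcessError if rsync fails, so a broken
        # push is not silently ignored.
        run(rsync_command(path, peer, remote_dir), check=True)
```

Pushing synchronously means the page that triggered aggregation waits for the copies, which is the price of closing the window entirely rather than shrinking it with a faster cron job.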

I have been spending some time wondering what a long-term functional solution would look like, something that could be written as a patch and submitted to core for inclusion. You would probably want it to be admin-configurable (to turn the syncing on and manage the servers you're syncing around). Wim Leers' CDN module contains a pluggable file transfer mechanism that could be used to move the files around. Something along these lines sounds workable. Or perhaps this is just one of those esoteric situations you need to deal with yourself.
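One possible shape for that admin-configurable, pluggable design is sketched below in Python. It is not the CDN module's API: `SyncSettings`, `register_transport`, and the "dry-run" transport are all hypothetical, and the rsync transport assumes every server uses the same absolute path for /files. Transports are looked up by name, so a settings form only needs to store a string plus a list of peers.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A transport copies one local file to one peer: (local_path, peer) -> None.
Transport = Callable[[str, str], None]
TRANSPORTS: Dict[str, Transport] = {}


def register_transport(name: str):
    """Decorator that makes a transport selectable by name in the settings."""
    def decorator(fn: Transport) -> Transport:
        TRANSPORTS[name] = fn
        return fn
    return decorator


@register_transport("rsync")
def rsync_transport(local_path: str, peer: str) -> None:
    # Assumes the same absolute /files path on every server.
    subprocess.run(["rsync", "-a", local_path, "%s:%s" % (peer, local_path)], check=True)


DRY_RUN_LOG: List[str] = []


@register_transport("dry-run")
def dry_run_transport(local_path: str, peer: str) -> None:
    # Records what would be pushed without touching the network.
    DRY_RUN_LOG.append("%s -> %s" % (local_path, peer))


@dataclass
class SyncSettings:
    """Hypothetical admin-configurable options for syncing aggregates."""
    enabled: bool = False
    transport: str = "rsync"
    peers: List[str] = field(default_factory=list)


def sync_aggregate(local_path: str, settings: SyncSettings) -> int:
    """Push a freshly written aggregate to each peer; return the push count."""
    if not settings.enabled:
        return 0
    push = TRANSPORTS[settings.transport]  # KeyError for an unknown transport
    for peer in settings.peers:
        push(local_path, peer)
    return len(settings.peers)
```

For example, `sync_aggregate("/files/css/abc.css", SyncSettings(enabled=True, transport="dry-run", peers=["bar", "baz"]))` returns 2 and leaves two entries in `DRY_RUN_LOG`, which is a cheap way to check a configuration before enabling real pushes.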

Comments

I'm guessing it's in the db?

Could the aggregated CSS be stored in the db, then, too? Using the MD5 as a key.

That kind of sucks though, but since that's where the page cache is, it seems OK for that reason.

Yes, the pages are cached in the db. However, the only way you could store the CSS in the db is if you included it all inline in the page. This would not be happy. Drupal does it properly now with @import.

what if you had the css aggregator spit out a link like /cached-css.php?key=MD5_key, and that script just fetched the CSS from the db?

or, don't edit the css aggregator code, but have those md5 urls rewrite to go through a script that does the above?

too complicated i suspect :)

errr... obviously the url rewriting wouldn't work because something needs to put the css in the db. so the aggregator would need to be changed in either case - have it spit out different links, and have it put the css in the db.
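The commenter's idea (the aggregator writes CSS into the database keyed by the MD5, and a script serves it back) could look roughly like this. It is a Python sketch using an in-memory SQLite database as a stand-in for the shared Drupal database; the `css_cache` table, `store_aggregate`, and `serve_css` are hypothetical, with `serve_css` playing the role of the proposed `/cached-css.php?key=...` script.

```python
import hashlib
import sqlite3


def open_store():
    """In-memory stand-in for a table every web server can reach."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE css_cache (key TEXT PRIMARY KEY, css TEXT NOT NULL)")
    return db


def store_aggregate(db, stylesheets):
    """Aggregator side: save the CSS in the shared DB, return the URL to emit."""
    names = "\n".join(name for name, _ in stylesheets)
    key = hashlib.md5(names.encode("utf-8")).hexdigest()
    css = "\n".join(text for _, text in stylesheets)
    # The same key always maps to the same CSS, so if another web head
    # already stored it, skipping the insert is safe.
    db.execute("INSERT OR IGNORE INTO css_cache (key, css) VALUES (?, ?)", (key, css))
    db.commit()
    return "/cached-css.php?key=" + key


def serve_css(db, key):
    """The serving script: look up by key, (404, "") if unknown."""
    row = db.execute("SELECT css FROM css_cache WHERE key = ?", (key,)).fetchone()
    if row is None:
        return 404, ""
    return 200, row[0]
```

This removes the dependency on local disk, but every stylesheet request now runs application code and a query instead of being served as a flat file, which is exactly the cost objection raised in the reply.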

Urgh, but then the web server would have to make a full page load, go and grab the data from the db, and send it back to the client. If it's a flat file, then the web server can just serve it up. It'd be like doubling your web server's workload for the same number of page views.

My CDN integration module would indeed solve this problem.

Whatever you do, you have to force the creation and serving of your CSS files from a single server.

P.S.: contact me if you are interested in deploying my CDN integration module. While I mark it as "not yet production ready", it *is* production ready for small amounts of files.

Could you not configure the load balancer to serve all CSS files from 1 server? Surely the load of wanging out a "small" CSS file wouldn't offset the load too much?

If it did, you could maybe tell it to serve ALL CSS from "server1" and then simply change the PHP serving split from 50:50 to 40:60 (server1:server2)...
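The routing rule being suggested can be expressed as a small function, purely as an illustration: it does not reflect any particular load balancer's configuration language, and `make_router` is a hypothetical name. It pins every `/files/css/` request to one server and spreads everything else with a naive weighted round-robin, so weights of 2 and 3 give the 40:60 split mentioned above.

```python
from itertools import cycle


def make_router(css_server, weights):
    """Return route(path) -> server name.

    css_server: the one box that serves every aggregated CSS file.
    weights: server name -> integer share of all other requests,
             e.g. {"server1": 2, "server2": 3} for a 40:60 split.
    """
    # Naive weighted round-robin: repeat each server `weight` times.
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    rotation = cycle(schedule)

    def route(path):
        if path.startswith("/files/css/"):
            return css_server  # pin all aggregated CSS to one box
        return next(rotation)

    return route
```

Note that pinning only the serving is not enough on its own: as pointed out earlier in the thread, the file also has to be created on that box, and pinning makes that one server a dependency for every styled page.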

Of course - this assumes load balancers can do this; I've never used one.

Yes, our load balancer can be configured this way, but that causes us problems because many times we need to take a machine out of the rotation for various reasons (maintenance, patching, testing, etc.) and now we're relying on this one server always being up. Also, it is really wise to have the machines be absolute mirrors of each other in both code and configuration. It just makes maintenance a lot easier.

Putting /files on a file server over NFS is really the way to go. Also, putting your temp dir into the shared files directory is necessary if your load balancer isn't session-aware, or you'll hit weirdness with file uploads.

You can then do 2 things:
1) Let your web servers serve the file over nfs
2) Run a web server on the file server, and rewrite requests for /files to that static web server. While I haven't actually tried 2, it seems like it may remove some of the latency and overhead of serving a file from an nfs share.

I had the exact same problem as you, I solved it by doing my own css aggregation in my theme so I didn't have to hack core. I've written up my solution here:

http://drupal.org/node/254780


I wrote two chapters of Drupal 7 Module Development, which I co-wrote with Matt Butcher, Larry Garfield, Matt Farina, Ken Rickard, and John Wilkins. Go buy a copy!
I am the owner of the configuration management initiative for Drupal 8. You can follow this work at the dashboard on groups.drupal.org.
I work at Lullabot! If you don't know who Lullabot is, then you haven't been around the Drupal world long, have you? Come check us out!