Years and years ago, somebody coined the term “slashdotting.” In essence, it’s the experience when your website is linked to by a larger website, overwhelming your servers with a crush of traffic.
Hacker News can have this effect, especially if your website reaches the front page for any length of time.
Two weeks ago, I was featured on the Hacker News front page for nearly six hours. In that time I received thousands of hits to my website. The initial crush took down my web server. This is my story, and how you can avoid a similar fate.
I use ServerGrove with a 2.5 GB virtual server to host a WordPress blog. The webserver I have installed is Apache. I know that WordPress is a terribly inefficient application; to counter the traffic I receive normally from my readers I have W3 Total Cache standing in front.
This caching application is configured to create HTML caches of the pages. It also uses Amazon S3 to host assets like CSS and images. W3 Total Cache also has a built-in object cache, which I use and store in memory.
When my post was voted up on Hacker News, it hit the front page. This sent a crushing flood of traffic to my website; I saw traffic in the 200 hits per minute range.
I quickly figured out that, first and foremost, the cache was misconfigured: object caching was disabled. With object caching disabled, WordPress had to rebuild many of the objects it needed every time it loaded. I enabled object caching.
I also detected that Apache was experiencing a high number of defunct processes. I checked my Apache configuration and I discovered that my custom configuration had been replaced during a system upgrade with a default configuration. I had 1 server with a maximum of 20 clients and 300 requests per child.
I increased this to two servers, 60 clients, and 3,000 requests per child. Immediately I noticed the number of defunct processes drop, the RAM utilization rise, and the site stabilize.
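A reverted configuration like this one is easy to catch with a small script. What follows is only a minimal sketch in Python, not something I ran at the time. It assumes the "servers, clients, and requests per child" settings correspond to Apache's prefork directives `StartServers`, `MaxClients`, and `MaxRequestsPerChild`; that mapping, and the function names `parse_directives` and `find_drift`, are my own assumptions for illustration.

```python
import re

# Assumed mapping of "servers / clients / requests per child" to Apache
# prefork directive names. The values are the tuned ones described above.
EXPECTED = {
    "StartServers": 2,
    "MaxClients": 60,
    "MaxRequestsPerChild": 3000,
}

# Matches simple "DirectiveName <integer>" lines only.
DIRECTIVE = re.compile(r"^\s*(\w+)\s+(\d+)\s*$")


def parse_directives(config_text):
    """Return {directive: int} for 'Name <integer>' lines, skipping comments.

    Section blocks such as <IfModule> are not interpreted; every matching
    line is read, and a later line overrides an earlier one.
    """
    found = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = DIRECTIVE.match(line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


def find_drift(config_text, expected=EXPECTED):
    """Return {directive: (expected, actual)} for values that differ or are missing."""
    actual = parse_directives(config_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    # The default values the upgrade left behind.
    default_config = """
    StartServers 1
    MaxClients 20
    MaxRequestsPerChild 300
    """
    for name, (want, got) in find_drift(default_config).items():
        print(f"{name}: expected {want}, found {got}")
```

Run against the real configuration file after every upgrade, a check along these lines would have flagged all three values before any traffic arrived.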
For the next six hours I watched hundreds of hits per minute come in, and the site performed extremely well.
I learned a number of important lessons through this process.
- Double check your cache settings regularly. I had disabled the object cache for a reason, but I never turned it back on. This resulted in the cache being misconfigured. I assumed that since the site performed normally under normal conditions there was no problem.
- Upgrades should be double checked for configuration changes. It’s my fault that I didn’t catch the fact that an upgrade changed my Apache configuration back to the default. This was my biggest problem, since my configuration didn’t allow me to fully utilize my server to serve requests.
- Benchmarking your own site regularly can help catch problems. After this event occurred, I restored the “broken” configuration and benchmarked it against the “correct” configuration. The difference was stark: fewer than 50 requests per second on the broken configuration versus 150 requests per second on the new one. This would have been obvious if I had regularly benchmarked my site.
- Experience doesn’t prevent making mistakes. I’d been featured before, but that was not something that kept me from falling flat on my face when Hacker News came calling.
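The benchmarking lesson is the easiest one to automate. Here is a rough sketch of a requests-per-second measurement in Python, using only the standard library. It is not the tool I used for the numbers above, and the function names, request count, and concurrency level are arbitrary choices for illustration. A dedicated load-testing tool will give more trustworthy numbers; the point is only that a regular check can be small.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Fetch one URL; return True only if the server answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def measure_throughput(request_once, total_requests=200, concurrency=10):
    """Call request_once() total_requests times across a pool of threads.

    Returns (requests_per_second, failures), where failures counts the
    calls that returned a falsy value.
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: request_once(), range(total_requests)))
    elapsed = time.perf_counter() - started
    failures = sum(1 for ok in results if not ok)
    return total_requests / elapsed, failures
```

Calling `measure_throughput(lambda: fetch(url))` with `url` set to a page on your own site (ideally a staging copy) returns the rate and the failure count. Recording that rate after each upgrade or configuration change makes a sudden drop hard to miss.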