
Caching For Efficiency With Zend Framework

Out Of Date Warning

Languages change. Perspectives are different. Ideas move on. This article was published on April 5, 2010 which is more than two years ago. It may be out of date. You should verify that technical information in this article is still current before relying upon it for your own purposes.

One of the things I’m always looking for is ways to improve the performance of the applications I write. While a few applications are write-heavy, most are read-heavy: that is, reading the database is the predominant behavior (for example, this WordPress blog reads the database far more often than it writes to it). Additionally, Zend Framework is (comparatively) slow at handling requests, offering a throughput of about 67 requests per second on my machine, while loading static pages came in at a whopping 750 requests per second.*

So, given this performance difference, how do we improve the performance of Zend Framework while still retaining its functionality and ease-of-use? Well, we employ caching, of course!

But not just any caching. One of the beauties of a read-heavy website, especially one that doesn’t change all that often, is that we have the ability to cache entire pages and serve them directly using our web server. In Zend Framework 1.10.0, Zend_Cache_Frontend_Capture and Zend_Cache_Backend_Static were introduced, giving us the ability to take entire pages produced by Zend Framework and cache them. This means that we get the ability to use Zend Framework and all of its framework-y goodness, while still having the ability to enjoy the performance of static HTML pages served by our webserver. Excellent.

When devising my proof of concept, however, I found that implementing these components is more difficult than it looks. This is partly because the documentation is lacking, and partly because the documentation is wrong in some spots. But after a week of searching and a journey that consisted of reading a Jira ticket, filing one of my own, dealing with imperfect documentation, asking questions in #zftalk on Freenode, bugging Matthew Weier O’Phinney to the point where I’m sure he made a voodoo doll of me, bugging Pádraic Brady about the cache, and good old-fashioned trial and error, I’ve mastered the implementation of static whole-page caching in Zend Framework, and here is a tutorial on how to do it yourself.

Standard Disclaimer
This tutorial implements code found in the latest version of Zend Framework at the time it was written. That means Zend Framework 1.10.3. Future releases of Zend Framework may, from time to time, change the behavior of components discussed here. Any changes should be reviewed in the documentation.

Additionally, where caching is concerned, it’s never a good idea to cache authenticated pages. It’s also never a good idea to cache data that changes a lot. Finally, it’s never a good idea to cache pages that change based on inputs, like pages that you access via POST or PUT requests.

Getting Started
First things first: let’s talk a little bit about Zend Framework’s caching model and how Zend_Cache_Frontend_Capture and Zend_Cache_Backend_Static are different.

With most Zend caches, you can implement them using the factory() method – in fact, the documentation warns against doing it any other way. So, to implement a frontend file cache using an APC backend, you can do the following:

Zend_Cache::factory('File', 'APC', $frontOps, $backOps);

With the introduction of Zend_Cache_Manager in 1.10, you can register your cache with the manager, and then access it directly from your controllers. However, if you try to implement the Zend_Cache_Frontend_Capture or Zend_Cache_Backend_Static caches in this fashion, it blows up entirely, and will ruin your day. This is because these caches (collectively known as the Static Cache) are designed to serve files directly from the webserver once the file is cached. This has two consequences in particular: first, the static cache’s ID is the request URI (which is converted to hexadecimal to comply with Zend_Cache’s rules on IDs), and second, the cache relies on output buffering to capture the data.
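To make the ID scheme a little more concrete, here is a minimal sketch of the hexadecimal round trip using only PHP built-ins. The function names encodeCacheId() and decodeCacheId() are hypothetical, and the sketch assumes the encoding is a plain bin2hex() of the request URI; that matches the description above, but it is worth checking against the Zend Framework source for your version.

```php
<?php

// Hypothetical helper names. Assumption: the static cache ID is simply the
// request URI converted to a hexadecimal string.
function encodeCacheId($requestUri)
{
    // bin2hex() turns every byte into two hex digits, so the result contains
    // only 0-9 and a-f, which are all legal characters in a Zend_Cache ID.
    return bin2hex($requestUri);
}

function decodeCacheId($id)
{
    // pack('H*', ...) reverses bin2hex() and recovers the original URI.
    return pack('H*', $id);
}

$id = encodeCacheId('/index/viewpage/id/5');
echo $id, "\n";
echo decodeCacheId($id), "\n"; // prints "/index/viewpage/id/5"
```

Because the encoded ID contains no slashes or other punctuation, it passes Zend_Cache’s ID validation even though the original URI would not.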

Therefore, implementation of the static cache is done through the application.ini file, as a resource plugin. Developers wishing to implement this cache must include the following lines in their application.ini files:

; Custom Caches (Adjustments To Default CacheManager)


resources.cacheManager.page.backend.options.public_dir = APPLICATION_PATH "/../public/cached"
resources.cacheManager.pagetag.backend.options.cache_dir = APPLICATION_PATH "/../data/cache/tags"

This means that all of the cached data is stored in the public/cached directory; all the cache’s tags are stored in data/cache/tags. These paths are by no means the only paths, but you must specify a path for both in order for the cache to work properly.

Why do you need to specify a separate tags directory? Because files are served directly by the web server, rather than through PHP, the tags are stored separately in another cache. This defaults to a file cache, and you must specify a separate location where its files will be stored. The static cache uses this internal cache, which is transparent to you in every other way.

There is one additional INI setting we must employ. In order to operate properly, the static cache employs output buffering and captures that output, writing it to disk and then serving it to the end user. Zend Framework also employs output buffering, which if not turned off, will interfere with the static cache. This was a hangup for me, since it’s not mentioned anywhere, and was something I discovered quite by accident. In order to turn off Zend Framework’s standard output buffering, we need to include the following INI directive:

resources.frontController.params.disableOutputBuffering = true

This directs the front controller to turn off output buffering, which allows the static cache to handle it.

The last thing we need to do is create the directories where the cache will store its files, and make them owned by the web server user. While the static cache will create its own directory to store the cached static files (if it doesn’t exist), the file cache that stores the tags will throw an exception if its directory is missing.
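If you would rather script this step than create the directories by hand, the following is a minimal sketch in plain PHP. The function name ensureCacheDirectories() and the 0775 mode are assumptions made for illustration, not part of Zend Framework.

```php
<?php

// Hypothetical helper: creates each directory if it is missing and reports
// whether it ends up existing and writable by the current user.
function ensureCacheDirectories(array $directories, $mode = 0775)
{
    $results = array();
    foreach ($directories as $directory) {
        if (!is_dir($directory)) {
            // The third argument creates missing parent directories too.
            // Warnings are suppressed; the check below reports any failure.
            @mkdir($directory, $mode, true);
        }
        $results[$directory] = is_dir($directory) && is_writable($directory);
    }
    return $results;
}
```

For the layout used in this article, you would pass it the two paths from application.ini (the public/cached and data/cache/tags directories). Note that the directories end up owned by whichever user runs the script, so either run it as the web server user or change the ownership separately afterwards.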

At this point, the cache is ready to go. It’s configured, we’ve created the directories, we’ve turned off output buffering, and we’re now ready to get into caching files.

Caching Output

Zend Framework now has a built-in cache helper which we’ll use to cache our static content. This needs to be done in the init() method (from my tests), and should list all the actions in the controller you want to cache. Your controller should look similar to this:

<?php

class IndexController extends Zend_Controller_Action
{
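    public function init()
    {
        // cache(actions, tags): statically cache each listed action and
        // label its cached pages with the given tags for later purging.
        $this->_helper->cache(array('index'), array('indexaction'));
        $this->_helper->cache(array('viewpage'), array('viewpageaction'));
    }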

    public function indexAction()
    {
    }

    public function viewpageAction()
    {
    }

    public function logoutAction()
    {
    }
}

The argument list for the cache helper is simple: first, an array of the actions we’re caching, followed by an array of the tags associated with those actions. I’ve listed index and viewpage separately, with different tags, but you can tag multiple actions with the same tags, or break it out as I have. As you develop your application, you’ll want to be careful not to cache actions that are being executed on a POST request, which you can do by using the request object’s isPost() method. Also, in this example, logoutAction() is never cached; this is because we obviously don’t want to cache the results of a logout; we actually want PHP to unset the user’s identity.
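The rules about what not to cache can be collected into a single check. The following is a minimal sketch in plain PHP; the function name isCacheableRequest() is hypothetical, and it covers only the two conditions discussed in this article (the request method and whether a user is logged in).

```php
<?php

// Hypothetical predicate: returns true only when a response is safe to store
// as a static page under the rules described in this article.
function isCacheableRequest($method, $hasIdentity)
{
    // Only GET responses are cached; POST, PUT and similar requests produce
    // output that depends on the submitted input.
    if (strtoupper($method) !== 'GET') {
        return false;
    }
    // Pages rendered for a logged-in user must never be written to disk.
    return !$hasIdentity;
}

var_dump(isCacheableRequest('GET', false));  // bool(true)
var_dump(isCacheableRequest('POST', false)); // bool(false)
var_dump(isCacheableRequest('GET', true));   // bool(false)
```

Inside init(), one way to use it would be to wrap the cache helper calls in a condition fed by $this->getRequest()->getMethod() and Zend_Auth::getInstance()->hasIdentity(), so that the helper is never armed for requests that should not be cached.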

Occasionally you may wish to invalidate the cache and remove old files. To do so, you search by tag. For this example, let’s purge the “indexaction” tagged files from the cache:

$this->_helper->getHelper('Cache')
             ->removePagesTagged(array('indexaction'));

The “indexaction” tagged pages will be invalidated and re-cached on the next request.

Directing Apache To Serve Cached Files
The whole point of this process is to serve the cached files at a significant performance improvement, so now we need to make some edits to our .htaccess file’s rewrite rules. The documentation’s rules are slightly incorrect, so let’s devise our own scheme.

My read-heavy sites are usually fairly simple, serving static HTML files rather than XML, OPML, or JSON. Therefore, I need only have a rule for HTML. Additionally, I want to make sure the web server only serves cached files on GET requests, so I’ll include a rewrite condition to help with that.

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/index.html -f
RewriteRule ^/*$ cached/index.html [L]

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{DOCUMENT_ROOT}/cached/%{REQUEST_URI}\.html -f
RewriteRule .* cached/%{REQUEST_URI}\.html [L]

RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ index.php [NC,L]

These rules do a few things: first, if the request goes to the index controller with no arguments (www.example.com/) then it tries to load the index.html cached file. Second, if there are arguments it tries to load the cached file based on the request URI. Finally, for anyone used to using Zend Framework, the last five lines are the same five lines we start with in our Zend Framework default project; these we leave alone to ensure that we run the application if there is nothing in our cache to serve.
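When a cached page is not being served, it helps to know exactly which file the first two rules are looking for. The sketch below mimics that mapping in plain PHP; the function name cachedFileFor() is hypothetical, the document root in the example is made up, and the sketch collapses the doubled slash that the real rules produce.

```php
<?php

// Hypothetical helper that mimics the two cache rewrite rules: it returns the
// file Apache would test for, given the document root and a request URI.
function cachedFileFor($documentRoot, $requestUri)
{
    // mod_rewrite's REQUEST_URI carries no query string, so drop it here too.
    $parts = explode('?', $requestUri, 2);
    $path = $parts[0];
    $root = rtrim($documentRoot, '/');

    // First rule: ^/*$ matches a request for the bare site root.
    if (trim($path, '/') === '') {
        return $root . '/cached/index.html';
    }

    // Second rule: every other URI simply gets ".html" appended.
    return $root . '/cached/' . ltrim($path, '/') . '.html';
}

echo cachedFileFor('/var/www/public', '/'), "\n";
// prints "/var/www/public/cached/index.html"
echo cachedFileFor('/var/www/public', '/index/viewpage'), "\n";
// prints "/var/www/public/cached/index/viewpage.html"
```

Two things fall out of this mapping: a URI with a trailing slash maps to a file literally named ".html" inside a directory, which will not exist, and a URI that already ends in .html maps to a file ending in .html.html.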

Final Notes On Caching
Now that we’ve gotten the cache set up and verified that it works, we can develop our application. However, we obviously want to avoid caching during development, so we can turn caching off by adding the following to our [development : production] section of application.ini:

resources.cacheManager.page.backend.options.disable_caching = true

And that’s it! We can now develop our application with full page caching, getting the performance of a static web server and the flexibility of Zend Framework in the same package.

Good luck!

* The benchmarks cited were performed in the following way: I used Apache Bench, with 3000 requests (none concurrently), to test a stock Zend Framework project, and a flat file of the stock index.php of a Zend Framework project. Both times the files were named with the PHP extension. I did not benchmark the performance of any Zend components. The tests were executed on the same server as the webserver. Apache was restarted after each test. As usual, the standard disclaimers apply and the point of these benchmarks is simply to illustrate a well-known fact: flat HTML files serve faster than parsed PHP code.


PHPGangsta (@PHPGangsta) wrote at 4/5/2010 10:42 am:

Very nice post, thanks for that information. Perhaps I can implement that in one of my next projects. But most parts of my projects are so dynamic or for authenticated people only that I have to use “normal” caching (Core) with file or memcache backend.

James S (@tkstudios) wrote at 4/5/2010 9:52 pm:

Another optimization worth noting is if you’re using Zend_Db, you can use the following line to cache table metadata:

Zend_Db_Table_Abstract::setDefaultMetadataCache($cache);

Where $cache is your factory-generated cache object.

Mary Nicole Hicks wrote at 4/5/2010 11:24 pm:

You say that “Occasionally you may wish to invalidate the cache and remove old files. To do so, you search by tag”. This reminded me about a problem I had with cleaning the cache.

Cleaning up static caches has been a problem for me. Watch out for the automated “cleans” that the cache system will do every so many cache requests. If you have pages that expire, these will all require disk access. Now that you have cached lots of your requests, cleans do not happen as often. When a clean occurs it will slow down the request dramatically. The problem with this is that you will not notice for a while as it does not slow down every request. Only ones when the cleans randomly(?) occur.

You are better off disabling the Zend_Cache cleaning mechanism and scripting it into a cron job that executes it on your website’s quiet times.

I am sure with some profiling you will see what I mean. That, or if you have lots of static cache pages, expire your static cache pages, and are on a shared webhost, look for the timeouts in your log files.

Also if Google bot can access your static cache pages, it will crawl them all one after another, just to make sure that they all expire at the same time.

Ben (@psaxde) wrote at 4/6/2010 3:47 am:

Thanks for this great post! I didn’t know about these two new features yet; I’m looking forward to testing them in some of my projects soon. Thanks again!

Eric wrote at 4/6/2010 7:18 am:

Nice post. Thanks for all the info.

I’ve tried it on my web page but still have a problem. Do you know if there is any way to change the cache lifetime for a specific action?

Thanks.

Andy Thompson (@andytson) wrote at 4/6/2010 8:27 am:

There is a very big flaw in Zend’s Page cache class, which I reported to them and we came up with a workaround.

http://framework.zend.com/issues/browse/ZF-3076

It must not be used to cache a HTTP redirect or under any error condition as it cannot know the HTTP status that was sent, and essentially means any page is cached, and later served as HTTP 200 OK.

The workaround is to call the $cache->cancel() on these conditions manually.

However I’ve not delved deep into the new cache helper, I can’t see any explicit code that handles these conditions.

Joe Devon (@joedevon) wrote at 4/6/2010 6:31 pm:

Nice article Brandon. Paddy’s new code sounds like a nice addition. Hadn’t had a chance to try it out yet…

Mary, what I do is set very high, basically unlimited expire times, then prime the cache via cron, which overwrites cache files instead of dealing w/ cleaning out expiring caches.

Brandon Savage (@brandonsavage) wrote at 4/6/2010 7:17 pm:

To add to what you said, Joe, I’m not 100% sure that this cache even has an expiration time. Since the files are static, and meant to be cached until expired, this may not be an issue.

Pádraic Brady (@padraicb) wrote at 4/6/2010 8:10 pm:

@Andy: I’ll look into adding the cancel mechanism during the next iteration

@Eric: Cache lifetime is largely irrelevant since a static cache “hit” will bypass PHP entirely. Therein lies its great advantage, and great disadvantage (no auto-expiry). If you need to time expiration, you can use a cron task to monitor collections of static files based on their cached tag references and modification times.

@Mary: There’s a few ways of keeping cleaning efficient. Specific tagging (avoiding generalised tagging) can keep a tidier breakdown of the cache in the first place, esp. between time sensitive caches (must be cleaned almost immediately if data alters) and time insensitive caches (cleaning can be deferred to a cron process).

Another tip (with some extra coding), depending on your setup, is that in some scenarios you can take advantage of a light web server (e.g. Nginx) which can interact with memcached. Not all “static” caches need be stored to files though it’s the most common need on single server apps. Whenever Zend_Cache is refactored for ZF 2.0, and we have some cleanly defined interfaces (static caching uses a couple of “workarounds” which is why it seems odd when using it outside of the cache action helper), it’s something I’ll look into porting in directly for support.

Thanks to Brandon for the light shedding article! Great one, as usual.

Isaac McGowan wrote at 4/6/2010 8:16 pm:

Thank you, thank you. I spent a number of hours trying to get this to work, before calling it quits. This is a great feature, and I’m happy it’s part of the framework now. That said, it’s too bad the team couldn’t document it in an intelligible way before including it.

Good job going to the source for information that no one could have gleaned from the documentation.

Andy Thompson (@andytson) wrote at 4/7/2010 3:09 am:

Thanks Pádraic and thanks Brandon for the great post.

It’s been a while since I’ve looked into page caching, and I had moved on to using frontend content servers (Nginx and mpm-worker with mod_cache) for caching.

Maybe I’ll finish the issue (ZF-3078) I opened regarding bringing Cache-Control functionality to the Page cache.

Isaac McGowan wrote at 4/7/2010 1:44 pm:

I ran into an issue with the rewrite rules and urls that end in a slash. For example:

http://www.somedomain.com/action/controller/param/param/

In this case the trailing slash causes the rewrite condition that checks for the existence of a cached file to fail. I fixed this by adding the following before the rule that rewrites to the cached file.

RewriteRule ^(.*)/$ $1 [NC,L]

Some may suggest that this should be a redirect instead, but I decided to use the NC because I like my urls to contain a trailing slash.

Again, thank you Brandon for this post, and thank you to Pádraic for an awesome addition to ZF.

holo wrote at 4/7/2010 4:13 pm:

Thank you for another great post. The rewrite rules work fine in .htaccess file but not when in a virtual host.

Cheers
holo

Paul Cook wrote at 4/18/2010 12:10 pm:

Exactly what I needed at the right time. Thanks, Brandon, for creating a straight forward and frankly dead simple tutorial for static caching. I had started wading through the docs and some other blog posts on ZF caching and was getting a little worried I was not going to be able to implement what I wanted. This article saved my day!

Evan K wrote at 4/18/2010 11:31 pm:

Definitely useful for really simple applications…However, since it relies solely on the REQUEST_URI, I wonder if there’s a way to modify it to work with subdomain-based routing (with a subdomain maps to a module)? I’ve only started digging into this, but it doesn’t look like it will be an easy thing to get working.

Hoang Nguyen (@kentismyname) wrote at 9/6/2011 5:53 am:

Thanks for your useful article,
I tried it and got the cache feature to work, but the .htaccess rules do not seem to work.

My url structure is domain.com/cat_name/article_name.html

Cached folder stored the same structure as domain one
-cached
–cat_name
—–article_name.html.html
–cat_name_1
—–article_name_1.html.html

I tried to mess up with htaccess rule but failed. Could you point me anything to make it work ?