The purpose of this post is to share the nuts and bolts of the Magento cache system with developers, and to present one method of overcoming some limitations of the file cache storage class.
This article started with a client’s site that had performance issues: once the cache reached about 38,000 records, he had to clear it to keep the site responsive enough.
How strange is that? Shouldn’t a full cache perform better than an empty one?
The number of records stored in the Magento cache depends on many factors, among them the number of store views, whether a full page cache is used, and the size of the catalog. Many sites don’t go above 1,000 or 2,000 cache records, but for large instances much higher values are common.

My client’s performance issues only occurred on specific pages, one of the most noticeable being the add-to-cart action. Adding a product to the cart took up to 4 seconds!
With some profiling, we could pin the issue down to part of the (full page) cache being cleared, more precisely to the moment the mini-cart in the page header had to be rebuilt.
Let’s have a look at how Magento caching (which uses the Zend_Cache library) works.
Each cache entry consists of the following information:
- the cached data
- a cache key (or ID), that uniquely identifies this entry and is used to retrieve the data from the cache
- a cache lifetime, after which the cache entry expires
- zero or more cache tags
Most of these are obvious, but what is the purpose of cache tags? Cache tags are used to segment the cache for partial deletion. For example, you could clear all entries of the configuration cache without touching any entries of the HTML block cache.
The cache management page lists most of the cache tags used by Magento. Depending on the modules and extensions installed there may be more or fewer tags, e.g. CONFIG, LAYOUT_GENERAL_CACHE_TAG, BLOCK_HTML, TRANSLATIONS, FULL_PAGE_CACHE, …
All the information associated with a cache record is saved to the cache storage every time the method Mage::app()->saveCache() is called. To read cache records, the method Mage::app()->loadCache() is used.
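To make the four parts of a cache record concrete, here is a minimal, self-contained sketch of a tagged cache held in a PHP array. The class TinyTaggedCache and its method names are made up for this illustration; it is not Magento’s or Zend_Cache’s implementation and only mirrors the save / load / clean-by-tag idea described above.

```php
<?php
// Illustrative only: TinyTaggedCache is a hypothetical class, not part of Magento or Zend.
final class TinyTaggedCache
{
    // id => array('data' => string, 'expires' => int|null, 'tags' => string[])
    private $entries = array();

    // Store data under an ID, with optional tags and a lifetime in seconds (null = no expiry).
    public function save($data, $id, array $tags = array(), $lifetime = null)
    {
        $this->entries[$id] = array(
            'data'    => $data,
            'expires' => $lifetime === null ? null : time() + $lifetime,
            'tags'    => $tags,
        );
    }

    // Return the cached data, or false if the ID is unknown or the entry has expired.
    public function load($id)
    {
        if (!isset($this->entries[$id])) {
            return false;
        }
        $entry = $this->entries[$id];
        if ($entry['expires'] !== null && $entry['expires'] < time()) {
            unset($this->entries[$id]);
            return false;
        }
        return $entry['data'];
    }

    // Remove every entry that carries at least one of the given tags.
    public function cleanMatchingAnyTag(array $tags)
    {
        foreach ($this->entries as $id => $entry) {
            if (array_intersect($tags, $entry['tags'])) {
                unset($this->entries[$id]);
            }
        }
    }
}

$cache = new TinyTaggedCache();
$cache->save('<config/>', 'config_global', array('CONFIG'));
$cache->save('<div>cart</div>', 'minicart_html', array('BLOCK_HTML'), 3600);

// Clear the "configuration cache" only; the HTML block entry must survive.
$cache->cleanMatchingAnyTag(array('CONFIG'));
var_dump($cache->load('config_global'));
var_dump($cache->load('minicart_html'));
```

After the clean call, the entry tagged CONFIG can no longer be loaded, while the BLOCK_HTML entry is still there. That is the partial deletion the tags exist for.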
Magento offers several options for the cache storage, and each storage system is accessed through a PHP class called a “cache backend”.
By default, cache data is stored in files (located in the directory var/cache/). Another option is the database, which is slower than the file option, mostly because the database is a heavily used resource for Magento instances anyway. Then there are storage schemes that use RAM (which are much faster), e.g. APC (shared memory) or memcached (a networked caching server).
So, let’s use RAM as our cache storage, right?
Sounds good, except for one problem: APC and memcached only support storing simple key-value pairs, so the cache tags are lost! This renders the whole caching rather useless, because every time we need to clear only one part of the cache, EVERY cache entry is cleared.
But do not despair! The Zend Framework contains a solution to this problem: a special cache backend called TwoLevels (Zend_Cache_Backend_TwoLevels). It uses a fast cache backend (e.g. APC or memcached) for the cache data, and a slow backend (e.g. files or database) for the lifetime and the cache tag information. This way we can have the best of both worlds!
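The division of labour between the two backends can be sketched in a few lines. The class TwoLevelSketch below is hypothetical and is not the code of Zend_Cache_Backend_TwoLevels; two plain arrays stand in for the fast and the slow storage, and lifetimes are left out to keep it short.

```php
<?php
// Illustrative only: TwoLevelSketch is a hypothetical class, not Zend_Cache_Backend_TwoLevels.
final class TwoLevelSketch
{
    private $fast = array(); // stands in for APC/memcached: id => data only
    private $slow = array(); // stands in for files/database: id => tags

    public function save($data, $id, array $tags = array())
    {
        $this->fast[$id] = $data; // plain key-value store, no tag support
        $this->slow[$id] = $tags; // tag metadata lives in the slow store
    }

    public function load($id)
    {
        return isset($this->fast[$id]) ? $this->fast[$id] : false;
    }

    // Ask the slow store which IDs match, then remove those IDs from both stores.
    public function cleanMatchingAnyTag(array $tags)
    {
        $removed = array();
        foreach ($this->slow as $id => $entryTags) {
            if (array_intersect($tags, $entryTags)) {
                unset($this->fast[$id], $this->slow[$id]);
                $removed[] = $id;
            }
        }
        return $removed;
    }
}

$cache = new TwoLevelSketch();
$cache->save('layout xml', 'layout_1', array('LAYOUT_GENERAL_CACHE_TAG'));
$cache->save('cart html', 'cart_abc', array('BLOCK_HTML', 'cart_abc_tag'));

$removed = $cache->cleanMatchingAnyTag(array('cart_abc_tag'));
print_r($removed);
```

Note that reads only touch the fast store, while every tag-based clean depends on how quickly the slow store can answer the question "which IDs carry this tag?".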
According to the excellent performance whitepaper (login required) Magento has released, APC is recommended as the fast backend and files as the slow backend for single web server setups. If you use a cluster of web servers, you should use memcached for the fast backend and the database for the slow backend. The latter has more overhead than the APC/files combo, but having all servers share one cache pool makes up for that.
Magento makes it easy to configure all this by the way – have a look at the file app/etc/local.xml.additional for further information. I will not go deeper into the setup here because we have already written about this.
Now, back to the problem at hand: why is a nice, full cache slower than an empty one?
The site in question was using APC as the fast backend and files as the slow backend.
So, let’s have a look at what is happening.
Every time a product is added to the cart, the following method is called:
Mage::app()->getCache()->clean(
    Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG,
    array($cacheTag)
);
$cacheTag is an MD5 hash built from several parameters identifying the active customer’s mini-cart block cache.
So, how do the TwoLevels and File cache backends handle this?
In the method Zend_Cache_Backend_TwoLevels::clean(), each request with the cleaning mode CLEANING_MODE_MATCHING_ANY_TAG fetches all cache IDs matching the given tags from the slow backend using the method getIdsMatchingAnyTags(), and then removes one cache entry after the other in a loop:
case Zend_Cache::CLEANING_MODE_MATCHING_ANY_TAG:
    $ids = $this->_slowBackend->getIdsMatchingAnyTags($tags);
    $res = true;
    foreach ($ids as $id) {
        $bool = $this->remove($id);
        $res = $res && $bool;
    }
    return $res;
    break;
Let’s dive in a little deeper and find out how the Zend_Cache_Backend_File backend finds those cache IDs.
The interesting part happens in the method _get(). Here a list of all cache entries is built; the code then loops over each one, reads the corresponding metadata file (using file_get_contents() and unserialize()), and checks whether the cache entry is associated with a matching cache tag.
The following code is slightly simplified for educational purposes:
$result = array();
$glob = @glob($cacheDir . $prefix . '--*');
if ($glob !== false) {
    foreach ($glob as $file) {
        $fileName = basename($file);
        $id = $this->_fileNameToId($fileName);
        $metadatas = $this->_getMetadatas($id);
        …
        switch ($mode) {
            …
            case 'matchingAny':
                $matching = false;
                foreach ($tags as $tag) {
                    if (in_array($tag, $metadatas['tags'])) {
                        $result[] = $id;
                        break;
                    }
                }
                break;
            …
        }
    }
}
This code is probably fine for a couple of hundred cache entries, but after not even a day my client’s Magento instance reached 40k records. Opening and unserializing thousands of files takes some time, even on a strong machine.
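The cost of that loop is easy to see in a small simulation. In the sketch below, plain arrays stand in for the metadata files, and a counter stands in for each file_get_contents() plus unserialize() call; the function name findIdsByScan and the tag names are made up for the example.

```php
<?php
// Illustrative only: plain arrays stand in for the metadata files on disk.
// Each counted "read" corresponds to one file_get_contents() + unserialize() in the file backend.
function findIdsByScan(array $metadataById, array $tags, &$reads)
{
    $reads = 0;
    $result = array();
    foreach ($metadataById as $id => $metadata) {
        $reads++; // every entry is inspected, whether it matches or not
        if (array_intersect($tags, $metadata['tags'])) {
            $result[] = $id;
        }
    }
    return $result;
}

// Build 40000 fake records, only one of which carries the tag we are looking for.
$metadataById = array();
for ($i = 0; $i < 40000; $i++) {
    $metadataById['id_' . $i] = array('tags' => array('BLOCK_HTML'));
}
$metadataById['id_123']['tags'][] = 'minicart_tag';

$reads = 0;
$ids = findIdsByScan($metadataById, array('minicart_tag'), $reads);
echo count($ids) . " match, " . $reads . " metadata reads\n"; // prints: 1 match, 40000 metadata reads
```

Finding a single matching entry still means touching every record, so the work grows with the size of the whole cache rather than with the number of entries that actually carry the tag.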
It turns out the file backend doesn’t scale well.
The remedy to this is obviously some kind of index, mapping the tags to the matching cache IDs.
Instead of writing that information into a file that would have to be parsed and updated, I decided to use the filesystem itself as the index, utilizing directories and symlinks.
In the modified file cache backend developed for the client, every time a cache entry is written, a directory with the name of each tag is created and a symlink to the metadata file is placed into it. This adds a little overhead to write operations. Reading cache entries is unaffected. But every operation that matches tags is a lot faster. And since Magento makes heavy use of cache tags, the effect is quite noticeable, depending on the number of records in the cache. For example, adding a product to the cart now takes less than a second on the original instance. And I’m happy to say the reports of the friends who have been nice enough to test the extension have been positive for smaller sites as well.
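The idea can be tried out with a few lines of plain PHP. The sketch below is not the code of the extension: the directory layout (a tags/ directory with one subdirectory per tag), the file name prefixes and the function names are assumptions made for the example. It relies on PHP’s symlink() function, so it needs a filesystem that supports symbolic links, and it assumes tags and IDs are safe to use as file names.

```php
<?php
// Illustrative only: directory layout and function names are assumptions,
// not the layout used by the actual Symlink cache module.
function saveWithTagLinks($baseDir, $id, $data, array $tags)
{
    $metaFile = $baseDir . '/meta--' . $id;
    file_put_contents($baseDir . '/data--' . $id, $data);
    file_put_contents($metaFile, serialize(array('tags' => $tags)));

    foreach ($tags as $tag) {
        $tagDir = $baseDir . '/tags/' . $tag;
        if (!is_dir($tagDir)) {
            mkdir($tagDir, 0777, true);
        }
        $link = $tagDir . '/' . $id;
        if (!is_link($link)) {
            symlink($metaFile, $link); // one link per (tag, entry) pair
        }
    }
}

// Listing a tag directory yields the matching IDs without opening any metadata file.
function findIdsByTagLinks($baseDir, array $tags)
{
    $ids = array();
    foreach ($tags as $tag) {
        $tagDir = $baseDir . '/tags/' . $tag;
        if (!is_dir($tagDir)) {
            continue;
        }
        foreach (scandir($tagDir) as $name) {
            if ($name !== '.' && $name !== '..') {
                $ids[$name] = true; // array keys deduplicate IDs found under several tags
            }
        }
    }
    return array_keys($ids);
}

// Work in a throwaway directory below the system temp dir.
$baseDir = sys_get_temp_dir() . '/symlink-cache-sketch-' . uniqid();
mkdir($baseDir, 0777, true);

saveWithTagLinks($baseDir, 'minicart_1', '<div>cart</div>', array('BLOCK_HTML', 'cart_tag'));
saveWithTagLinks($baseDir, 'footer_1', '<div>footer</div>', array('BLOCK_HTML'));

$ids = findIdsByTagLinks($baseDir, array('cart_tag'));
print_r($ids);
```

The write path does more work (one mkdir() check and one symlink() per tag), while a tag lookup only lists the directories of the requested tags. A real backend would also have to remove the links again when an entry is deleted or expires, which this sketch leaves out.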
Inspired by Marc Jakubowski’s comment below, I added a little benchmark script. Here are some results to give a more detailed idea.
UPDATE: Thanks to Colin Mollenhour’s comment below, I also added benchmarks for Redis using his Zend Cache Backend Class. It’s incredibly fast, so if you have the possibility to use Redis 2.4 or newer, go for it! For this test I used the default Redis configuration, with the server running on the same machine as the tests. Before using it in production I would like to write some unit tests for this new backend, but it seems to work okay.
| Records | Tags | Avg. Records / Tag | Cache Backend | Avg. Time for getIdsMatchingTags() |
|---|---|---|---|---|
| 50000 | 250 | ~700 | File | 20.71s |
| 50000 | 250 | ~700 | Symlink | 2.28s |
| 50000 | 250 | ~1500 | Redis | 0.01s |
| 6000 | 120 | ~175 | File | 2.41s |
| 6000 | 120 | ~175 | Symlink | 0.23s |
| 6000 | 120 | ~370 | Redis | 0.002s |
| 2000 | 80 | ~88 | File | 0.78s |
| 2000 | 80 | ~88 | Symlink | 0.08s |
| 2000 | 80 | ~180 | Redis | 0.001s |
The numbers for the records and tags were chosen because they roughly correspond to the average cache pool size and number of tags on several live Magento instances of different sizes. These benchmarks were run on my laptop, and obviously the results may differ depending on the drive, file system, system memory etc. Please go ahead and test on your system.
One interesting thing about the Redis backend is that the average number of IDs per tag is much higher than with the other backends. I’m not sure why; my guess is that it has something to do with the way Redis manages the tag hash tables. I don’t have time to check it out at the moment, but maybe Colin already knows why.
By the way, according to the PHP reference, the symlink function is no longer limited to Unix systems, but since PHP 5.3 it is also available under Windows (Vista, Server 2008 or greater). I haven’t tested this, though.
This module is provided with no warranty; you use it at your own risk. I know it is being used successfully on several sites, both as a primary cache backend and as a slow backend in combination with APC or memcached.
I invite you to have a look at it and try it out, but please start with a test instance and not your live store. You can download the module from Magento Connect or from GitHub.
If you find bugs or have improvements, please send them in.
After installing the extension, clear the config and block_html cache and visit System > Tools > Symlink Cache.

There you can see sample XML that you will have to add to your app/etc/local.xml file for both variants: using it on its own, or using it as a slow backend.
Then, after you have updated the local.xml file, clear the config cache, go to the Symlink Cache page and hit the “Initialize Tag Symlinks” button.
And that is all: your system is set up and uses the tag symlinks. Enjoy!
I would be happy to hear about other methods, ideas and experiences about improving cache performance.
Originally published on magebase.com. Copyright © 2011 Magebase - All Rights Reserved.





This must be fate, I and Rackspace have been struggling with our cron setup for days and unable to find a solution. Looking forward to Vinai’s thoughts here! Can you run the cron directly whilst the cache is enabled? http://mysite.com/cron.php?
Are you running the cron job as the same user that runs the web server? E.g.:
sudo -u www-data crontab -e
Hi,
My cron job is configured via DirectAdmin and it works, but if I use this code in local.xml
[code]
apc
myprefix_
Netzarbeiter_Cache_Model_Symlink
[/code]
cron jobs aren’t scheduled and don’t run...
When I delete the code from my local.xml, cron jobs are scheduled and run again.
When using APC with the Apache php executable, you need to add:
apc.enable_cli = 1
to your apc.ini (or php.ini) config file.
However, looking at Colin’s previous comment about this flag (apc.enable_cli = 1), it seems that this is not the best thing to do. The alternative is to create a cron job that does not use the php cli executable but does a curl or wget of your cron php script, i.e.:
curl -s -o /dev/null http://www.yoursite.com/absolute/path/to/magento/cron.php
Jerome, there are several possible reasons. I agree with Colin’s and Robert’s comments. Please check var/log/*.log for possible explanations, and that you are using the latest version of the extension, in particular that Netzarbeiter_Cache_Model_Symlink::__construct looks exactly like here: https://github.com/Vinai/Symlink-Cache/blob/master/app/code/community/Netzarbeiter/Cache/Model/Symlink.php
Just a quick update on the cron issues, as it may be of some reassurance. We opted for the curl option at the direction of our RackSpace support team; this works with the original Symlink.php file and crons are now running successfully. I have now updated the Symlink.php file as encouraged by Vinai, and this too is running without issue.
Hi Paul,
Which modifications need to be made to the file, and how?
My problem is APC, because if I use APC in local.xml without a slow backend, my cron doesn’t work (symlink is not activated).
Best regards
Several possibilities, Jerome:
1) set apc.enable_cli = 1 in your php.ini (not recommended unless your crontab runs as the apache user)
2) use the file cache backend for the cron calls with a different cache directory (fiddly to set up)
3) use the curl option to call the Magento cron.php script (see the previous comments for more info)
Jerome,
As Vinai suggests in point 3). “use the curl option to call the Magento cron.php script (see the previous comments for more info)”.
If you google “setting CRON jobs with Curl” it will provide a clearer understanding.
I asked our support team at RackSpace to set it up (cause we’re rubbish at that sort of stuff).
How do i Uninstall the extension??
Using the Magento Connect it generates:
PHP Fatal error: Class ‘Netzarbeiter_Cache_Model_Symlink’ not found in …..
Are you using the Compiler? Just reset the cache configuration in the app/etc/local.xml to get rid of it, then it won’t be used at all.
Dear Vinai,
I just wanted to let you know that we have released a free / OS extension that extends Magento core cache features, in which we have embedded (and credited!) your Symlink module. As we are a French agency, we have translated Symlink’s strings, and you are very welcome to extract them from app/locale/fr_FR/Soon_AdvancedCache.csv to add a translation to your module!
The extension is here : http://www.magentocommerce.com/magento-connect/soon-advancedcache-une-boutique-plus-rapide-2351.html
And, also for French readers of this post, we’ve written a blog post and complete user’s guide. The whole is available here : http://blog.agence-soon.fr/magento-une-boutique-plus-rapide-avec-soon_advancedcache/
Hope this helps and may be useful.
Hi Hervé,
nice to hear you are finding the extension useful. Please be aware that in some rare cases under high load flushing the whole cache causes a race condition where new inodes are created and the clean action never finishes.
By the way, Colin Mollenhour published an alternative implementation that applies the concept of indexing the file cache metadata using a file-based index, which avoids this issue. He also improved the performance quite a lot in general.
I suggest you look into bundling this backend with your extension instead of my symlink module: https://gist.github.com/2224457
Most solutions seem to indicate using APC or memcache. I assume both of those function in the same manner as the file caching? Or how do those compare in the tests you have run?
In a few days Colin will give a presentation at Imagine regarding these (and more) cache backends. I suggest waiting and then downloading the presentation for details and benchmarks.
Vinai, do you have a link for that presentation?
Here are the slides: http://info.magento.com/rs/magentocommerce/images/imagine2012-tech-cache-showdown.pdf
[...] work on Redis would not have been possible without Colin Mollenhour and Vinai Kopp, who did a superb job on several levels: the development of a testing tool for [...]
Thanks, I have a big store with 3 views and the time to generate the HTML was divided by 3. Great job!