Industrie Toulouse: Caching

I've finally gotten around to developing a generic caching service for use on our various Zope and data-driven projects. Zope has a powerful caching system, but it's targeted at Zope application items in the ZODB. Likewise, Zope's SQL Method caching is not quite tuned to what we need.

The problem is that we often do a lot of load-time computation on RDBMS-stored data to build new objects and data structures. Some of these loaders get hit quite intensively, and may span many tables and even multiple queries. Some parts of them may not require (or desire) caching. Of course, when the underlying RDBMS data (or the data in any other source we use) changes, the cache needs to be notified. This is where existing solutions have failed for us. Doing a 'RAM Cache' or SQL Method cache (its own kind of RAM Cache) on an intensive loader/reader doesn't work because there is no correlation between the data that is loaded and any other piece of the system that may update that data. This can be dealt with by keeping cache times low, but there are certainly situations where the constant reloading becomes a system burden in its own right.

The solution I've implemented thus far, based on a suggestion from a coworker, is for cache entries to have keywords associated with them. For the most part, these keywords map to table names, but they could really mean anything. Internally, the cache maps individual keywords to CacheEntry keys, and cache entries also keep track of their keywords. This way, cache entries can be invalidated individually by key OR by keyword. (The cache can also be swept for stale entries; the sweep removes them key by key, which still cleans up the keyword mapping.)
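A minimal sketch of that structure in Python, not the actual service. Only CacheEntry and invalidate_keywords are names taken from this post; KeywordCache, set, get, invalidate, sweep, the ttl parameter and the injectable clock are assumptions made for illustration.

```python
import time


class CacheEntry:
    """A cached value plus the bookkeeping needed to invalidate it."""

    def __init__(self, value, keywords, expires_at):
        self.value = value
        self.keywords = frozenset(keywords)  # the entry remembers its keywords
        self.expires_at = expires_at         # absolute time, or None for no expiry


class KeywordCache:
    """Hypothetical cache that can be invalidated by key or by keyword."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}          # key -> CacheEntry
        self._keys_by_keyword = {}  # keyword -> set of keys
        self._clock = clock         # injectable so expiry can be tested

    def set(self, key, value, keywords=(), ttl=None):
        self.invalidate(key)  # drop any old entry and its keyword links
        expires_at = None if ttl is None else self._clock() + ttl
        entry = CacheEntry(value, keywords, expires_at)
        self._entries[key] = entry
        for keyword in entry.keywords:
            self._keys_by_keyword.setdefault(keyword, set()).add(key)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        if entry.expires_at is not None and entry.expires_at <= self._clock():
            self.invalidate(key)  # stale: treat as a miss
            return default
        return entry.value

    def invalidate(self, key):
        """Remove one entry and unlink it from every keyword it was filed under."""
        entry = self._entries.pop(key, None)
        if entry is None:
            return
        for keyword in entry.keywords:
            keys = self._keys_by_keyword.get(keyword)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_keyword[keyword]

    def invalidate_keywords(self, *keywords):
        """Remove every entry filed under any of the given keywords."""
        for keyword in keywords:
            # Copy the set: invalidate() mutates the mapping as it goes.
            for key in list(self._keys_by_keyword.get(keyword, ())):
                self.invalidate(key)

    def sweep(self):
        """Remove stale entries key by key; returns how many were removed."""
        now = self._clock()
        stale = [key for key, entry in self._entries.items()
                 if entry.expires_at is not None and entry.expires_at <= now]
        for key in stale:
            self.invalidate(key)
        return len(stale)
```

Routing both invalidate_keywords and sweep through the single-key invalidate is what keeps the two mappings consistent: there is only one place where an entry leaves the cache, and that place always cleans up the keyword side.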

For example, a breadcrumb builder may cache its results for a particular page (identified by some unique key, in this case perhaps the path to the page in question) by associating the entry with the keywords 'navigation' and 'core_metadata'. Navigation contains the structure; core_metadata contains titles and descriptions. If the site manager changes the navigation tree, the code handling that change may tell the cache invalidate_keywords('navigation'), whereas changing a title may result in the call invalidate_keywords('core_metadata'). Code that changes lots of data can invalidate multiple keywords. Aside from having to maintain a common vocabulary, which is easy in the case of relational data, this keeps a nice disconnect between the code that updates data and the code that reads it. That matters because reads happen more often and are often more complex.

The little system has been working so far. It allows for caching of arbitrary objects (it has been pointed out to me that I just - again - said what is usually my least favorite word (arbitrary!)). Of course, once one gets into cache design, all sorts of other little things start cropping up (stale data, sweeps, etc.). It really makes me want a good Zope scheduling system to run those sweeps.