A surfeit of caches

A cache is useful in theory. By storing a temporary copy of some page or image, the servers can deliver it faster to the client computer. But if you've ever been stuck debugging a cache-related problem, you've quickly realized that computer science types have gone overboard and put caches everywhere. That makes some problems very, very difficult to track down.

I'll use my own situation from work as an example. We run an Oracle Portal as our business intelligence Web tier. There are six servers in our little farm, with each server running an instance of Portal (to deliver the HTML content and handle security) and Oracle Reports (to deliver the actual report PDF and Excel output).

As a user clicks through the Portal pages, they bounce around between servers in the farm. The mechanism for sharing the user's state and session information between all the servers is Oracle Web Cache.

So there's a load balancer sitting in front of these six servers and, when a user clicks to request a page, the load balancer is actually sending the request to the Web cache port of one of the servers. In turn, the Web cache port picks one of the six servers and grabs the content from that server's Apache port.

Let's call these six machines Report1 through Report6. When your request lands on, say, Report1, it's possible the Web cache on that machine has the content and will deliver it directly to your browser. If it doesn't have the content, it has to pick which "real" content port it will hit. This means that you might be hitting Report1, but the content is actually coming off another machine such as Report3.
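To make that two-hop routing concrete, here is a minimal Python sketch of the idea. It is a toy model, not how Oracle Web Cache is implemented: the class and function names are made up for illustration, the random selection stands in for whatever balancing policy the real products use, and it assumes each machine's Web cache keeps its own store.

```python
import random

# The six machines in the farm, named as in the text.
SERVERS = [f"Report{i}" for i in range(1, 7)]


class WebCacheNode:
    """Hypothetical stand-in for the Web cache port on one machine."""

    def __init__(self, name, rng):
        self.name = name
        self.rng = rng
        self.store = {}  # url -> (content, name of the server that rendered it)

    def handle(self, url):
        if url in self.store:
            # Cache hit: answer directly, no Apache port involved.
            content, origin = self.store[url]
            hit = True
        else:
            # Cache miss: pick one of the six "real" content ports.
            # Random choice is an assumption, not the real policy.
            origin = self.rng.choice(SERVERS)
            content = f"<html>{url} rendered by {origin}</html>"
            self.store[url] = (content, origin)
            hit = False
        return {"entry": self.name, "origin": origin,
                "cache_hit": hit, "content": content}


def load_balancer(nodes, rng, url):
    """Send the request to the Web cache port of one machine."""
    return rng.choice(nodes).handle(url)


rng = random.Random()
nodes = [WebCacheNode(name, rng) for name in SERVERS]
reply = load_balancer(nodes, rng, "/portal/home")
print(reply["entry"], "answered with content from", reply["origin"])
```

The point of the sketch is that `entry` and `origin` can differ, so when a page comes back broken, the machine that handed it to you is not necessarily the machine that produced it.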

We occasionally get glitches when a Portal page generates; one or more of the components on the page will fail to generate and display some kind of error text instead. Now we have to figure out what's failing: is it the Web cache farm itself, one particular Web cache instance, or one of the Apache instances sitting behind the Web cache? Fortunately, Oracle Web Cache truly acts like a farm. There's a single UI where you can monitor all the Web cache servers and, if necessary, flush them simultaneously to clear out bad content.

It gets worse. The Portal maintains a page cache: instead of going all the way back to the database, it stores page content internally so it can deliver that content faster. As far as I can tell, this cache is completely unrelated to Oracle Web Cache. The Portal cache objects are stored in the Portal's memory, so clearing Oracle Web Cache doesn't actually clear Portal's cached objects. If it's the Portal throwing the error, you have to clear its cache separately from Oracle Web Cache.
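Here is a small Python sketch of why flushing only the outer cache doesn't help. It is a toy model of two stacked caches, not the real products: the `Layer` and `Database` classes, the `sales_page` key, and the error string are all invented for illustration.

```python
class Database:
    """Hypothetical source of truth behind both caches."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        return self.rows[key]


class Layer:
    """A simple read-through cache in front of some backend."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend
        self.store = {}

    def get(self, key):
        if key not in self.store:
            # Miss: fetch from the layer behind us and keep a copy.
            self.store[key] = self.backend.get(key)
        return self.store[key]

    def flush(self):
        self.store.clear()


db = Database()
db.rows["sales_page"] = "ERROR: component failed"

portal_cache = Layer("Portal page cache", db)
web_cache = Layer("Oracle Web Cache", portal_cache)

web_cache.get("sales_page")                # both layers now hold the bad page
db.rows["sales_page"] = "Sales report OK"  # the underlying problem gets fixed

web_cache.flush()
# The outer cache simply refills itself from the inner one.
after_web_flush = web_cache.get("sales_page")

portal_cache.flush()
web_cache.flush()
after_both = web_cache.get("sales_page")

print(after_web_flush)  # still the error text
print(after_both)       # the corrected page
```

In this model the inner cache has to be flushed before, or together with, the outer one. Otherwise the outer cache just picks up the stale copy again on the next request.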

Oh, and let's not forget the browser's cache. Every time we change a CSS file or a GIF, someone inevitably calls to complain, and we have to tell them how to clear their local browser cache.

(I'm just grateful I no longer work on a public Internet site, because then there's yet another cache to deal with: Akamai or a similar distributed caching system. I remember digging through my e-mail to find the Akamai password so we could flush the Akamai cache to get one of our CSS or image files updated throughout the Akamai cloud.)

This is why I'm not going to use ReadyBoost whenever I eventually upgrade to Windows Vista. The last thing I need is yet another cache running around in my PC.