Many years ago (in the early 1990s), when I lived in New Orleans, I walked into a resourceful and offbeat audiophile establishment where my friend and I were browsing some rare vinyl. At the small check-out desk stood a small but well-written sign: “Cash Is King! We accept cash only for payment!” This struck me as an interesting proclamation. I am sure the phrase had been in the vernacular for a long time, but I liked it; at the time it implied a certain rebelliousness. It turns out that the small establishment was not in business long, but the “Cash Is King” decree has stuck in my mind ever since, and that phrase is the inspiration for the title of this piece, “Cache is King”. Website cache, that is! This post is a high-level overview of website caching, and it will be followed up with several more posts that delve deeper into the inner workings of website cache and its administration for your organization.
Have you ever heard the catchphrase made popular by Steve Souders, “The fastest HTTP request is the one not made”? It makes perfect sense: a resource that is already cached does not need to be requested again, so the page loads faster.
In his book High Performance Web Sites: Essential Knowledge for Front-End Engineers, Souders presents 14 specific rules that he says can cut your webpage load time by 25% to 50%, and he has distilled the rules onto his High Performance Web Sites page, including links to online examples. The rules include “Make Fewer HTTP Requests”, “Use a Content Delivery Network”, and “Add an Expires Header”, to name just a few of the tenets.
Why cache your website?
A web cache sits between the web servers and the clients. It watches HTTP requests for pages, images, files, or any other object, and saves a copy of each response. When later requests are made for the same objects, the cache serves the stored copy and sidesteps the origin web server. Caching your websites improves the overall perceived performance for clients and users. The three main reasons to add caching to your websites are to reduce latency, reduce network traffic, and reduce server load. A cache is usually closer to the client than the origin server is, so a response served from cache takes less time to arrive, which in turn makes the website appear more responsive and keeps waiting time to a minimum.
Bandwidth consumption is reduced when a stored copy is reused: serving the cached version eliminates the need to fetch the same content from the origin server again, which reduces overall network traffic. Lastly, fewer requests reaching the web server means less load spent serving web pages and their associated files and objects.
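To make the store-and-reuse idea concrete, here is a minimal Python sketch. The `SimpleWebCache` class and the `fake_origin` function are hypothetical names invented for this illustration; a real web cache also has to deal with expiry, HTTP headers, and storage limits, none of which are modeled here.

```python
from typing import Callable, Dict, List


class SimpleWebCache:
    """Toy cache that sits between clients and an origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        # Serve the stored copy when we have one, sidestepping the origin.
        if url in self._store:
            self.hits += 1
            return self._store[url]
        # Otherwise ask the origin once and keep a copy for next time.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


# Stand-in for a real web server; it records every request it receives.
origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_calls.append(url)
    return f"<html>content of {url}</html>".encode()


cache = SimpleWebCache(fake_origin)
first = cache.get("/index.html")   # miss: goes to the origin
second = cache.get("/index.html")  # hit: served from the stored copy
```

After the two calls, the origin has been contacted only once, which is the whole point: the second request never leaves the cache.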
What are the types of website cache?
There are three primary types of website caching: client browser cache, proxy server cache, and web server-side cache. Each type has its benefits and weaknesses, depending on the implementation.
Client Browser Cache
Client browsers such as Chrome, Firefox, and IE cache the objects behind the URLs you visit on the local hard drive for future access, typically within a directory such as “Temporary Internet Files,” which holds most items associated with the web pages you have visited. Anytime you click the Back button on your desktop browser, you are most likely going back to a locally cached version of the web page. The advantage is obvious: faster perceived loading of the web page. The disadvantage is that if content on the page has changed, you will not see the updates right away when viewing the cached version. Of course, the user can clear her browser cache at any time or update her cache settings, as shown in the example dialog box from Chrome displayed in Figure B above.
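How does a browser decide whether its stored copy is still good enough to show? One common signal is the standard `Cache-Control` response header, whose `max-age` directive says for how many seconds a copy may be reused. The Python sketch below is a simplified illustration of that check only; the function names `parse_directives` and `is_fresh` are hypothetical, and real browsers apply more rules (revalidation, heuristics when no lifetime is given) than are shown here.

```python
from typing import Dict


def parse_directives(cache_control: str) -> Dict[str, str]:
    """Split a Cache-Control header value into {directive: value}."""
    directives: Dict[str, str] = {}
    for part in cache_control.lower().split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name] = value
    return directives


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Return True if a copy stored age_seconds ago may be reused as-is."""
    directives = parse_directives(cache_control)
    # no-store forbids keeping a copy; no-cache requires checking with the
    # server first. Either way the copy cannot simply be reused here.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age", "")
    # Simplification: without an explicit max-age we treat the copy as stale.
    return max_age.isdigit() and age_seconds < int(max_age)
```

For example, `is_fresh("public, max-age=3600", age_seconds=120)` reports that a two-minute-old copy is still usable, while the same copy under `max-age=60` is not.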
Proxy Server Cache
Proxy server cache works on the same principles as client browser cache, but on a greater scale, serving a multitude of users simultaneously. Typically, the most requested URLs (selected on the basis of several parameters) are stored along with their associated objects, and the stored copies are served whenever clients on the network request those URLs. These web proxy servers are usually set up on the organization’s network firewall, or as part of a larger firewall solution. They may also use technologies including URL blocking and associated black lists or white lists for blocking or allowing certain categories of websites or specific URLs. A simple diagram of where a typical proxy server sits in relation to an organization’s network is illustrated in Figure C above. As represented in this example, the proxy server sits in what is known as a DMZ, sometimes referred to in computer security terms as a perimeter network.
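A shared cache cannot keep everything, so it needs some rule for which URLs stay. The Python sketch below assumes one simple rule, least recently used (LRU) eviction, purely for illustration; the post does not say which parameters a given proxy product uses, and the `SharedProxyCache` class, its `capacity` setting, and the `fake_origin` function are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class SharedProxyCache:
    """Toy shared cache with a fixed capacity and LRU eviction."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 capacity: int = 2) -> None:
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> bytes:
        if url in self._entries:
            # Mark as recently used so popular URLs survive eviction.
            self._entries.move_to_end(url)
            return self._entries[url]
        body = self._fetch(url)
        self._entries[url] = body
        if len(self._entries) > self._capacity:
            # Drop the entry that has gone unused the longest.
            self._entries.popitem(last=False)
        return body

    def cached_urls(self) -> List[str]:
        return list(self._entries)


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_calls.append(url)
    return url.encode()


proxy = SharedProxyCache(fake_origin, capacity=2)
# Requests arriving from several users on the same network.
for requested in ["/a", "/b", "/a", "/c"]:
    proxy.get(requested)
```

With room for only two entries, the repeated request for `/a` keeps it in the cache, and `/b` is the one evicted when `/c` arrives.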
Web Server-Side Cache
Server-side cache reduces the load on the web server by keeping a cached copy of dynamically generated pages on the server itself. Retrieving a webpage from the server-side cache saves the time needed to generate a fresh page on the fly. However, if the data that makes up the webpage has changed, the page served from the server-side cache won’t be as fresh. How long a page may be cached can be controlled in several ways. For instance, on a site served by Apache, these include <meta> tags in the HTML, such as <meta http-equiv="Expires"…>, which live in the page itself rather than in the server configuration; HTTP headers set programmatically by CGI scripts or other means; and directives in web server configuration files such as httpd.conf.
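As a rough illustration of both ideas, the Python sketch below keeps rendered pages for a fixed time-to-live and builds an `Expires` header value programmatically. It is a sketch under stated assumptions, not how any particular web server does it: `PageCache`, `render_page`, `expires_header`, and the 60-second lifetime are hypothetical, and the clock is injected only so the example can be run without waiting.

```python
import time
from email.utils import formatdate
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Toy server-side cache for dynamically generated pages."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        # path -> (time at which the copy expires, rendered HTML)
        self._pages: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._pages.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh: skip the expensive render
        html = self._render(path)
        self._pages[path] = (now + self._ttl, html)
        return html


def expires_header(now: float, ttl_seconds: float) -> Tuple[str, str]:
    """Build an Expires header using the HTTP date format (GMT)."""
    return ("Expires", formatdate(now + ttl_seconds, usegmt=True))


fake_now = [1000.0]          # hypothetical clock, in seconds
render_calls: List[str] = []


def render_page(path: str) -> str:
    render_calls.append(path)
    return f"<html>{path}, render #{len(render_calls)}</html>"


cache = PageCache(render_page, ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("/news")
fake_now[0] = 1030.0
second = cache.get("/news")  # within 60 seconds: cached copy reused
fake_now[0] = 1061.0
third = cache.get("/news")   # past 60 seconds: page rendered again
```

The trade-off described above shows up directly in `ttl_seconds`: a longer lifetime means fewer renders but a longer window in which visitors may see an outdated page.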
As a website stakeholder within your organization, how do you control who gets to view the latest content updates, and when the website and webpage cache is refreshed? You don’t want all your visitors to view stale or outdated content, right? Striking a satisfactory balance between getting your web pages to load faster for client requests and the number of pages and content objects that are cached is the key to good caching policies and administration. Subsequent pieces on the website caching topic will explore web caching administration, including the policies, solutions, techniques, technologies, applications, and products that help organizational stakeholders and website administrators manage their website’s content delivery.