Two of the biggest problems on the Internet today are heavy traffic and slow data transmission. For businesses that want to stay competitive, Web site performance is critical. Caching technology goes a long way toward addressing both concerns, and it is this week’s Jargon Watch focus.
Proxy caching/Web caching/network caching
Proxy caching, Web caching, and network caching all refer to a system that fills the first request for a Web page by retrieving it from the origin Web server and then stores a copy of that content on a server closer to the end user. On subsequent requests for the same page, the cache delivers the copy from its local storage rather than going all the way back to the origin Web server.
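The store-then-serve behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular caching product works: the `ProxyCache` class, the `fake_origin` function, and the example URL are all hypothetical names, and the sketch ignores expiry and uncacheable content.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy proxy cache that keeps one local copy of each requested URL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}   # local storage: URL -> content
        self.origin_requests = 0             # how often the origin was contacted

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # First request: go back to the origin server and keep a copy.
            self.origin_requests += 1
            self._store[url] = self._fetch(url)
        # Every request, first or later, is answered from local storage.
        return self._store[url]


def fake_origin(url: str) -> bytes:
    """Stand-in for a real origin Web server (assumption for the demo)."""
    return f"<html>content of {url}</html>".encode()


cache = ProxyCache(fake_origin)
first = cache.get("http://example.com/")
second = cache.get("http://example.com/")   # served from the cache
```

After both calls, `cache.origin_requests` is 1: the second request never leaves the cache.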
Caching helps both content providers and end users. When data is stored (cached) on a server near the user, the Web page reaches the user faster, traffic over the Internet drops, bandwidth is preserved (because each cached object is fetched from the origin server only once), and network congestion is reduced. In addition, caching improves the quality of the transmission: when content has to pass through fewer routers to reach the end user, there is less opportunity for packet loss or delays. Caching also saves money if the client pays by the amount of traffic.
Many large Internet companies and large corporations are setting up server farms around the world to cache their content so that international users can access it as quickly as possible. Web caching is common in Europe, Asia, and Australia, where North American Web sites are cached to avoid the cost of intercontinental traffic. Web caching also benefits medium-size and small businesses by speeding response time and reducing congestion on the WAN. A proxy cache is also known as a “proxy server,” “proxy gateway,” or “caching proxy.”
Browser caching/client caching
The Netscape Navigator and Internet Explorer browsers both use caching. Browser caching differs from Web caching in that an individual user's browser stores Web content as files on his or her local hard drive. The browser cache stores not only the page currently displayed in the browser window, but also pages requested in the past. However, this benefits only the individual user. Other network users accessing the same Web site are not served from that cache, so there is no overall traffic reduction. The end user can specify the length of time a page is cached, and can also disable the browser’s caching feature. This cache is useful when users hit the Back button to go to a page they’ve already seen.
Not all Web content can be cached. Currently, approximately 35 percent of content on the Web can be cached. Many objects on a Web page are static, so they can be cached. However, dynamic content, such as changing stock prices, is not cached. So when the Web page is requested, the static, cacheable objects are supplied from the cache, and only the dynamic, uncacheable, real-time data is retrieved from the origin server. An object that requires authentication or is delivered over a secure connection won’t be cached either.
One problem with Web caching is that Webmasters can’t tell who is using their site when it’s accessed from a cache. Another concern is that out-of-date content might be served up. To be valuable to users, content in the cache must stay up-to-date. Good Web designers know how to configure their servers to control freshness. If the cache is HTTP/1.1 compliant, Web designers can set the caching attributes for each object on the Web page by labeling it “uncacheable,” “okay to cache,” or “explicit time-out” (which forces the cached copy to expire after a certain amount of time). Also, every time the end user presses the Reload button, the content is refreshed.
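In HTTP/1.1 these labels are carried in the Cache-Control response header. The three labels above correspond roughly to the directives `no-store`, `public`, and `max-age=<seconds>`. The sketch below shows how a shared cache might read that header. The function names are hypothetical, and the logic is deliberately simplified: a real cache also handles heuristic freshness, the Expires header, and revalidation, all of which are left out here.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: value-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value or None
    return directives


def may_store(header: str) -> bool:
    """'Uncacheable' content: a shared cache must not keep a copy."""
    directives = parse_cache_control(header)
    return "no-store" not in directives and "private" not in directives


def is_fresh(header: str, age_seconds: int) -> bool:
    """'Explicit time-out': the copy is usable only while younger than max-age."""
    directives = parse_cache_control(header)
    if not may_store(header) or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None:
        # Simplifying assumption: with no explicit lifetime, treat as stale.
        return False
    return age_seconds < int(max_age)
```

For example, `is_fresh("public, max-age=3600", 120)` is true for a copy stored two minutes ago, while `may_store("no-store")` is false.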
A cache is generally designed to operate in a hierarchical way. Cache hierarchy refers to the way normal Web requests get filled. If the first cache in the network, located near the router that is connected to the Internet, can’t fill a request from its own storage, it passes the request up the hierarchy as a normal request for the Web site. This request goes to the cache cluster at the main Internet access point, which turns to the origin Web server only if it cannot fill the request either. By fulfilling the request anywhere along the hierarchy, caching reduces traffic on the network and the demands on the Web server. A hierarchical arrangement is not the only possible configuration; some national and international caching proxies have a non-hierarchical arrangement.
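A two-level hierarchy of this kind can be modeled as caches that hand misses to a parent. The following is a rough sketch under stated assumptions: the `CacheNode` class, the node names, and `fake_origin` are invented for illustration, and real cache hierarchies communicate over network protocols rather than method calls.

```python
from typing import Callable, Dict, Optional, Tuple


class CacheNode:
    """One cache in a hierarchy; a miss is passed up to the parent cache."""

    def __init__(self, name: str,
                 parent: Optional["CacheNode"] = None,
                 origin: Optional[Callable[[str], bytes]] = None) -> None:
        if parent is None and origin is None:
            raise ValueError("the top-level cache needs an origin fetcher")
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (content, name of whichever level supplied it)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            body, source = self.parent.get(url)        # ask the next level up
        else:
            body, source = self.origin(url), "origin"  # top level: go to origin
        self.store[url] = body                         # keep a copy on the way back
        return body, source


origin_calls = []

def fake_origin(url: str) -> bytes:
    """Stand-in origin server that records each request it receives."""
    origin_calls.append(url)
    return b"page body"


access_point = CacheNode("access-point", origin=fake_origin)
office_a = CacheNode("office-a", parent=access_point)
office_b = CacheNode("office-b", parent=access_point)

url = "http://example.com/"
sources = [office_a.get(url)[1], office_b.get(url)[1], office_a.get(url)[1]]
```

Here the first request reaches the origin, the second is filled by the access-point cache, and the third by the local cache, so the origin server sees a single request.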
Latency, the time between when a request for a Web site is made and when that request is fulfilled, is one of the main reasons Web caches are used. Caching reduces latency because requests are filled closer to the end user.
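The effect on latency can be estimated with a simple weighted average: the share of requests filled by the cache pays the short trip, and the rest pay the long trip to the origin. The function below is a back-of-the-envelope sketch. The 20 ms and 200 ms figures are made-up illustrative values, the 0.35 hit ratio simply borrows the cacheable share quoted earlier as an assumption, and the small extra delay a miss spends checking the cache is ignored.

```python
def expected_latency_ms(hit_ratio: float, cache_ms: float, origin_ms: float) -> float:
    """Average latency when a fraction hit_ratio of requests is served by the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * origin_ms


# Illustrative numbers only (assumptions, not measurements).
without_cache = expected_latency_ms(0.0, cache_ms=20.0, origin_ms=200.0)
with_cache = expected_latency_ms(0.35, cache_ms=20.0, origin_ms=200.0)
```

With these assumed numbers the average drops from 200 ms to roughly 137 ms, and it falls further as the hit ratio rises.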
If you have a Jargon Watch submission, please send us an e-mail.