Security

Save on bandwidth with the Squid proxy server

Vincent Danen explains how to set up Squid as a caching proxy, which can help you save on bandwidth.

Squid is a caching proxy for the web that supports HTTP, HTTPS, FTP and more. Its main advantages are faster page loads, because frequently requested pages are served from the cache, and lower bandwidth use, because the same page does not have to be re-requested over and over again. It can also be used as a reverse proxy to accelerate web servers by serving up cached content, rather than letting every client's request for identical content hit the web server.

As a quick illustration of setting up Squid as a caching proxy, consider Fedora 13, which currently provides a very recent Squid 3.1.4 that is easy to install:

# yum install squid

Out of the box, Squid works as a web client proxy for the local host and local network. Edit /etc/squid/squid.conf, look for the "localnet" entries, and comment out the networks that are not part of your local network. For instance, if you use a 192.168 network at home, comment out the 10.0.0.0 and 172.16.0.0 lines:

#acl localnet src 10.0.0.0/8     # RFC1918 possible internal network
#acl localnet src 172.16.0.0/12  # RFC1918 possible internal network
acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
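
After editing, it can help to confirm which localnet networks are still active. The following is a minimal sketch, assuming the /etc/squid/squid.conf path used above; the function name active_localnets is made up for this example.

```shell
# List the source networks of uncommented "acl localnet src" lines.
# $1: path to squid.conf (defaults to the path used in this article)
active_localnets() {
    awk '$1 == "acl" && $2 == "localnet" && $3 == "src" { print $4 }' \
        "${1:-/etc/squid/squid.conf}"
}
```

With the edits above in place, running active_localnets should list only 192.168.0.0/16, since the commented-out lines start with "#acl" and are skipped.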

Next, start the Squid service. If you have a firewall enabled on the system, be sure to allow TCP access to port 3128.
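
On a Fedora 13 style system, these two steps might look like the sketch below. It assumes the SysV service and chkconfig tools, an iptables-based firewall, and the 192.168.0.0/16 network from the example above; run and start_squid are hypothetical helper names. By default the commands are only printed so you can review them. Set DRY_RUN=0 and run as root to execute them.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

start_squid() {
    run service squid start     # start Squid now
    run chkconfig squid on      # start Squid at boot
    # Allow the local network (assumed to be 192.168.0.0/16) to reach the proxy port.
    run iptables -I INPUT -p tcp -s 192.168.0.0/16 --dport 3128 -j ACCEPT
    run service iptables save   # persist the rule (assumes the iptables init script)
}
```

Calling start_squid with no arguments prints the four commands in order. Restricting the rule to the local network with -s is a design choice, so the proxy is not opened to every host that can reach the machine.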

At this point, you can test from the local system with a command-line browser:

$ http_proxy="http://localhost:3128" elinks http://foo.com/

Then look at the /var/log/squid/access.log file. If the browser did not complain about being unable to connect and the log file shows activity, you have successfully set up Squid. The log entries will look something like this:

1281203766.589   2626 ::1 TCP_MISS/200 18137 GET http://foo.com/ - DIRECT/1.1.1.1 text/html
1281203767.186    595 ::1 TCP_MISS/200 4867 GET http://foo.com/skins/common/commonPrint.css? - DIRECT/1.1.1.1 text/css

If you were to execute the same browser command again, you would see the following:

1281204000.528    313 ::1 TCP_MISS/200 18137 GET http://foo.com/ - DIRECT/1.1.1.1 text/html
1281204000.591     60 ::1 TCP_REFRESH_UNMODIFIED/200 4873 GET http://foo.com/skins/common/commonPrint.css? - DIRECT/1.1.1.1 text/css

This shows you the cache at work. The initial page is fetched again (TCP_MISS), but the CSS file is sent to the requesting browser from the cached copy: TCP_REFRESH_UNMODIFIED means Squid checked with the origin server, learned the file had not changed, and served what it already had. The next step is to try the same from another system that would also be using the cache (you can use the same command-line browser command if available).
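
To get a rough picture of how often the cache is helping, you can count the result codes in the log. The sketch below assumes Squid's native access.log format shown above, where the fourth field is the result code and HTTP status; the function name squid_result_summary is hypothetical.

```shell
# Count how many requests ended in each Squid result code.
# $1: path to access.log (defaults to the path used in this article)
squid_result_summary() {
    awk 'NF >= 4 {
        split($4, parts, "/")    # "TCP_MISS/200" -> "TCP_MISS" and "200"
        count[parts[1]]++
    }
    END {
        for (code in count) printf "%s %d\n", code, count[code]
    }' "${1:-/var/log/squid/access.log}" | LC_ALL=C sort
}
```

Run squid_result_summary with no arguments to summarize the default log. A growing share of codes other than TCP_MISS suggests more requests are being answered from the cache.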

If you want a transparent proxy setup, so that users do not know the proxy is in use and cannot circumvent it, you can create one by adjusting iptables rules, provided your firewall system is running Linux. Note that if you do use a transparent proxy, you cannot use authentication on the proxy. If authentication isn't important to you, setting up a transparent proxy is a fast and easy way to force everyone on the network to use it.

In /etc/squid/squid.conf you want to uncomment the "cache_dir" directive:

# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /var/spool/squid 7000 16 256

and change

http_port 3128

to

http_port 3128 transparent

Once these changes have been made and Squid has been restarted, you also need to change the rules on your network's firewall or gateway system to redirect all outbound HTTP traffic to the proxy. This can be tricky, depending on whether Squid is installed on the firewall system or on a separate system in the local network. It also depends on your firewall's software. The Squid wiki has a section on Interception (i.e. transparent proxies) and how to set them up with Cisco devices, Linux, FreeBSD, and OpenBSD.
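
For the simplest case, where Squid runs on the Linux gateway itself, the redirect is typically a single iptables NAT rule. The sketch below only prints that rule so it can be reviewed first. The interface name eth0 and the function name intercept_rule are assumptions for illustration, and a setup with Squid on a separate machine needs different rules (see the Squid wiki).

```shell
# Print (do not apply) an iptables rule that redirects LAN web traffic to Squid.
# $1: LAN-facing interface (assumed eth0), $2: Squid's http_port (3128 above)
intercept_rule() {
    lan_if="${1:-eth0}"
    squid_port="${2:-3128}"
    printf 'iptables -t nat -A PREROUTING -i %s -p tcp --dport 80 -j REDIRECT --to-ports %s\n' \
        "$lan_if" "$squid_port"
}
```

For example, intercept_rule eth1 3128 prints the rule for a LAN interface named eth1. Run the printed command as root on the gateway to apply it.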

That same wiki page also has other example configurations. Squid can be used for more than just web page caching, and there are examples of using it for instant message filtering, running it as a reverse proxy to cache responses from a web server, setting it up with various forms of authentication, and more.

Squid is very versatile and can do quite a lot. For large organizations, Squid offers a surprisingly easy way to save on bandwidth, as well as an easy way to require authentication for outbound web access. For simple web caching, Squid is pretty much ready to run as-is, and the wiki offers a lot of examples and help if you need to consider something a little more complex.


About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.

7 comments
ganesh_palav

Webmin is a handy tool for managing Squid through a web user interface.

rleongm

We have a FORTIGATE firewall without cache. I would like to add a Squid transparent proxy to the firewall, because the disk sold for the FORTIGATE 310B is too expensive ($3,598.94 for 80GB). How do I redirect traffic from the Squid transparent proxy to the firewall and vice versa, with the proxy in front of the firewall?

oldbaritone

After the Squid proxy is running, bandwidth requirements can be reduced even more by redirecting requests for ads, banners and other garbage to a local file. We redirect ad banner requests to a locally-hosted 1x1 GIF image, which is 38 bytes and doesn't use any internet bandwidth. SquidGuard has many configuration options, and many available FREE databases of websites, URLs and domains. It is called from Squid using the url_rewrite_program directive in squid.conf. Gamers may be amazed at how much more usable bandwidth is available when you can skip all of the ads!

gracedman

Please take this as a face value question and not a disagreement. With the great increase in dynamic web content, are caching proxies as useful as they were five or ten years ago? My hunch is the answer is that they are less useful but there is still enough static content that it is still worth deploying. But that is a guess. What are your real world experiences?

oz_ollie

I definitely recommend adding squidGuard - using lists for filtering, even if it is just blocking porn at a workplace. It is easy to use with free daily updates from places like http://shallalist.de. You can also use a whitelist so that only an approved list of web sites can be accessed.

techrepublic-com-com

Hello Gracedman, you raised the most important issue: dynamic content, which almost every website currently uses. I run an ISP in Dhaka, Bangladesh. My real-world experience with saving bandwidth is very good. Even now, by tuning the Squid configuration, I can save almost 40% of HTTP bandwidth! Yes, this is really true. But to do so, you have to dive deep into the Squid configuration and understand your users' behavior, usage patterns, etc. Hope this answers your question. Noor Ahamed Bauani, Chief Technology Advisor, Dhaka Wireless, http://www.dhaka-wireless.net/ An IPv6 Ready ISP in Bangladesh. Need IPv6 connectivity? Just knock us! HP: +880-1818-BAUANI (SMS Only, No Direct Call Please)

oldbaritone

Yes, there is a lot of dynamic content, but even within the dynamic content there is a lot of static content also. You're probably correct that it's not as big an impact as it was five years ago, but it still may make a substantial difference, depending on the pages being loaded. Much of the imagery and "fluff" blocks - aesthetics, banners, and so on - are static content on a dynamic page and may be loaded from cache rather than re-loaded every time. Many large high-volume sites have even added domains for their static content (xyzstatic.domain) so it's easy to identify. You can also benefit from blocking domains, or redirecting them with a program like squidGuard. I use a list of adware sites on the proxy, and redirect all those requests to a locally-hosted 1x1 gif image. Pages load much faster, and the users are happier because there are FAR fewer pop-ups, banner ads and junk.