Many people have heard the term proxy server but do not know how it can benefit them. We’ve all heard of firewalls, and we all know the value of a good firewall. Some of you might even be using a firewall proxy to make life easier. For those of you who are unaware of what a proxy server or firewall proxy is, let me first explain the concept.
A proxy server is a program that accepts requests from a client, such as a Web browser or FTP client, and forwards the request to the appropriate Internet server. It’s transparent to the end user beyond the initial setup of the proxy. This method is often used when a number of computers sit physically behind another computer, which is in turn connected to the Internet. This configuration allows all of the computers simultaneous access to the Internet through one specific computer, typically the firewall or proxy server.
Linux has a fast, high-quality proxy server named Squid. Like most applications under Linux, it does an impressive job at no monetary cost. Squid is also a caching program, which is why it is referred to as a proxy-caching server.
So what exactly does this mean, and how does Squid differ from other traditional proxy servers? Well, Squid performs the same function as any proxy server. It takes requests from client programs and forwards them to the appropriate Internet server. It also stores a copy of the returned data to an on-disk cache. This means that if the same data is requested multiple times, Squid returns the cached data as opposed to initiating a connection to the Internet server. This is where Squid’s speed truly becomes visible. Think of it as being similar to your browser’s cache. Netscape, Internet Explorer, and virtually every other browser keep an on-disk cache as well, which they will refer to in the event that a Web site cannot be reached or if you recently visited it and the data hasn’t changed. Instead of using a cache solely for one client program, Squid provides a cache for all clients, regardless of their number.
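The shared-cache idea can be sketched in a few lines of Python. This is only a toy model and not how Squid is implemented: the class name CachingProxy and the stand-in fake_origin function are hypothetical, the cache lives in memory rather than on disk, and freshness checks and expiry are ignored entirely.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy model of one cache shared by every client of the proxy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # stands in for the real network request
        self._cache: Dict[str, bytes] = {}   # URL -> stored response body
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Served from the cache: no connection to the Internet server.
            self.hits += 1
            return self._cache[url]
        # First request for this URL: fetch it and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Hypothetical origin server used only for this illustration."""
    return ("content of " + url).encode()


proxy = CachingProxy(fake_origin)
proxy.get("http://example.com/")   # first client: cache miss
proxy.get("http://example.com/")   # second client: cache hit
print(proxy.hits, proxy.misses)    # prints "1 1"
```

The second request never reaches fake_origin, which is the saving a proxy cache provides when many clients ask for the same data.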
One last thing to note about Squid is that it is a Web-only cache: clients must talk to it using the HTTP protocol. It will not proxy or cache other kinds of traffic, such as RealAudio streams or sessions from a stand-alone FTP client. Squid can cache FTP, WAIS, and Gopher data when the client requests it through HTTP (as a Web browser configured to use the proxy does), but most FTP clients don’t do this. Squid can also relay secure SSL transactions for its clients, but because that traffic is encrypted, it passes through the proxy without being cached.
To this end, you are encouraged to use Squid in conjunction with a regular firewall proxy rather than in place of one. Squid’s primary benefits are its speed and the fact that it caches data. It isn’t a firewall replacement, so if you use Squid, you should still use your regular firewall.
Installing Squid
Squid can be installed two ways. You can compile it yourself, or you can install the RPM or DEB package for your particular distribution. You should run Squid as an unprivileged user, meaning that it should never run solely as root. Many people choose to run Squid as the user nobody with the group nogroup; however, it is highly recommended that you run Squid as its own user and group, meaning that it should run as user squid and group squid. The choice, of course, is entirely up to you.
The first thing you need to do is download either the source code for Squid or the RPM or DEB package. If you use RPM, you may need to do nothing more than run
rpm -ivh squid-2.3.STABLE2-3mdk.i586.rpm
on a Linux Mandrake 7.2 system. The version and release numbers may differ, depending on your distribution. The latest stable version of Squid is 2.3 STABLE4. The latest development version is 2.4 DEVEL2. Unless you have a pressing desire to install development software, you should download the latest stable version. Connect to the Squid Web site and download the file squid-2.3.STABLE4-src.tar.gz and save it on the machine where you’ll be making your proxy cache.
Unarchive the file into your /usr/local/src tree by using the following commands:
cd /usr/local/src
tar xvzf squid-2.3.STABLE4-src.tar.gz
This will create a subdirectory called squid-2.3.STABLE4, where all of the source files are contained. By default, Squid installs into the /usr/local directory tree, but I prefer having it integrated into the system a little bit more than by using /usr/local, so we will install it instead into the /usr directory tree. Change to the /usr/local/src/squid-2.3.STABLE4 directory and run configure like this:
./configure --prefix=/usr --exec-prefix=/usr --bindir=/usr/sbin --libexecdir=/usr/lib/squid --localstatedir=/var --sysconfdir=/etc/squid --enable-snmp --enable-heap-replacement
This code, of course, is one long line and not separate commands. What we are doing here is telling Squid to install into the /usr directory tree and place the binaries into /usr/sbin. We use /var for all caching data and /etc/squid for the configuration files. We place everything else that belongs to Squid (icons and so forth) into /usr/lib/squid.
The --enable-snmp option turns on the Simple Network Management Protocol (SNMP) functions in Squid so that you can use an SNMP-aware monitoring utility to view your proxy server. Finally, the --enable-heap-replacement option tells Squid to use various cache-replacement algorithms that are more efficient than the standard LRU algorithm.
The next step is to do the actual Squid compiling:
make
Depending on your machine, this may take a few minutes.
Once this is complete, install the package using
make install
This will install all of the Squid components to their appropriate locations.
The next thing you want to do is create two directories where Squid will store its caching data and log files. To do this, run the following commands:
mkdir -p /var/spool/squid
mkdir -p /var/log/squid
Before running the next command, you must create the user and group using the groupadd and useradd commands. Squid needs to write to both directories, so give the Squid user ownership of each. If you will be running Squid as the user nobody and the group nogroup instead, change the following chown command to suit:
chown squid:squid /var/spool/squid /var/log/squid
Now Squid is installed and is ready to be configured.
Configuring Squid
It’s time to configure Squid. To do this, you must change to the /etc/squid directory. This directory will contain a few files, but the most important file—and the one we will be looking at—is the squid.conf file. Open squid.conf in your favorite editor.
Squid will theoretically operate just with all of the defaults enabled. If a keyword is missing, Squid will use the default value for it, so you can even run Squid with a zero-byte squid.conf file (but I wouldn’t recommend it). Tweaking the squid.conf file will help you get the most out of Squid. Because there are so many options in the configuration file, we’ll be looking at only the more important keywords. Feel free to explore the squid.conf file and make any changes you see fit above and beyond those pointed out in this Daily Drill Down.
The first item to configure is the port that Squid listens on. Squid uses the keyword http_port to define which port to listen on and, optionally, which hostname or IP address to bind to. You can specify the port alone, or you can specify the port and hostname/IP address using the syntax [hostname|IP]:[port]. The default port is 3128, so we’ll define it like this:
http_port 3128
You can specify the cache’s memory size with the cache_mem keyword. This keyword defines the amount of memory Squid uses for in-transit objects, hot objects, and negative-cached objects. These objects are pieces of data, such as graphics, sound files, or Web pages. Since all data for the objects is stored in 4-KB blocks, your specification must be a multiple of 4 KB. The default cache size is 8 MB, and that’s a reasonable amount of memory to allocate to the cache. Note that this is not the total amount of memory Squid will ever use. This is merely for the cache, but the process itself will require more memory. On a higher load server, the size defined here may be a third or half of the overall process memory usage. In other words, if you define 8 MB, be prepared for Squid to use up to 24 MB of memory under higher load situations. Here’s an example of the command to specify the cache’s memory size:
cache_mem 8 MB
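The memory arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the two-to-three-times multiplier is only the rule of thumb given in this article (cache_mem being a half to a third of total usage), not a figure reported by Squid itself.

```python
BLOCK_KB = 4  # the article states that object data is stored in 4-KB blocks


def round_down_to_block(size_kb: int) -> int:
    """Round a size in KB down to the nearest multiple of 4 KB."""
    return (size_kb // BLOCK_KB) * BLOCK_KB


def estimated_process_memory_mb(cache_mem_mb: int) -> tuple:
    """Return a (low, high) estimate of total process memory in MB.

    If cache_mem is between a half and a third of the total, the total is
    between two and three times cache_mem.
    """
    return (2 * cache_mem_mb, 3 * cache_mem_mb)


print(estimated_process_memory_mb(8))  # prints "(16, 24)"
print(round_down_to_block(8190))       # prints "8188"
```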
The next item you will want to configure is the maximum_object_size keyword. This keyword defines the maximum size of objects that will be saved to disk. Any object larger than this definition will not be saved. This value is specified in kilobytes, and the default value is 4096 KB, or 4 MB. There is a trade-off here, however. If you want to increase your speed, keep this value low. If you want to conserve bandwidth usage, make this value higher. Here’s an example of the command to specify the maximum size of objects to be saved to disk:
maximum_object_size 4096 KB
Squid also caches the results of DNS lookups, so after the first lookup for a host it can answer from its own cache instead of querying the DNS server again. This can save you some time with slower DNS servers, especially on sites that are accessed frequently from your network. The fqdncache_size keyword sets the number of entries in the Fully Qualified Domain Name (FQDN) cache, which holds the host names Squid has looked up from IP addresses; the companion ipcache_size keyword does the same for the cache of name-to-address lookups. The default number of entries cached is 1024, but you may increase or decrease this value as required. Here’s an example of this command:
fqdncache_size 1024
The next very important keyword is the cache_dir keyword. This defines the location of your cache directory, its size, and the number of directories that may be nested below it. You can specify multiple directories so that you can have your cache span multiple disks, if you like. These directories must be owned and writeable by the Squid process. For example:
cache_dir ufs /var/spool/squid 100 16 256
This is Squid’s default. The first parameter is the directory type, ufs. If you enabled asynchronous I/O with the --enable-async-io option to configure, you might use asyncufs instead. However, we did not enable it because async I/O support is problematic and buggy, so use a type of ufs for every cache_dir you define.
The next parameter is the directory itself; in this case it is /var/spool/squid. The next parameter is the amount of disk space in megabytes that is to be used under this directory. In the above example, we specify 100 MB as a maximum, but you can change this value to suit your needs. The next parameter is the Level-1 parameter, which defines the number of first-level subdirectories to be created under this directory. The default is 16, as shown above. The last parameter is the Level-2 parameter, which defines the number of second-level subdirectories to be created under each first-level directory. The default is 256. This means that you can have a maximum of 16 x 256 = 4096 second-level directories beneath /var/spool/squid.
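To make the five cache_dir parameters concrete, here is a small Python sketch that splits the line into named fields and works out the directory count. The CacheDir class and parse_cache_dir function are hypothetical helpers written for this illustration; they are not part of Squid and only handle the simple five-parameter form shown above.

```python
from typing import NamedTuple


class CacheDir(NamedTuple):
    store_type: str   # directory type, e.g. "ufs"
    path: str         # top-level cache directory
    size_mb: int      # maximum disk space to use, in megabytes
    level1: int       # number of first-level subdirectories
    level2: int       # second-level subdirectories per first-level directory

    @property
    def second_level_dirs(self) -> int:
        """Total number of second-level directories under the cache path."""
        return self.level1 * self.level2


def parse_cache_dir(line: str) -> CacheDir:
    """Parse a 'cache_dir <type> <path> <MB> <L1> <L2>' line."""
    keyword, store_type, path, size_mb, level1, level2 = line.split()
    if keyword != "cache_dir":
        raise ValueError("not a cache_dir line: " + line)
    return CacheDir(store_type, path, int(size_mb), int(level1), int(level2))


entry = parse_cache_dir("cache_dir ufs /var/spool/squid 100 16 256")
print(entry.path, entry.size_mb)   # prints "/var/spool/squid 100"
print(entry.second_level_dirs)     # prints "4096"
```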
You will also want to log the client request activity to a log file. Remember creating the /var/log/squid directory previously? Here we define the log file:
cache_access_log /var/log/squid/access.log
The next log is the cache-logging file, which is used to store general logging information:
cache_log /var/log/squid/cache.log
Another log that is generally not as useful as the others is the storage manager’s log file. This log shows uninteresting information such as which objects are ejected from the cache, which are retained, and the length of time they will be stored. Since most people will likely never look at this log, we disable it using the following command:
cache_store_log none
The next important keyword defines the location of Squid’s MIME table. This should be in the same directory as the squid.conf configuration file, so the mime_table keyword definition should look like this:
mime_table /etc/squid/mime.conf
You should define a proper location for the Squid PID (Process-ID) file. On most systems, this would be the /var/run directory, so you would define pid_filename like this:
pid_filename /var/run/squid.pid
You can also specify the logging options for Squid using the debug_options keyword. The default (and recommended) value is ALL,1, which turns on debugging levels for all sections and uses the lowest logging level (1). The highest debug level is 9, but if you use this value, be prepared for some seriously large log files. Here’s an example of the command used to specify the logging options:
debug_options ALL,1
You can also log FQDNs in the defined cache_access_log by using the log_fqdn keyword. This keyword has two values: Off and On. If you enable it, you may experience some increase in latency for connections because Squid does a DNS lookup of each IP it connects to. Since we don’t want our browser to run any more slowly and we don’t really need this information, we’ll keep it at the default of Off:
log_fqdn off
Another option that may be of use is setting the default anonymous login password for FTP servers. It might be prudent to change the default of Squid@ to something a little more meaningful, such as your e-mail address, as shown here:
ftp_user someone@mydomain.com
By default, Squid does all of its DNS lookups using the DNS servers defined in the system’s /etc/resolv.conf file. In some situations, you may want to change this setting, so you can set the dns_nameservers keyword like this:
dns_nameservers 10.0.0.1 192.168.5.25
You can list more than one DNS server with this keyword.
In order for Squid to run properly, you must run it under an unprivileged user and group. Previously, we discussed using the user squid and group squid. If you choose to do this, use the following keywords:
cache_effective_user squid
cache_effective_group squid
When you start Squid, you should start it as root. Once it has started, it will change its uid and gid information to that defined in the cache_effective_user and cache_effective_group keywords. This is typically the way a number of servers (such as Apache) run. Remember, the specified user and group should exist on the system prior to starting Squid for the first time.
Now that we’ve set the keywords, we’ll take a look at the access controls that Squid provides. While there are a host of other configuration options, they are primarily used for fine-tuning. Squid’s default squid.conf is heavily commented, so between it and the FAQ you should be able to configure it further.
Access controls
Squid provides a powerful method of controlling Web access. With it, you can control who gets to visit which sites, at what times they can visit them, and what ports can be connected to. And that’s just the tip of the iceberg. Let’s take a quick look at the access controls that Squid provides.
Access controls are defined with the following syntax:
acl [aclname] [acltype] [string1] [string2] …
The aclname is the name of the access control you’re defining. The acltype determines the type of access control and can be one of the following: src (source address), dst (destination address), srcdomain (source domain), dstdomain (destination domain), url_regex (a regular expression for the URL), urlpath_regex (a regular expression for the URL path), time, port, proto (protocol), method (the HTTP request method: GET, POST, etc.), browser (a regular expression for the browser type), and ident (the user’s name). The strings are the values to match against; for the regular-expression types, they are regular expressions. Squid’s regular expression matching is case-sensitive unless you specify the -i option. Let’s take a look at a few examples:
acl weekday time MTWHF 09:00-17:00
This ACL matches requests made on weekdays between 9 A.M. and 5 P.M., so it can be used to permit proxy access only during those hours. Each day of the week is abbreviated by its first letter, except for Thursday (H) and Saturday (A); Sunday is S.
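The day letters and time range of a time ACL can be modeled with a short Python sketch. The function names are hypothetical, and treating both ends of the range as inclusive is an assumption made for this illustration rather than documented Squid behavior.

```python
from datetime import datetime

# Squid's day letters mapped to Python's weekday numbers (Monday is 0).
DAY_LETTERS = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def parse_time_acl(spec: str):
    """Split a value such as 'MTWHF 09:00-17:00' into (weekdays, start, end)."""
    days, hours = spec.split()
    start_text, end_text = hours.split("-")
    weekdays = {DAY_LETTERS[letter] for letter in days}
    start = datetime.strptime(start_text, "%H:%M").time()
    end = datetime.strptime(end_text, "%H:%M").time()
    return weekdays, start, end


def time_acl_matches(spec: str, when: datetime) -> bool:
    """Return True if 'when' falls on a listed day and inside the time range."""
    weekdays, start, end = parse_time_acl(spec)
    # Assumption: both ends of the range are inclusive.
    return when.weekday() in weekdays and start <= when.time() <= end


# 3 January 2024 was a Wednesday; 6 January 2024 was a Saturday.
print(time_acl_matches("MTWHF 09:00-17:00", datetime(2024, 1, 3, 10, 0)))  # prints "True"
print(time_acl_matches("MTWHF 09:00-17:00", datetime(2024, 1, 6, 10, 0)))  # prints "False"
```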
acl all src 0.0.0.0/0.0.0.0
This defines the ACL all, which permits requests from clients at any IP address.
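acl subnet src 192.168.0.0/255.255.255.0
This defines the ACL subnet, which matches requests from the 192.168.0.x private network (addresses 192.168.0.0 through 192.168.0.255). To cover all of 192.168.x.x, use a netmask of 255.255.0.0 instead.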
acl localhost src 127.0.0.1/255.255.255.255
This defines the ACL localhost, which permits requests only from the cache machine itself.
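What an address/netmask pair in a src ACL covers can be checked with Python’s standard ipaddress module. This is a sketch of the matching idea only; the src_acl_matches function is a hypothetical helper and not Squid’s own code.

```python
import ipaddress


def src_acl_matches(acl_value: str, client_ip: str) -> bool:
    """Return True if client_ip lies inside the address/netmask given in acl_value."""
    network = ipaddress.ip_network(acl_value, strict=False)
    return ipaddress.ip_address(client_ip) in network


# A 255.255.255.0 netmask covers only the last octet.
print(src_acl_matches("192.168.0.0/255.255.255.0", "192.168.0.42"))  # prints "True"
print(src_acl_matches("192.168.0.0/255.255.255.0", "192.168.1.42"))  # prints "False"

# A 255.255.255.255 netmask matches exactly one address.
print(src_acl_matches("127.0.0.1/255.255.255.255", "127.0.0.1"))     # prints "True"
```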
acl SSLports port 443 563
This defines the ACL SSLports, which permits requests only to ports 443 and 563.
acl Safeports port 80 21 443 563 70 210 1025-65535
This defines the ACL Safeports, which matches requests to the well-known ports for HTTP (80), FTP (21), SSL (443 and 563), Gopher (70), and WAIS (210), as well as any unprivileged port (1025 and above).
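A port ACL is a list of single ports and ranges. The following Python sketch shows one way to test a port against such a list; parse_ports and port_acl_matches are hypothetical names used only for this illustration.

```python
from typing import List, Tuple


def parse_ports(values: str) -> List[Tuple[int, int]]:
    """Turn '80 21 1025-65535' into a list of inclusive (low, high) ranges."""
    ranges = []
    for token in values.split():
        low, _, high = token.partition("-")
        # A single port such as "80" becomes the range (80, 80).
        ranges.append((int(low), int(high or low)))
    return ranges


def port_acl_matches(values: str, port: int) -> bool:
    """Return True if the port falls inside any listed port or range."""
    return any(low <= port <= high for low, high in parse_ports(values))


SAFE_PORTS = "80 21 443 563 70 210 1025-65535"
print(port_acl_matches(SAFE_PORTS, 80))    # prints "True"
print(port_acl_matches(SAFE_PORTS, 8080))  # prints "True"
print(port_acl_matches(SAFE_PORTS, 25))    # prints "False"
```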
After defining your ACLs, you need to put them to use. An ACL by itself only describes which requests match; the http_access keyword decides whether matching requests are allowed or denied, and it uses the following syntax:
http_access [allow|deny] [![aclname]] …
For example, building upon the ACLs we defined previously, you might use the following:
http_access allow subnet weekday
http_access deny !Safeports
http_access allow localhost
http_access deny all
This series of commands allows requests from the private network defined in the ACL subnet during the time period specified in the ACL weekday. Any remaining request to a port other than those defined in the Safeports ACL is denied. Requests from the localhost are then allowed, and finally everything else is denied by the all ACL.
Squid reads the http_access keywords from first to last and acts on the first line whose ACLs all match. If none of the lines match, the default is the opposite of the last line in the list. If the last line was a deny statement, then the default is to allow, and vice versa. Because of this, it would be prudent to include a deny all or allow all statement at the end of the access list to avoid potential confusion.
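The evaluation order described above can be modeled in a few lines of Python. This is a simplified sketch, not Squid’s implementation: the function names are hypothetical, each request is reduced to a dictionary saying which ACLs it matches, and an empty rule list is rejected rather than given a default.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, List[str]]  # ("allow" or "deny", ACL names; a "!" prefix negates)


def rule_matches(acl_names: List[str], facts: Dict[str, bool]) -> bool:
    """A line matches only if every listed ACL (or its negation) is true."""
    for name in acl_names:
        negated = name.startswith("!")
        value = facts[name.lstrip("!")]
        if value == negated:
            return False
    return True


def http_access(rules: List[Rule], facts: Dict[str, bool]) -> str:
    """Return the action of the first matching line, else the opposite of the last."""
    if not rules:
        raise ValueError("this sketch needs at least one http_access line")
    for action, acl_names in rules:
        if rule_matches(acl_names, facts):
            return action
    last_action = rules[-1][0]
    return "deny" if last_action == "allow" else "allow"


RULES: List[Rule] = [
    ("allow", ["subnet", "weekday"]),
    ("deny", ["!Safeports"]),
    ("allow", ["localhost"]),
    ("deny", ["all"]),
]

# A subnet client on a weekday, asking for a safe port.
office = {"subnet": True, "weekday": True, "Safeports": True,
          "localhost": False, "all": True}
print(http_access(RULES, office))   # prints "allow"

# The same client at the weekend falls through to "deny all".
weekend = dict(office, weekday=False)
print(http_access(RULES, weekend))  # prints "deny"
```

Because the first matching line wins, a weekday request from the subnet is allowed by the first line even when its port is not in Safeports. Placing the deny !Safeports line first would apply the port restriction to those clients as well.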
Conclusion
Squid is a powerful program that has a remarkable, and sometimes overwhelming, amount of flexibility in terms of its configuration, as I’ve pointed out in this Daily Drill Down. While it’s primarily an HTTP-caching agent, it provides a means to save bandwidth and increase speed, and also place usage constraints on Web access through an extensive access-control mechanism. If you need a Web proxy, you won’t find a better program to handle the job than Squid. For more information on Squid, visit the Squid home page.