Learn how to configure Apache

If you need to make configuration changes to Apache, here's some information that you won't want to miss.

Apache is controlled by a series of configuration files: httpd.conf, access.conf, and srm.conf (there's also a mime.types file, but you have to deal with that only when you're adding or removing MIME types from your server, which shouldn't be too often). The files contain instructions, called directives, that tell Apache how to run. Several companies offer GUI-based Apache front ends, but it's often easier to edit the configuration files by hand.

Remember to make back-up copies of all your Apache configuration files, in case one of the changes you make while experimenting renders the Web server inoperable.
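
One way to make those backups a habit is a small script that copies the files into a timestamped directory before each editing session. The following is only a sketch: the function name, the directory layout, and the list of file names are assumptions for illustration, so adjust them to your own install.

```python
import shutil
import time
from pathlib import Path

# File names taken from the article; the list itself is an assumption
# about which files your install actually uses.
CONFIG_FILES = ["httpd.conf", "access.conf", "srm.conf", "mime.types"]


def backup_configs(conf_dir, backup_root):
    """Copy each existing config file into a new timestamped directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"apache-conf-{stamp}"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in CONFIG_FILES:
        source = Path(conf_dir) / name
        if source.is_file():
            shutil.copy2(source, target / name)  # copy2 also keeps timestamps
            copied.append(name)
    return target, copied
```

You might call it as backup_configs("/usr/local/apache/etc", "/root/apache-backups"), where both paths are hypothetical. If an experiment goes wrong, copy the files back from the newest backup directory and restart Apache.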

Also, remember that configuration changes you make don't take effect until you restart Apache. If you've configured Apache to run as an inetd server, then you don't need to worry about restarting, since inetd starts a fresh copy of Apache, which rereads the configuration files, for each incoming request.

Download the reference card

As with other open-source projects, Apache users share a wealth of information on the Web. Possibly the single most useful piece of Apache-related information--apart from the code itself, of course--is a two-page guide created by Andrew Ford.

Called the Apache Quick Reference Card, it's a PDF file (also available in PostScript) generated from a database of Apache directives. There are a lot of directives, and Ford's card gives you a handy reference to them.

While this may not seem like a tip on how to run Apache, it will make your Apache configuration go much more smoothly because you will have the directives in an easy-to-access format.

One quick note--we found that the PDF page was a bit larger than the printable area of our printer (an HP LaserJet 8000 N). So we set the Acrobat reader to scale-to-fit and the pages printed just fine.

Use one configuration file

The typical Apache user has to maintain three different configuration files--httpd.conf, access.conf, and srm.conf. These files contain the directives to control Apache's behavior.

The tips in this story keep the configuration files separate, since it's a handy way to compartmentalize the different directives. But Apache itself doesn't care--if you have a simple enough configuration or you just want the convenience of editing a single file, then you can place all the configuration directives in one file. That one file should be httpd.conf, since it is the first configuration file that Apache interprets. You'll have to include the following directives in httpd.conf:

AccessConfig /dev/null
ResourceConfig /dev/null

That way, Apache won't cough up an error message about the missing access.conf and srm.conf files. Of course, you'll also need to copy the directives from srm.conf and access.conf into your new httpd.conf file.

Restrict access

Say you have document directories or files on your Web server that should be visible only to a select group of computers. One way to protect those pages is by using host-based authentication. In your access.conf file, you would add something like this:

<Directory /usr/local/apache/share/htdocs/protected>
order deny,allow
deny from all
allow from 10.10.64
</Directory>

The <Directory> directive is what's called a sectional directive. It encloses a group of directives that apply to the specified directory. The Apache Quick Reference Card includes a listing of sectional directives.

The above case allows only computers with an IP address starting with 10.10.64 to access the pages in the given directory. You can use a complete IP address, a partial IP address as shown here, or even DNS names. For example, to allow only CNET computers access to a specific page, you might do this in your access.conf file (note that <Location> takes the URL path as the browser requests it, not the path on disk):

<Location /company/employees.html>
order deny,allow
deny from all
allow from .cnet.com
</Location>

The preceding period on the domain name makes it explicit that the rule covers every host within the cnet.com domain. Apache compares whole name components only, so a host in a look-alike domain such as notcnet.com does not match. If you want something narrower, you can restrict access to individual IP addresses or fully qualified domain names.

An interesting side effect of host-based authentication is that if you're using a browser on the Web server machine itself and attempt to access the page through localhost, you'll be denied permission. That's because the localhost IP, 127.0.0.1, does not belong to the cnet.com domain. You can easily add localhost to the permission list by putting the appropriate IP on the allow directive:

allow from .cnet.com 127.0.0.1
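
If the interplay of order, deny, and allow is hard to picture, it can help to model it outside Apache. The sketch below is a simplified Python model of "order deny,allow", not Apache's own code: the function names are made up for illustration, it assumes the client's host name has already been looked up, and it treats any spec that starts with a digit as an IP address.

```python
def matches(spec, ip, hostname):
    """Return True if one allow/deny spec covers the client."""
    if spec == "all":
        return True
    if spec[0].isdigit():
        # Full or partial IP address: compare whole dotted components,
        # so "10.10.64" covers 10.10.64.7 but not 10.10.6.4.
        spec_parts = spec.rstrip(".").split(".")
        return ip.split(".")[:len(spec_parts)] == spec_parts
    # Domain name: the host must end with the spec on a component boundary.
    spec, host = spec.lower(), hostname.lower()
    if not host.endswith(spec):
        return False
    if len(host) == len(spec):
        return True
    return spec.startswith(".") or host[-len(spec) - 1] == "."


def is_allowed(ip, hostname, allow, deny):
    """Model 'order deny,allow': deny rules first, then allow rules override."""
    allowed = True  # nothing matched: access is permitted
    if any(matches(s, ip, hostname) for s in deny):
        allowed = False
    if any(matches(s, ip, hostname) for s in allow):
        allowed = True
    return allowed
```

With deny set to ["all"] and allow set to [".cnet.com"], a request from 127.0.0.1 (host name localhost) is refused in this model; adding "127.0.0.1" to the allow list lets it through, which mirrors the fix shown above.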

The majority of security measures you will need to take when running a publicly accessible Web site will be set at the operating system level. You will want to make sure write access is restricted in the directories where your Web pages are stored to keep visitors from defacing your site.

Customize error messages

If a user requests a page that doesn't exist or is in a protected directory, Apache returns one of its built-in error messages that say things like Forbidden or Not Found. That's accurate, but not very informative. You may want to give your users more guidance as to what they did wrong, provide an alternative URL to get them back in your site, or at least offer an error page that fits in with your overall site design. With a bit of editing, you can make Apache return a custom error page or run a script to handle the error.

Open the srm.conf file and insert the following:

ErrorDocument 404 /error.html

Your server will now return the error.html page whenever a user requests a page that doesn't exist (which is what the 404 error code means--check out the Apache Quick Reference Card for a list of other HTTP 1.1 status codes). In this example, the destination of the directive is an HTML page, but you could also point to a CGI or even a URL from a different Web site.

Unless you include a full URL, the ErrorDocument directive uses a path relative to the document root of your Web server. So in our example, error.html must reside in the Apache document root. By default that document root is /usr/local/apache/share/htdocs. Also, when Apache actually serves up this error page it does so within the context of the erroneous URL. So if a user requested a nonexistent page (http://www.dummydomain.com/one/two/none.html), Apache returns error.html as if it resided in the /one/two directory. That means you need to be careful and fully qualify any relative paths to images or other pages in the error.html file. Otherwise you might serve an error page that itself contained errors.
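
To see why relative links break, you can reproduce what the browser does with Python's standard urljoin function. The browser resolves every link in error.html against the URL it originally asked for. The image path below is a hypothetical example, not something from a real site.

```python
from urllib.parse import urljoin

# The URL the visitor asked for; Apache serves error.html in its place,
# so the browser resolves links against this address.
requested = "http://www.dummydomain.com/one/two/none.html"

# A relative link inherits the /one/two/ directory of the bad URL.
relative_link = urljoin(requested, "images/logo.gif")

# A link that starts with a slash is resolved from the document root.
rooted_link = urljoin(requested, "/images/logo.gif")

print(relative_link)  # http://www.dummydomain.com/one/two/images/logo.gif
print(rooted_link)    # http://www.dummydomain.com/images/logo.gif
```

The first result points at a directory that probably doesn't exist, which is why links inside an error page are safest when they start at the document root or use a full URL.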

Support multiple languages

HTTP 1.1 formally specified a feature called content negotiation, which had actually been around for a while in experimental servers, including early versions of Apache. It's a way to present documents in different languages and formats based on a user's browser configuration.

For example, suppose you're a Canadian company that needs to serve both French and English versions of your Web site. First, you must enable the feature by adding the appropriate directive to your access.conf file.

Open the access.conf file and find or create the appropriate <Directory> entry where you plan to store the multilanguage pages. Then add the Options MultiViews directive to that section. Remember that Options All does not actually mean all--it doesn't turn on MultiViews support. So you must explicitly declare your intention to use MultiViews. For example:

<Directory /usr/local/apache/share/htdocs/multi>
Options MultiViews
</Directory>

Next, you need to edit your srm.conf file to include the languages you want to support and the file extensions associated with each language. The Canadian example calls for English and French, which have the standard identifiers en and fr, respectively. Your srm.conf file should already have these, but if not, add the appropriate lines:

AddLanguage en .en
AddLanguage fr .fr

LanguagePriority en fr

The LanguagePriority directive is used when there's a tie during content negotiation. For example, if Apache can't tell whether the browser prefers English or French, the LanguagePriority directive tells Apache to serve the English version of the page.

For Apache to recognize which pages it should serve, you have to include the proper extension on your file names. If, for example, you want to offer a help file in two languages, you'd create a help.html.en and help.html.fr file with the appropriate language content. Then, when a user requests the http://yourdomain.com/multi/help.html file, Apache will check the browser's language preference and return the correct version.
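
The selection logic can be pictured with a rough Python model. This is a sketch under stated assumptions rather than Apache's actual negotiation algorithm: the function names are invented, it looks only at the browser's Accept-Language header, it ignores regional variants such as fr-CA, and it simply falls back to the LanguagePriority order when the browser expresses no usable preference.

```python
def parse_accept_language(header):
    """Turn 'fr, en;q=0.8' into {'fr': 1.0, 'en': 0.8}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs[lang.strip().lower()] = quality
    return prefs


def pick_variant(base_name, available, accept_language, language_priority):
    """Choose a file such as 'help.html.fr' from the available extensions."""
    prefs = parse_accept_language(accept_language)
    best = max((prefs.get(lang, 0.0) for lang in available), default=0.0)
    if best > 0.0:
        candidates = [lang for lang in available if prefs.get(lang, 0.0) == best]
    else:
        candidates = list(available)  # no usable preference from the browser
    # Tie or no preference: fall back to the LanguagePriority order.
    for lang in language_priority:
        if lang in candidates:
            return f"{base_name}.{lang}"
    return f"{base_name}.{candidates[0]}"
```

For instance, pick_variant("help.html", ["en", "fr"], "fr, en;q=0.8", ["en", "fr"]) picks help.html.fr, while an empty Accept-Language header falls through to help.html.en because en comes first in the priority list.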

Configure for server-side includes

If you want to take a small step beyond static HTML pages, but you aren't quite ready to dive into writing your own Perl scripts, then you should try server-side includes (SSI). With SSI turned on, Apache will preparse certain HTML files before sending them out, looking for special embedded commands. These commands allow you to do basic things like include the contents from another file or print out an environment variable.

To enable it, you first need to make sure it has been compiled into your version of Apache. Go to the directory where your httpd executable resides, typically /usr/local/apache/sbin, and type ./httpd -l. That should return a list of all the modules included in your build of Apache. Hopefully mod_include.c is in that list. If not, you'll have to rebuild Apache after removing the comment character from the mod_include line in the Configuration.tmpl file.

Once you've determined that mod_include is available, you have to allow the execution of includes and map an appropriate filetype. As with all things Apache, there are about a gazillion ways to do this. Probably the easiest is to enable all the options in one place in your access.conf file:

<Directory /usr/local/apache/share/htdocs/include>
Options +Includes
AddType text/html .shtml
AddHandler server-parsed .shtml
</Directory>

All files in the /usr/local/apache/share/htdocs/include directory that have a .shtml extension get parsed by Apache before being sent out to a browser.

In many instances, the AddType and AddHandler directives are already in your srm.conf file, but they're commented out. So you could uncomment those, and in your access.conf file set the Options to allow executing include commands. Note the use of the plus sign in the Options directive--that tells Apache to add this option to any preceding options settings, rather than overriding them. If you want to limit the SSI support to prevent executing potentially dangerous programs, you might want to use Options +IncludesNOEXEC.

To test your settings, create a test.shtml file like this one:

<HTML>
<HEAD><TITLE>SSI Test</TITLE></HEAD>
<BODY bgcolor="white">
<H1>SSI Test</H1>
File last modified <!--#flastmod file="test.shtml" -->
<P><PRE>
<!--#printenv -->
</PRE>
<P><!--#exec cmd="/bin/date" -->
</BODY>
</HTML>

Apache will attempt to parse any text that starts with a <!--#. The example uses three SSI commands--flastmod, printenv, and exec (the complete list of SSI commands is on the Apache Quick Reference Card). The flastmod prints the last-modified date for the specified file, printenv spits back a list of environment variables and their values, and exec runs the specified shell command. Note that if you've configured Options +IncludesNOEXEC, then the exec command returns an error message instead of the current date and time.

Configure for CGI

If you've pushed server-side includes about as far as they can go, you might want to try common gateway interface (CGI) scripts. CGI is a standard way for Web servers to interact with other programs running on your computer. CGI scripts are usually written as Unix shell scripts or in a scripting language such as Perl.

Configuring Apache to run CGI programs isn't that hard. First, you need to assign an alias for your script directory.

You never want the directory containing CGI scripts to actually reside within the normal document root of the server because an intruder could get access and run their own scripts. So you create a special location, called an alias, to the actual CGI directory. Edit your httpd.conf file and add the line below:

ScriptAlias /cgi-bin/  /usr/local/apache/share/cgi-bin/

The example uses the default directory for CGI programs, but you're free to use any directory you want. Now, when someone requests a URL like this http://mydomain.com/cgi-bin/test.cgi, the Apache Web server knows to look in the /usr/local/apache/share/cgi-bin directory to find the test.cgi program.

ScriptAlias does more than map the URL: it also tells Apache to treat everything in that directory as a CGI program. If you'd rather spell the permissions out, or you want to run CGIs from a directory that isn't covered by ScriptAlias, edit the access.conf file and add a section like this:

<Directory /usr/local/apache/share/cgi-bin>
Options ExecCGI
AddHandler cgi-script   .cgi   .pl
</Directory>

The Options directive tells Apache to allow the execution of CGIs within the specified directory. And the AddHandler directive tells Apache which file extensions to associate with the cgi-script handler. The example uses the two most common file extensions, .cgi and .pl.

If you're running Apache on Unix, make sure that the user account under which the Web server runs has permission to execute the scripts in the directory. Otherwise the OS won't let Apache run the scripts.
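
To check the whole setup, you need a script to request. CGI scripts are usually shell or Perl, but any executable will do; here is a minimal sketch in Python, assuming a Python 3 interpreter is installed and the file is saved as test.cgi in the cgi-bin directory with execute permission. REQUEST_METHOD and REMOTE_ADDR are standard CGI environment variables; the function name is just for illustration.

```python
#!/usr/bin/env python3
import os


def build_response(environ):
    """Return the full CGI output: header line, a blank line, then the body."""
    method = environ.get("REQUEST_METHOD", "unknown")
    client = environ.get("REMOTE_ADDR", "unknown")
    body = f"Hello from test.cgi\nMethod: {method}\nClient: {client}\n"
    # The blank line after the header marks where the body begins.
    return "Content-Type: text/plain\n\n" + body


if __name__ == "__main__":
    # Apache passes request details in the environment and reads stdout.
    print(build_response(os.environ), end="")
```

If requesting /cgi-bin/test.cgi shows the script's source or a server error instead of the greeting, recheck the file permissions and the directives above.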

Avoid unnecessary file lookups

There are special files, called .htaccess, that reside within a directory and tell Apache to provide special handling for the files in that directory. For example, instead of enabling server-side includes in the access.conf file, you might specify it within the actual directory by including the directive in a .htaccess file.

By default, Apache is not configured to allow .htaccess files. If you open up the access.conf file you'll see that the AllowOverride None directive is sprinkled liberally throughout the various <Directory> sections. If you want to allow overrides, you might be tempted to change the directive at the root level, like this:

<Directory />
AllowOverride All
</Directory>

Don't do it. Whenever Apache handles a request, it would have to process .htaccess files in the same directory as the file it is serving, and also in all the parent directories up to the root. For instance, if you request the URL /docs/about.html and your document root is /usr/local/apache/share/htdocs, Apache tries to process .htaccess files in all these directories:

/
/usr
/usr/local
/usr/local/apache
/usr/local/apache/share
/usr/local/apache/share/htdocs
/usr/local/apache/share/htdocs/docs

Normally, there are no .htaccess files above the document root, but Apache still checks the file system to make sure. That's a lot of unnecessary file lookups. And if a malicious hacker had managed to place an .htaccess file somewhere in this document tree, it could pose a security risk to your site.
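
The cost is easy to count. The following Python sketch lists the directories that would be searched for a given document root and URL, following the description above; the function name is hypothetical and the code only manipulates path strings, it does not touch Apache or the file system.

```python
import posixpath


def htaccess_search_dirs(document_root, url_path):
    """List every directory checked for .htaccess, from / down to the file's directory."""
    file_path = posixpath.join(document_root, url_path.lstrip("/"))
    directory = posixpath.dirname(file_path)
    dirs = ["/"]
    current = ""
    for part in directory.split("/"):
        if part:
            current = current + "/" + part
            dirs.append(current)
    return dirs


for d in htaccess_search_dirs("/usr/local/apache/share/htdocs", "/docs/about.html"):
    print(d)
```

For the example request this prints the same seven directories listed above, which means seven extra file lookups for a single page.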

Instead, keep the AllowOverride None directive for your root directory, and turn it on only for the specific directories where you really want it. For example, to perform .htaccess lookups starting in the document root, you'd modify your access.conf file like this:

<Directory />
AllowOverride None
</Directory>
<Directory /usr/local/apache/share/htdocs>
AllowOverride All
</Directory>

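The All can be replaced with a narrower class of directives: AuthConfig, FileInfo, Indexes, Limit, or Options. For example, if you want .htaccess files to be able to change Options settings (which is how a directory would turn on server-side includes) but nothing else, you'd use something like the section below. Keep in mind that this lets a .htaccess file set any option, including ExecCGI; the IncludesNOEXEC restriction belongs on an Options line, not on AllowOverride.

<Directory /usr/local/apache/share/htdocs>
AllowOverride Options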
</Directory>

A final note--the .htaccess file doesn't actually have to be called .htaccess. Open the srm.conf and find the AccessFileName directive. You can change what the .htaccess file is called:

AccessFileName .my_htaccess_file

(Editor's note: A version of this tip originally appeared in Apache Week. It appears here with permission.)

Limit DNS overhead

To improve Apache's performance, when restricting access with allow from or deny from, use IP addresses where possible to limit the number of DNS lookups. Apache has to run a double lookup when using an allow from domain name or deny from domain name directive--a reverse to resolve the browser's IP address into a domain name followed by a forward to make sure that the reverse is not being spoofed.
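
The double lookup is easy to reproduce with Python's socket module, which shows why it costs two DNS round trips per request. This is a sketch of the idea, not Apache's implementation; the function name is made up, and the resolver functions are parameters only so that the logic can be tried without a network.

```python
import socket


def double_lookup(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the verified host name for ip, or None if the two lookups disagree."""
    try:
        hostname, _aliases, _addrs = reverse(ip)        # IP address -> name
        _name, _aliases, addresses = forward(hostname)  # name -> IP addresses
    except OSError:
        return None  # either lookup failed
    # Trust the name only if it maps back to the address we started with.
    return hostname if ip in addresses else None
```

Calling double_lookup with a client's address returns a name only when the forward lookup confirms the reverse one. An allow from line that lists an IP address needs neither lookup.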

You can limit your DNS lookup overhead even further by restricting lookup to only the files you need hostname lookups on, such as HTML or CGI. To do that, add something like this in your configuration files:

HostnameLookups off
<Files ~ "\.(html|cgi)$">
HostnameLookups on
</Files>

Check the timeout

Even relatively simple Web pages can have a number of pieces. Previously, a browser had to set up a new connection to the Web server to retrieve each piece--a connection to retrieve the HTML and separate connections for each GIF. One page with three images would require four connections. That's kind of expensive in network traffic, and can really slow things down.

HTTP 1.1 added a new feature called keep-alive. This lets a Web server keep a connection open so the browser can send down multiple requests without having to set up a new connection for each one. In Apache, keep-alives are controlled by three directives in httpd.conf: KeepAlive, MaxKeepAliveRequests, and KeepAliveTimeout.

The KeepAlive directive determines whether to activate the KeepAlive feature, while MaxKeepAliveRequests determines how many requests the server will allow from a browser during a single connection. And KeepAliveTimeout determines how long the server will keep the connection open waiting for additional requests. So to turn on keep-alives, and allow for 100 requests with a 15 second timeout, add the following lines to httpd.conf:

KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15

Your server may already be configured this way, so double-check httpd.conf before adding these lines.

If you're a little paranoid, you may wonder whether changing these values actually does anything. HTTP is a text protocol, so you can test the KeepAliveTimeout value for yourself. First, you need to Telnet to your server at the appropriate port. In our case, our Linux computer was mapped in our internal DNS tables as builder-linux, and the Web server was running at the default port 80. So the Telnet command we used was telnet builder-linux 80.

Now type in the following, remembering to hit Enter twice after typing the second line:

HEAD / HTTP/1.1
Host: builder-linux

This retrieves the header information for the main index file of your server. Now simply wait for the timeout--in our case the connection is terminated 15 seconds after we send the HTTP query.

If you change the KeepAliveTimeout, you can use this Telnet trick to verify that the change took place.
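
If you'd rather not sit with a stopwatch, the same check can be scripted. The sketch below sends the same HEAD request over a plain socket and times how long the server keeps the idle connection open. The function name and the max_wait parameter are assumptions for illustration, and the host name is whatever your own server is called.

```python
import socket
import time


def time_until_close(host, port=80, max_wait=60.0):
    """Send one HEAD request, then time how long the server keeps the connection open."""
    request = (
        "HEAD / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=max_wait) as conn:
        conn.sendall(request)
        start = time.monotonic()
        while True:
            try:
                chunk = conn.recv(4096)
            except socket.timeout:
                return None  # still open after max_wait seconds
            if not chunk:  # an empty read means the server closed the connection
                return time.monotonic() - start
```

With the settings above, time_until_close("builder-linux") should come back with a value close to 15. Change KeepAliveTimeout, restart Apache, and the number should follow.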

Rotate log files

Whenever a browser downloads information from an Apache Web site, the server stores information about that access in a log file. You use these log files in conjunction with other scripts or software to analyze the traffic on your site. If you've got a busy Web site, that log file can grow quite large rather quickly and become too unwieldy for easy analysis. The answer is log rotation.

Rotating your log files means periodically creating a new log file, so that the older one can be archived or sent to an analysis tool without disturbing the current log file. Apache makes it easy with a built-in log file rotation utility called rotatelogs.

To use it, edit your httpd.conf file to include the TransferLog directive:

TransferLog "| /usr/local/apache/sbin/rotatelogs /usr/local/apache/var/log/access_log 86400"

The example above gives TransferLog the location of the rotatelogs program, as well as the location and name for the log file. The number at the end indicates how many seconds between each log rotation -- in this case 86,400 seconds, or one day.

Apache generates a log file with a base name of access_log followed by a long numeric extension. So you might have one called access_log.0904347600 and then a day later you'd have another one that's got an extension value that's higher by 86,400.
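
A few lines of Python make those extensions easier to work with in an archiving script. This sketch assumes the numeric extension is a count of seconds since the Unix epoch, which is how the example value reads; the function names are hypothetical.

```python
from datetime import datetime, timezone


def next_log_name(log_name, interval=86400):
    """Given 'access_log.0904347600', return the name expected one rotation later."""
    base, _, stamp = log_name.rpartition(".")
    return f"{base}.{int(stamp) + interval:010d}"


def log_start_time(log_name):
    """Decode the numeric extension as seconds since the Unix epoch (UTC)."""
    stamp = int(log_name.rpartition(".")[2])
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


print(next_log_name("access_log.0904347600"))   # access_log.0904434000
print(log_start_time("access_log.0904347600"))  # 1998-08-28 23:40:00+00:00
```

Because the extensions sort in time order, an archiving job can safely pick up every file except the one with the highest number, which Apache is still writing to.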

Block bad robots

Robots are programs that automatically download pages from your Web site. A well-behaved robot is supposed to read your robots.txt file to determine how to crawl your site. But ill-behaved robots may ignore the file, potentially distorting your Web site traffic and ad reports as well as stealing your network bandwidth and slowing down your Web server.

If you know the robot's IP address, you can use the Apache Deny directive to restrict Web access from that IP address. For something more powerful, use the Apache mod_rewrite module. It may not be part of your default Apache configuration--you can check using the ./httpd -l command. If it's not there, you'll have to edit the Configuration.tmpl file and recompile Apache.

Once you've installed mod_rewrite, you can use it to restrict access to your server based on any server or environment variable, including IP address, robot agent name, and time of day.

For example, adding the following directives to one of your configuration files blocks all access from any robot whose HTTP user agent begins with "NameOfBadRobot" (the RewriteEngine line switches the rewriting engine on, which the other two directives need):

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*
RewriteRule ^/.*   -   [F]
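
Before you rely on a pattern like this, it's worth trying it against a few real user-agent strings. For a pattern this simple, Python's re module should behave the same way as Apache's regular expressions, so a quick sketch like the one below can stand in for a live test. The agent strings are invented examples.

```python
import re

# Same pattern as the RewriteCond above; "NameOfBadRobot" is a placeholder.
pattern = re.compile(r"^NameOfBadRobot.*")


def is_blocked(user_agent):
    """True if the pattern matches this User-Agent string."""
    return pattern.search(user_agent) is not None


print(is_blocked("NameOfBadRobot/1.0"))                        # True
print(is_blocked("Mozilla/4.0 (compatible; NameOfBadRobot)"))  # False
```

The ^ anchor ties the match to the start of the string, so the second agent slips through. Dropping the ^ from the pattern would catch the name anywhere in the user agent.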

For more on using the mod_rewrite module, read the documentation at the Apache Web site.

Diagnose your server

The mod_status and mod_info modules let you analyze and debug your Web server from a browser. First, make sure the modules are compiled in your version of Apache. Then activate the modules and control access to this information.

The mod_status module gives you comprehensive Web server diagnostics such as uptime and downtime, requests, CPU usage, and so forth. (Note: using this module requires that Apache be running in standalone mode, not as an inetd server.)

Add the following to your access.conf file:

<Location /status>
SetHandler server-status
<Limit GET>
order deny,allow
deny from all
allow from .cnet.com
</Limit>
</Location>

This configures the server-status handler to run on the /status virtual directory. You'll notice the use of the <Limit> directive, which is another directive for restricting access to your site. In this case, it actually lets you limit not just who can access the section, but what sort of requests are honored. The example is configured to accept only GET commands, and only from computers within the cnet.com domain.

You can add a bit of extra interactivity by appending a refresh command to the URL:

http://oscar.cnet.com/status?refresh=5

This gives you the Apache Web server status for oscar.cnet.com, refreshed every 5 seconds.

The next module is mod_info. Again, edit your access.conf with the following section:

<Location /info>
SetHandler server-info
<Limit GET>
order deny,allow
allow from .cnet.com
deny from all
</Limit>
</Location>

Now, a URL such as http://oscar.cnet.com/info will give detailed information about the oscar.cnet.com server, such as running daemons, version, users and groups, hosts, ports, and so forth. But mod_info also supplies a list of all the modules that Apache is using, as well as stats about each module (such as the directives being enforced). This can be very helpful when trying to debug the server.

Run on a Windows notebook

If you need to run Apache on a Windows 95 or Windows 98 notebook computer, you will probably have to change the server name. When a notebook PC is running without a LAN card or active PPP/SLIP dialup connection, Windows will not load its TCP/IP support, and Apache will return an error message because it can't determine its network name.

Why would you even want to run a Web server on a disconnected laptop? Well, you may want to prototype and test things while on the road. Thankfully, it's a simple problem to fix.

In the httpd.conf file, you'll see the ServerName directive commented out. Change it to match the Windows name you've assigned to the laptop. In our case, the laptop was called Raptor in Windows networking, so the line in our httpd.conf file read:

ServerName raptor

Now Apache will run fine, whether you're networked or not.
