Apache comes with an amazing array of tools, modules, and add-ins. One of the most useful, yet most often misunderstood, is the mod_rewrite module. Administrators can produce much friendlier URLs by having this module rewrite URLs on the fly, without user intervention. So instead of users having to type the URL http://www.yoursite.com/index.php?id=712, they only have to enter http://www.yoursite.com/712.html. That's just one example of how mod_rewrite works. The mod_rewrite module can rewrite and redirect URLs in a seemingly limitless number of ways.
The above example not only benefits users, but it can also make your site more attractive for search engine crawlers.
I'm going to show you how to take advantage of the mod_rewrite module and apply it to your Apache-driven Web site so that your URLs are easier to remember and easier to find.
As with nearly all Apache configuration, mod_rewrite is configured within the httpd.conf file. The mod_rewrite module can be configured either globally or on a per-directory basis. For the purposes of this article, I'll configure it on a per-directory basis. To do this, I'll include the mod_rewrite configurations within the <Directory></Directory> tags so that you can better control how mod_rewrite is applied to a site.
There are a number of useful directives for mod_rewrite that you need to know:
- RewriteEngine: This directive enables or disables the runtime rewrite engine.
- RewriteOptions: This directive sets either the inherit or the MaxRedirects options for the current per-server or per-directory configuration.
- RewriteLog: This directive sets the name of the log file that will be used for redirect system messages. One caveat: Do not redirect the log file to /dev/null to effectively disable the log file, because this will slow down your server. If you don't want to use a log file, set the RewriteLogLevel to 0 instead.
- RewriteLogLevel: This directive sets the verbosity level of the rewrite logs (0 = no logging, 9 = debug-level logging). The higher the log level, the more your server will be slowed by the logging process.
- RewriteLock: This directive sets the file name for a synchronization lockfile, which mod_rewrite needs to communicate with RewriteMap programs.
- RewriteMap: This directive defines a map, which can be used inside rule substitution strings by the mapping functions to insert/substitute fields through a key lookup.
- RewriteBase: This directive explicitly sets the base URL for per-directory rewrites.
- RewriteCond: This directive defines a rule condition.
- RewriteRule: This directive serves as the actual rule that will define how the URL is rewritten.
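A note for readers on newer servers: in Apache 2.4 and later, RewriteLog and RewriteLogLevel were removed, and rewrite logging is controlled through the general LogLevel directive instead (RewriteLock was likewise folded into the Mutex directive). A minimal sketch of the 2.4-style logging setup follows; the trace level is only an illustrative choice, not a recommendation:

```apache
# Apache 2.4+ only: rewrite messages are written to the regular error log.
# "alert" is the level for everything else; rewrite:trace3 raises the
# verbosity for mod_rewrite alone (trace1 = least detail, trace8 = most).
LogLevel alert rewrite:trace3
```

As with RewriteLogLevel, higher trace levels slow the server down, so this is best reserved for debugging.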
At the very minimum, inside httpd.conf, mod_rewrite needs a Directory block such as the following (I'll use /var/www/html as the directory to which my rewrite rules apply):
<Directory /var/www/html>
RewriteEngine ON
</Directory>
The above block tells Apache that the rewrite engine should be applied to /var/www/html, but does no more. Nothing will happen, because I've not defined anything beyond switching the engine on. A more complete rewrite configuration might look like this:
RewriteLock /var/lock/subsys/rewrite_lock
RewriteLog /var/log/httpd_rewrite
RewriteLogLevel 1
<Directory /var/www/html>
RewriteEngine ON
RewriteBase /~stuff/
RewriteRule ^foo$ foo/ [R]
</Directory>
Let me tell you exactly what the above does, line by line. The first three directives are only valid at the server level (the two log directives are also allowed in a virtual host), so they sit outside the Directory block.
- RewriteLock /var/lock/subsys/rewrite_lock tells the rewrite engine where the lock file is to be.
- RewriteLog /var/log/httpd_rewrite tells the rewrite engine that you are using logging and exactly where to place the log file.
- RewriteLogLevel 1 tells the rewrite engine that you are using a minimal log level, so very little information will be stored.
- <Directory /var/www/html> opens the per-directory block. Note that there is no trailing / on the directory. This must be left out or processing will be wrong.
- RewriteEngine ON turns the rewrite engine on.
- RewriteBase /~stuff/ tells the rewrite engine what the base URL will be for rewriting.
- RewriteRule ^foo$ foo/ [R] is the actual rewrite rule.
- </Directory> closes the block.
The rewrite rule solves a common problem for Web administrators: the missing trailing slash. If a user browses to a directory on your site (say http://www.foobar.foo/stuff) without a trailing "/", Apache will look for a file named "stuff" instead of the index file in the directory "stuff". The RewriteRule above makes sure the trailing / is added to the request. The way I have this set up, though, the rule is narrow: it matches only a request for foo and redirects it to foo/ beneath the base /~stuff/.
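A rule tied to a single name does not scale far. A more general variant, sketched here along the lines of the trailing-slash example in Apache's own rewriting guide, uses RewriteCond so that the slash is added for any request that maps to a real directory. The RewriteBase of / is an assumption for this sketch (it treats /var/www/html as the document root):

```apache
<Directory /var/www/html>
    RewriteEngine On
    RewriteBase /
    # Only act when the requested path is an existing directory...
    RewriteCond %{REQUEST_FILENAME} -d
    # ...and the URL does not already end in a slash; then redirect the
    # browser to the same URL with "/" appended.
    RewriteRule ^(.+[^/])$ $1/ [R]
</Directory>
```

The condition applies only to the RewriteRule that immediately follows it, so requests for ordinary files pass through untouched.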
Following the rules
Working with mod_rewrite depends on a solid understanding of regular expressions (which is a bit beyond the scope of this article). Because the rules depend on regular expressions, they can be a bit complex. Fortunately, once you get the hang of it, they are pretty self-explanatory.
The syntax for a typical rewrite rule is RewriteRule <pattern> <action> [flags], where pattern is the regular expression matching the URL to be rewritten, action describes how the URL is to be rewritten, and flags are optional flags, enclosed in square brackets and separated by commas, that modify how the rule behaves.
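To make the three parts concrete, here is a minimal sketch of a rule for the friendly URL from the introduction (http://www.yoursite.com/712.html served by index.php?id=712). It assumes that index.php lives in /var/www/html, that this directory is the document root, and that symbolic links are allowed for it (Options FollowSymLinks), which per-directory rewriting requires:

```apache
<Directory /var/www/html>
    RewriteEngine On
    RewriteBase /
    # pattern: one or more digits followed by ".html"; the digits are
    #          captured in parentheses and become $1
    # action:  serve index.php with the captured number as the id parameter
    # flags:   L = stop processing further rules once this one matches
    RewriteRule ^([0-9]+)\.html$ index.php?id=$1 [L]
</Directory>
```

Because there is no R flag, the rewrite happens internally: the visitor's address bar keeps showing 712.html.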
Below I have some more examples of how mod_rewrite can be effectively used. Once you have a solid understanding of that, and a solid understanding of regular expressions, you can begin writing your own rules.
Making shortcut URLs
You should know the difference between canonical URLs and shortcut URLs. A shortcut URL for a typical user might look like http://oursite.com/~chubbchubb, while the canonical URL might actually look like http://oursite.com/u/chubbchubb/. Note that the trailing / is omitted from the first URL. Of course, you don't necessarily want your internal users to have to enter in these sometimes lengthy URLs (complaints would certainly ensue). To solve this problem, use these two RewriteRules:
RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R]
RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R]
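The first rule redirects the shortcut URL to the canonical URL, and the second rule appends the missing / character to any /u/, /g/, or /e/ URL that lacks it. Because both patterns begin with ^/, they are written for the server or virtual-host configuration; inside a <Directory> block, mod_rewrite strips the directory prefix (including the leading slash) before matching, so the patterns would have to be adjusted.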
Where is that document root?
Typically, the document root of a Web server is the location that maps directly to the URL /. Many times, however, a company will want separate roots for different uses. Let's say the homepage for the internal Web site is housed in /e/iww/ and the homepage for the external Web site is housed in /e/www/. Both sites use common images and files, which logically live in the shared document root, so the document root itself cannot simply be moved. To solve this problem, you just redirect requests for / to the /e/www/ directory with this rule:
RewriteRule ^/$ /e/www/ [R]
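Note that the [R] flag in the examples above (and in other examples) forces an external redirect: instead of rewriting the request internally, Apache sends the browser an HTTP redirect (302 by default) to the new URL, and the browser then requests that URL.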
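The R flag also accepts an explicit status code. As a small sketch, the homepage rule above could be made a permanent redirect, which is usually what you want once the new location is settled; combining it with L here is my own choice for illustration:

```apache
# Server or virtual-host context (the pattern starts with "/").
# R=301 sends "301 Moved Permanently" instead of the default 302;
# L stops rule processing once this rule has matched.
RewriteRule ^/$ /e/www/ [R=301,L]
```

Browsers and search engines may cache a 301, so it is worth testing with the default 302 first.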
Moving home directories for future replacement
Let's say you're going to replace one Web server with another. Both Web servers are up and running, and you want to start moving traffic from the old one to the new one. The solution is simple: on the old server, use a rewrite rule to redirect all requests for home directories to the new server (where "new_server_address" is the actual address of the new server). The additional L flag marks this as the last rule, so no later rules are applied once it matches:
RewriteRule ^/~(.+) http://new_server_address/~$1 [R,L]
The mod_rewrite module also accepts conditions through the RewriteCond directive. A RewriteCond tests some property of the request, and the RewriteRule that follows it is applied only if the condition is met. Say, for example, you want to allow the server to search for pages in multiple directories (use the directories /pub/, /serv/, and /ext/). For this, you would place a RewriteCond in front of each rule to check whether the requested file exists in that directory (assuming that the three directories lie within the docroot of the Web server).
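One way this multi-directory search could be written is sketched below, modeled on the corresponding example in Apache's rewriting guide rather than on any listing from this article. It assumes the rules are placed in the server or virtual-host configuration, where the matched path and %{REQUEST_URI} both begin with a slash:

```apache
RewriteEngine On

# For each candidate directory, the condition tests (-f) whether the
# requested file exists there. If it does, the rule serves the file
# from that directory and stops processing (L).
RewriteCond %{DOCUMENT_ROOT}/pub%{REQUEST_URI} -f
RewriteRule ^(.+)$ %{DOCUMENT_ROOT}/pub$1 [L]

RewriteCond %{DOCUMENT_ROOT}/serv%{REQUEST_URI} -f
RewriteRule ^(.+)$ %{DOCUMENT_ROOT}/serv$1 [L]

RewriteCond %{DOCUMENT_ROOT}/ext%{REQUEST_URI} -f
RewriteRule ^(.+)$ %{DOCUMENT_ROOT}/ext$1 [L]
```

The directories are tried in order, so if the same file name exists in more than one of them, /pub/ wins. A request that matches none of the conditions falls through and is handled normally.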
Rewriting the rules
The mod_rewrite module goes well beyond what I have covered here. I touched on only the very basics of what rewrite rules can do, and I didn't touch on RewriteMap or on using rewrite rules to force external processing at all. But before moving on to advanced topics, it is best to get a solid understanding of regular expressions. To learn more about regular expressions, I recommend Mastering Regular Expressions by Jeffrey E. F. Friedl, published by O'Reilly.
Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website jackwallen.com.