URL rewriting with Apache's mod_rewrite

Justin James explains how to rewrite URLs with Apache's mod_rewrite, illustrating the basic syntax as well as the flag options that control how the rewriting is handled.

One of Apache's most useful features is the URL rewriting system known as mod_rewrite. With mod_rewrite, Apache administrators can do all sorts of neat tricks like "friendly URLs" or reverse proxy serving. The mod_rewrite system is pretty in-depth (as you can see from the extensive documentation), so today we're just going to look at some of the more common, simple uses.

Before you start, you will want to make sure that you have mod_rewrite installed and loaded as an Apache module, in the way that you typically do it on your system. Unlike other modules that start working once they are loaded, mod_rewrite has a simple directive to switch it on and off, which is useful for testing. You need to have the line RewriteEngine On in your configuration for mod_rewrite to do anything.
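As a rough sketch of what that looks like in practice, the fragment below loads the module and switches the engine on. The path to the module file is an assumption; it varies between distributions, and some systems use a helper command (a2enmod on Debian-style systems) instead of a hand-written LoadModule line.

```apache
# Load the module. The .so path here is an assumption and differs by system.
LoadModule rewrite_module modules/mod_rewrite.so

# Switch the rewriting engine on; set it to Off to disable your rules while testing.
RewriteEngine On
```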

The RewriteRule directive is where most of our rules are created. The most basic rewriting rule is to have one URL served by a different one. The matching is done with Apache's regular expression engine (which uses PCRE regular expressions). The basic syntax is:

RewriteRule OriginalPath NewPath [Flags]

The flags are optional, and control how the rewriting is performed. With no flags, the request is rewritten internally to the specified target (which can be a local file path, an absolute URL, or a relative URL on the same server), and processing continues with any rules that follow; stopping after a match requires the L ("last") flag. If you do not want to change the URL at all, and just want the flags to operate, use a dash as the replacement URL. Flags are contained within square brackets, and you can use multiple flags by separating them with commas. Here is a basic example:

RewriteRule ^/(.*)$ /cms/get-page.cgi?q=$1 [H=cgi-script]

This would take the contents of the path after the leading forward slash, and pass them as the parameter "q" to the script "get-page.cgi" in the "cms" path. The "[H=cgi-script]" flag forces Apache to use the "cgi-script" handler to serve the request; without it, depending on how the server is configured, Apache may not treat the target as a script and could very well dump the script's source code to the client!
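The dash substitution and comma-separated flags mentioned earlier can be sketched in the same way. The ".bak" pattern is purely illustrative and not part of the original example; the NC and F flags it uses are described in the flag list further down.

```apache
RewriteEngine On

# "-" leaves the URL unchanged, so only the flags take effect.
# NC makes the match case insensitive; F answers with 403 Forbidden.
RewriteRule \.bak$ - [NC,F]
```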

One of the more useful flags is the R flag, which means "redirect." Instead of telling Apache to perform the work internally as if a different URL had been requested, it sends the client a redirect to the new URL. The status code is 302 by default, and you can optionally specify a different HTTP status code. For example:

RewriteRule ^/otherserver(.*) http://www.otherserver.com$1 [R]

This will take the portion of the path after "/otherserver" in a URL, append it to "http://www.otherserver.com" and perform a status 302 redirect to that URL. This is exactly what to do if you want to force users to a particular server. If you substitute the "P" flag for "R" then it works as a proxy: the Apache server performs the request itself, retrieves the response, and passes it through to the client. The P flag requires mod_proxy to be loaded.
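A minimal sketch of the proxy variant follows, assuming mod_proxy and mod_proxy_http are already loaded. The ProxyPassReverse line is an addition not shown in the original example; it adjusts Location headers in redirects sent by the other server so that clients keep talking to this one.

```apache
# Assumes mod_proxy and mod_proxy_http are loaded.
RewriteEngine On

# Same pattern as the redirect example, but fetched by Apache on the client's behalf.
RewriteRule ^/otherserver(.*) http://www.otherserver.com$1 [P]

# Rewrite Location headers in backend redirects to point back at /otherserver.
ProxyPassReverse /otherserver http://www.otherserver.com
```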

Some of the other more useful flags (full name followed by short name) are:

  • chain, C: "chains" the rule to the one that follows it, so that if the current rule does not match, the chained rules after it are skipped.
  • forbidden, F: responds with an HTTP status 403 ("FORBIDDEN") to the client, useful for blocking requests.
  • nocase, NC: makes the matching for this rule case insensitive.
  • proxy, P: has the server act as a proxy server (requires mod_proxy).
  • passthrough, PT: hands the rewritten URL back to Apache's URL mapping, so that directives such as Alias, ScriptAlias, and Redirect can act on the result.
  • qsappend, QSA: appends the query string from the original request to the query string in the substitution, instead of discarding it.
  • redirect, R (can add: =code): redirects the client; optionally you can specify the HTTP status (defaults to 302). For example: [R=301] will use an HTTP status 301.
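To make a few of these concrete, here is a short sketch that combines several flags. All of the paths are hypothetical, and the L ("last") flag, which stops rule processing for the request, is not in the list above.

```apache
RewriteEngine On

# R=301 sends a permanent redirect; L stops processing further rules.
RewriteRule ^/old-news/(.*)$ /news/$1 [R=301,L]

# QSA keeps the client's original query string alongside page=$1.
RewriteRule ^/articles/([0-9]+)$ /cms/article.cgi?page=$1 [QSA]

# C chains the two rules: the second is tried only if the first one matched.
RewriteRule ^/legacy/(.*)$ /archive/$1 [C]
RewriteRule ^/archive/(.*)\.htm$ /archive/$1.html
```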

Another interesting use of mod_rewrite is to use "map files" to handle the mapping. The advantage of a map file is that you can use other applications (including something like a content management system) to edit the rewrite map file, instead of having them manipulate critical Apache configuration files which they shouldn't have access to. A map file holds key/value pairs, and the substitution URL can look up a value by its key.

These maps can be static files (type "txt"), random files (type "rnd"), DBM files (type "dbm"), internal Apache functions (type "int"), or programs that generate the appropriate values (type "prg"). In the random files, you can use the pipe character ("|") to separate values; when you do this, Apache will randomly choose one of the values. Static and random map files can contain comments (anything after a hash character), blank lines, or key/value pairs separated by whitespace. DBM files can be generated from static map files, and have the advantage of faster lookups, which is important for large maps. The internal functions are provided by Apache itself; the built-in ones are toupper, tolower, escape, and unescape. A program can be anything you want, as long as it reads STDIN to determine what key is being looked up (the key is passed in as a newline-terminated string) and writes the value to STDOUT, also terminated by a newline. If the lookup fails, the program should write the four-character string NULL.
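The map types above might be declared roughly as follows. Apart from rewritemap.txt, every file name and map alias here is a hypothetical placeholder, and the exact file name of a DBM map depends on the DBM type in use.

```apache
# RewriteMap belongs in server or virtual host configuration, not in .htaccess.

# txt: plain key/value file
RewriteMap samplemap txt:/www/config/rewritemap.txt

# rnd: a line such as "static www1|www2|www3" returns one of the values at random
RewriteMap servers rnd:/www/config/servers.txt

# dbm: hashed version of a txt map (the httxt2dbm utility can build one)
RewriteMap fastmap dbm:/www/config/rewritemap.dbm

# int: built-in function; tolower lower-cases the key
RewriteMap lowercase int:tolower

# prg: long-running program that reads keys on STDIN and writes values on STDOUT
RewriteMap lookup prg:/www/bin/lookup.pl
```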

An example of a static file:

# Sample map file, stored at: /www/config/rewritemap.txt

path1 hello

path2 goodbye

To use the map file, you first make it available with the RewriteMap directive. Then, you can refer to values in it within substitution URLs in the format: ${mapfilealias:variablename}

Here is an example, using the map file shown above:

RewriteEngine On

RewriteMap samplemap txt:/www/config/rewritemap.txt

RewriteRule ^/path1 /${samplemap:path1}/world.html

RewriteRule ^/path2 /${samplemap:path2}/world.html

This would turn the path "/path1/anything/else.html" into "/hello/world.html", and "/path2/goodtoseeyou.html" into "/goodbye/world.html". All other paths would remain unchanged.
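With more than a couple of keys, one rule per key gets tedious. A possible generalization, sketched here under the assumption that the first path segment is always the key, looks the segment up directly. The "|NOT_FOUND" part supplies a default value for missing keys, and the RewriteCond directive keeps the rule from firing for paths that are not in the map.

```apache
RewriteEngine On
RewriteMap samplemap txt:/www/config/rewritemap.txt

# $1 is the first path segment captured by the RewriteRule pattern below.
# Apply the rule only when that segment exists as a key in the map.
RewriteCond ${samplemap:$1|NOT_FOUND} !=NOT_FOUND
RewriteRule ^/([^/]+) /${samplemap:$1}/world.html
```

With the sample map file, "/path1/anything/else.html" would still become "/hello/world.html", while a path such as "/other/page.html" would be left alone.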

There are a number of other tricks that you can do with Apache's mod_rewrite. One of the more in-depth uses is the RewriteCond conditional directive, which can use information about the environment or request to decide whether the rule that follows it should be applied. While mod_rewrite is often too much work for simple scenarios where Redirect or Alias directives would work fine, if you have complex needs, it's a good thing to know about.
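As a small illustration of both points, the sketch below uses RewriteCond to redirect requests that arrive under a non-canonical host name, and then shows the simpler mod_alias directives for cases that need no conditions at all. The host name and paths are hypothetical.

```apache
RewriteEngine On

# Redirect any request whose Host header is not the canonical name.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

# Simple cases need no mod_rewrite; these directives come from mod_alias.
Redirect permanent /old-page.html http://www.example.com/new-page.html
Alias /docs /www/shared/docs
```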

J.Ja

About

Justin James is the Lead Architect for Conigent.
