SolutionBase: Use mod_rewrite to handle URLs on Apache servers

You can increase the flexibility of your Web servers by manipulating URLs. The Apache mod_rewrite module can help you do more with URLs. Here's how it works.

On your Apache Web server, the ability to manipulate URLs can enable important security features and open up a range of Web site enhancements. Your tool in this pursuit is the Apache module mod_rewrite, a powerful mechanism for URL handling. Mod_rewrite gives you the resources to manipulate URLs dynamically in client-server interactions for security and data analysis purposes.

Why would I want to use mod_rewrite?

Why would you want or need to manipulate URLs? There are a number of reasons, some of which may seem obscure. But with some forethought and imagination, you can use this capability to enhance not only your Web site security but also the performance of your Web pages and the on-site experience of your users.

Here's what URL manipulation can do:

  • A session ID embedded in a URL can enable session tracking.
  • When the server logs the URLs of all inbound client requests, a session history (tagged with session ID) can be compiled.
  • When the tracked sessions are analyzed, site users can be profiled and site usage can be evaluated for efficiency and effectiveness.
  • Tracked URLs can be used to identify clients in future sessions, as a security measure.

To elaborate a bit, you can use URLs with embedded session IDs to track state information that records the interaction between the server and individual clients in the course of a Web session. This is an alternative to cookies and similar methods, and it has attractive features, such as portability across multiple sites.

You can use a session ID to associate future requests with specific users, an important security feature. By logging session-tagged URLs as each client request arrives, you can reconstruct 1) who came to the site, 2) where they entered it, and 3) everything they clicked while there. This is valuable information.

On the server side, it tells you which features of your Web pages are attracting attention. On the user side, it tells you the preferences and usage habits of individual users. Be aware: this practice raises privacy issues, so consider them carefully before implementing usage analysis by client.
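As a rough illustration of the logging idea, here is a minimal sketch for an Apache 2.0 server configuration, using directives described later in this article. It assumes a hypothetical URL layout in which the session ID travels as a path prefix, as in /s/3f2a9c/catalog.html; the /s/ prefix, the SESSION_ID variable name, and the log file name are placeholders chosen for this example. The rule strips the prefix, hands the remaining path to normal URL mapping (the PT flag), and stores the ID in an environment variable that a custom log format writes out.

# Hypothetical layout: /s/<session-id>/<real path>
RewriteEngine On
RewriteRule ^/s/([0-9a-f]+)(/.*)$ $2 [E=SESSION_ID:$1,PT,L]

# Log each request together with the session ID captured above
LogFormat "%h %t \"%r\" %>s %{SESSION_ID}e" sessiontrack
CustomLog logs/session_log sessiontrack

The generated pages would also need to keep the prefix in their links; relative links such as other.html do this naturally, because the browser resolves them against /s/3f2a9c/.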

Putting mod_rewrite to work

The tools built into the mod_rewrite module are formidable. The module empowers you to generate modified URLs, point them to specific site entry points on your server, select site/page content dynamically, generate and implement rules for URL rewriting, and balance your request load. You can create a Web cluster, set up a reverse proxy, and select content by browser, all with URL manipulation.

When mod_rewrite is active, it starts working as soon as Apache receives an inbound request. Initial processing follows the module's configuration directives and does not yet involve any content. The rewriting engine then rewrites the request URL according to the configured ruleset, producing either a revised URL or a filename, depending on the rules and conditions that apply. While this seems fairly straightforward, the options available to you are considerable, and it's fair to warn you that this can be a pretty complicated (but worthwhile) undertaking.

Configuring mod_rewrite

To configure mod_rewrite, you first need the module itself. If you build Apache from source, the module's source is in the Apache source tree. For 2.0.x users, it is in:

/apache-src-path/httpd-2.0.43/modules/mappers

On Apache 1.3.x, you'll find it in:

/apache_1.3.27/src/modules/standard

Next, enable the module in Apache. As always, do this by uncommenting the appropriate LoadModule directive in httpd.conf:

LoadModule rewrite_module modules/mod_rewrite.so

If you're using Apache 1.3.x, you need both of these directives:

LoadModule rewrite_module libexec/mod_rewrite.so
AddModule mod_rewrite.c

You'll have access to the module once Apache restarts, and you can then use the module's configuration directives. Begin with the RewriteEngine On directive to enable the rewriting engine; no rewrite rules are processed unless the engine is set to on.
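Put together in httpd.conf for Apache 2.0, loading the module and switching on the engine might look like this minimal sketch. Wrapping RewriteEngine in an <IfModule> section is an optional convention: it keeps Apache from refusing to start with an "Invalid command" error if the module isn't loaded. In the 2.0 series, <IfModule> names the module by its source file, mod_rewrite.c.

LoadModule rewrite_module modules/mod_rewrite.so

<IfModule mod_rewrite.c>
    # Without this line, no RewriteRule in this context is processed
    RewriteEngine On
</IfModule>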

You also have the option of setting up "local" rewrite rules, or rules-by-directory, in .htaccess files. In that context, mod_rewrite assumes by default that URLs map directly to physical file paths on your server. If they don't (for example, because the directory is reached through an Alias), set the base URL for the directory's rules with:

RewriteBase {URL path}
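For example, suppose (hypothetically) that the URL path /shop/ is mapped to the directory /var/www/store with an Alias directive, so URLs no longer mirror file paths. A minimal .htaccess sketch in that directory could look like the following. Without RewriteBase, the relative substitution would be prefixed with the physical path /var/www/store/ instead of /shop/. Per-directory rewriting also requires the directory to allow overrides (AllowOverride FileInfo) and to have Options FollowSymLinks enabled.

# In the main server config (hypothetical paths):
#   Alias /shop/ /var/www/store/
# In /var/www/store/.htaccess:
RewriteEngine On
RewriteBase /shop/
# A request for /shop/old-catalog.html is served as /shop/catalog.html
RewriteRule ^old-catalog\.html$ catalog.html [L]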

You can log rewrite engine activity if you want, but be aware that logging can severely impact server performance, so use it mainly for debugging. Set the log file's location with the directive below. Unless the filename begins with a slash, it is assumed to be relative to the server root.

RewriteLog {file path}

You can also set the level of log detail. A level of zero disables logging entirely; a level of 9 logs everything.

RewriteLogLevel {level}
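A debugging setup for the 1.3 and 2.0 releases covered here might look like the following sketch; the log path is a placeholder. Both directives belong in the server or virtual host configuration, not in .htaccess files. (Starting with Apache 2.4, RewriteLog and RewriteLogLevel were removed in favor of per-module settings in the LogLevel directive.)

# Absolute path, so it is not resolved relative to the server root (placeholder)
RewriteLog /usr/local/apache/logs/rewrite.log
# Levels above 2 noticeably slow the server; keep them for short debugging runs
RewriteLogLevel 3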

Now comes the directive that forms the teeth and claws of mod_rewrite. RewriteRule is where you define the rules for URL modification. Use it as many times as you need; each RewriteRule statement defines one rule. In the syntax below, Pattern is a Perl-compatible regular expression that's applied to the URL, Substitution defines what the matching URL is rewritten to, and the optional flags change how the rule is applied.

RewriteRule {Pattern} {Substitution} [{flags}]
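As a simple sketch, the following rule (in server or virtual host context) permanently redirects requests for old .htm pages under a hypothetical /docs/ path to their .html counterparts. The parenthesized group in the pattern is captured and reused as $1 in the substitution; the flags request an external 301 redirect (R=301) and stop processing any further rules for this request (L). Because the pattern is anchored on .htm$, the rewritten .html URL does not match it again.

RewriteEngine On
# /docs/guide.htm -> 301 redirect to /docs/guide.html
RewriteRule ^/docs/(.*)\.htm$ /docs/$1.html [R=301,L]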

You can also define conditions under which a rule applies. Place each RewriteCond directive immediately before the RewriteRule it governs. TestString (below) is the string tested for a match; it typically contains server variables written as %{NAME}. CondPattern is usually a Perl-compatible regular expression applied against TestString, and a match means the rule should be applied. This is a complex and very detailed directive; refer to Apache's mod_rewrite documentation for a complete summary of valid expressions and operators.

RewriteCond {TestString} {CondPattern}
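Here is a minimal sketch of a condition guarding a rule, in the spirit of selecting content by browser. The User-Agent test and the file name /index-msie.html are illustrative assumptions. %{HTTP_USER_AGENT} holds the client's User-Agent request header, and the RewriteRule beneath the condition runs only when that header matches.

RewriteEngine On
# Apply the next rule only when the User-Agent header contains "MSIE"
RewriteCond %{HTTP_USER_AGENT} MSIE
# Serve an alternate (hypothetical) home page to those clients
RewriteRule ^/$ /index-msie.html [PT,L]

Several RewriteCond lines in a row must all match (they are combined with AND) unless you add the [OR] flag to a condition.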

You can create a map that the engine uses to modify fields via a mapping function. The RewriteMap directive that defines it contains the name of the map, the map type (for example, txt for a plain-text file or int for an internal function), and the source of the map. The directive looks like this:

RewriteMap {MapName} {MapType:MapSource}
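As an illustration, a plain-text (txt) map is just a file of key/value pairs, one pair per line. The sketch below assumes a hypothetical map file and a hypothetical /show_item handler. Rules look up a key with the ${MapName:key|default} syntax, where the value after the vertical bar is used if the key isn't found. RewriteMap itself must be declared in the server or virtual host configuration, not in an .htaccess file.

# Contents of /usr/local/apache/conf/products.txt (hypothetical):
#   blue-widget   1001
#   red-widget    1002
RewriteEngine On
RewriteMap products txt:/usr/local/apache/conf/products.txt
# /item/blue-widget is handled as /show_item?id=1001; unknown names get id=0
RewriteRule ^/item/([a-z-]+)$ /show_item?id=${products:$1|0} [PT,L]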

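When a map uses an external program (the prg map type), mod_rewrite needs a lock file to synchronize communication with that program. Set the lock file's location with: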

RewriteLock {file path}
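The lock matters mainly for program maps, where mod_rewrite talks to one long-running external program over its standard input and output. A minimal sketch, with a placeholder lock path and a hypothetical map program, might be:

# Keep the lock file on a local disk, not an NFS-mounted one
RewriteLock /usr/local/apache/logs/rewrite_map.lock
# Hypothetical program: reads one key per line on stdin and writes one value
# per line (or the string NULL for no match) to stdout, without buffering
RewriteMap sessionmap prg:/usr/local/apache/bin/session-map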

A per-directory rewrite rule can set up an endless loop of internal redirects. To keep such a loop from truly being endless, use the RewriteOptions directive shown below to cap redirects at a maximum value, such as MaxRedirects=10. You can also use this directive to set the inherit option, which forces a virtual host's mod_rewrite configuration to inherit the maps, conditions, and rules of the parent (main) server.

RewriteOptions {options}
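Both options can be sketched as follows; the host name and the redirected page are placeholders. A virtual host still needs its own RewriteEngine On even when it inherits the main server's rules, and MaxRedirects applies to per-directory (.htaccess) rules.

# In the main server configuration
RewriteEngine On
RewriteRule ^/old-home\.html$ /index.html [R=301,L]

NameVirtualHost *:80
<VirtualHost *:80>
    ServerName www.example.com
    # Rewriting must be switched on here as well
    RewriteEngine On
    # Pull in the main server's maps, conditions, and rules
    RewriteOptions inherit
</VirtualHost>

# In an .htaccess file: cap internal redirects caused by per-directory rules
RewriteOptions MaxRedirects=10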

URL handling procedures

Once you have mod_rewrite in place and configured, you can put it to use enabling specific security and tracking mechanisms. We'll discuss how to do this in an upcoming article.

About Scott Robinson

Scott Robinson is a 20-year IT veteran with extensive experience in business intelligence and systems integration. An enterprise architect with a background in social psychology, he frequently consults and lectures on analytics, business intelligence...
