
Perl data routing lets you customize transaction processing

Today's Web applications typically require multiple Web servers, but that can create development problems when you're working with submitted form data. Here's an approach you can use to circumvent such headaches.


Often, a simple client-server model isn’t enough for your Web application, so you must know how to handle cases of multiple Web servers. The simple solution illustrated in this article implements data-based routing: the destination of a submitted form depends on its content rather than on the URL to which it is submitted.

When Web infrastructure is no longer basic
Any large organization is likely to have a number of preexisting applications. How can these be tied together to make a cohesive Web front end?

There must be a Web server between all needed applications and the Web client, but simply giving each application its own public Web server creates several problems, security among them. First, you might not want these systems exposed to the public Web. Second, duplicating error handling across a number of such applications requires a lot of work. Third, when multiple Web servers are involved, the viewable HTML pages become the simplest integration point. These pages are exposed directly to the user, so they are hardly a good place to hide the details of the system under development. Finally, such a system is not data-dependent: form submissions can go only to their one, predetermined Web server.

Data-dependent submission can be added in a couple of simple ways. The easiest is to use a bit of JavaScript on the client side to inspect the form data. When the Submit button is clicked, the script alters the destination URL. This trick, however, offers no security: the routing logic and every destination URL are visible to anyone who views the page source, and the user can change them. A second choice is to use a redirect system. With this approach, the form is submitted to a Web server, where a server-side program picks through the data. Instead of processing that data, the program sends back an HTTP redirect. This reply tells the Web browser to resend the form data to the “real” Web server that will do the data processing. (For POST submissions, this calls for a 307 Temporary Redirect, because browsers commonly turn a 302 into a GET and drop the form body.) This approach moves the integration point from the HTML page to the HTTP protocol, but this is still on the user side, so parts of the system remain exposed.

A better solution is to have a single Web server acting as a gateway. All other servers are hidden from public view. Requests go to this one server, which examines each request and sends it directly to the right server. The integration point has moved to private processing behind the gateway server. There, it’s secure and it becomes trivial, for example, to map a submitted URL to a physical URL on another host. This system also allows you to customize the Web experience dynamically, depending on what the user types. That principle is illustrated here using a simple Perl CGI program.

Collecting Perl modules
This system is all about understanding HTTP and its request-response messaging style. For the gory details, see RFC 2616, which is available on the www.w3.org Web site and has since been superseded by RFC 9110 and its companion documents. To make and manipulate such messages, you’ll need HTTP power tools.

At the lowest level is the LWP Perl module. LWP stands for libwww-perl, the World-Wide Web library for Perl. The name echoes libwww, an HTTP client library written in the C language; LWP plays a similar role but is written in Perl. The LWP::UserAgent submodule is the specific piece of LWP you’ll need. It handles the HTTP client-server interaction between two computers.

LWP is a tad tedious on its own, so a number of other modules speed up scripting. Specifically, HTTP::Request and HTTP::Headers provide objects useful for creating HTTP requests. You don’t need to create HTTP::Response objects yourself; LWP supplies them.

Finally, the old warhorse CGI is a module you’ll need so that you can grab the data submitted by the user in the first place. Although the user’s form submission is an HTTP request message, by the time it passes through CGI, all that’s left are a few environment variables and the form data. You’ll have to fix that. Listing A demonstrates the required steps.
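
As a rough sketch of what "fixing that" can involve, the request headers can be recovered from the CGI environment, where most of them arrive as HTTP_* variables. The helper name headers_from_env and the decision to drop Host, Connection, and Accept-Encoding are assumptions made for illustration; they are not taken from Listing A.

```perl
use strict;
use warnings;

# Headers that describe the browser-to-gateway hop. Dropping them is an
# illustrative choice: Host would name the gateway, and omitting
# Accept-Encoding keeps the browser's compression preferences off the
# back-end request.
my %SKIP = map { $_ => 1 } qw(Host Connection Accept-Encoding);

# Rebuild HTTP header fields from a CGI-style environment hash.
# Hypothetical helper: pass %ENV in a real CGI program.
sub headers_from_env {
    my (%env) = @_;
    my %header;

    for my $name (keys %env) {
        # CGI exposes most request headers as HTTP_<NAME> variables.
        next unless $name =~ /^HTTP_(.+)$/;
        my $suffix = $1;

        # HTTP_USER_AGENT becomes User-Agent.
        my $field = join '-', map { ucfirst(lc($_)) } split /_/, $suffix;
        next if $SKIP{$field};
        $header{$field} = $env{$name};
    }

    # The body's media type is passed without the HTTP_ prefix.
    $header{'Content-Type'} = $env{CONTENT_TYPE}
        if defined $env{CONTENT_TYPE} && length $env{CONTENT_TYPE};

    return %header;
}
```

In a CGI program the call would be my %header = headers_from_env(%ENV); variables such as REQUEST_METHOD are ignored because they do not start with HTTP_.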

Stringing the logic together
The first step is to expose the submitted form data to Perl. This is a routine use of the CGI module: Create a query object and look at it.
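
A minimal sketch of this step might look like the following. The helper name service_class_from and the rule that the field must be a whole number are assumptions for illustration; the article's own listing may read the field differently.

```perl
use strict;
use warnings;
use CGI;

# Return the submitted service_class if it looks like a whole number,
# or undef when the field is missing or malformed.
# Hypothetical helper, not part of the CGI module.
sub service_class_from {
    my ($query) = @_;

    # Scalar context: take only the first value of the field.
    my $value = $query->param('service_class');

    return undef unless defined $value && $value =~ /\A[0-9]+\z/;
    return $value;
}

# In the CGI program itself:
#   my $query         = CGI->new;
#   my $service_class = service_class_from($query);
```

Validating the field before using it keeps unexpected input out of the routing decision that follows.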

In this example, we assume that the form submission includes a parameter, service_class. This parameter determines what system the submission goes to. Choices are a Premium, Standard, or Basic Web server application. Perhaps the organization is sensitive about customer loyalty at different price points and wants to present Web pages differently for various customer groups. Airlines might be an example, or wine customers. Although the data routing in this example is trivial (convert a service class number into one of three destination URLs), it can be as sophisticated as you like, analyzing all form fields and/or cookies, possibly referring to ZIP code or postcode lookup tables, or even using an external database.
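
The trivial routing described above can be sketched as a lookup table. The host names, the numbering (1 for Premium, 2 for Standard, 3 for Basic), and the fallback to Basic are all assumptions for illustration; substitute your own back-end URLs and policy.

```perl
use strict;
use warnings;

# Hypothetical internal hosts, one per service class.
my %DESTINATION_FOR = (
    1 => 'http://premium.internal.example/cgi-bin/order.cgi',
    2 => 'http://standard.internal.example/cgi-bin/order.cgi',
    3 => 'http://basic.internal.example/cgi-bin/order.cgi',
);

# Return the back-end URL for a submitted service_class value.
# Unknown or missing values fall back to the Basic service.
sub choose_destination {
    my ($service_class) = @_;

    return $DESTINATION_FOR{3}
        unless defined $service_class
            && exists $DESTINATION_FOR{$service_class};

    return $DESTINATION_FOR{$service_class};
}
```

More elaborate routing, such as a postcode lookup or a database query, would replace the body of choose_destination without changing its callers.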

Now for the techie part: constructing a new HTTP request. This CGI program is also a Web client and will issue a request to a server as though it were a browser. Creating the new request requires a set of HTTP headers, held in a single HTTP::Headers object. The information passed across CGI to Perl is collected in this new request. There are two common request types: GET and POST. (This example does not support HEAD.) The new request type can be different from the old, although they're the same in our example.
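
One way to assemble such a request is sketched below, using HTTP::Headers and HTTP::Request. The function name build_forward_request and its argument order are assumptions for illustration, and the form data is assumed to be already URL-encoded.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;

# Build the request the gateway will send to the back-end server.
# $header_fields is a hash reference of header names and values;
# $form_data is the URL-encoded form string, for example a=1&b=2.
# Hypothetical helper, not part of LWP.
sub build_forward_request {
    my ($method, $url, $header_fields, $form_data) = @_;
    my $headers = HTTP::Headers->new(%{$header_fields});

    if (uc($method) eq 'POST') {
        # POST carries the form data in the message body.
        $headers->header('Content-Type' => 'application/x-www-form-urlencoded')
            unless defined $headers->header('Content-Type');
        return HTTP::Request->new(POST => $url, $headers, $form_data);
    }

    # GET carries the form data in the query string.
    $url .= '?' . $form_data if defined $form_data && length $form_data;
    return HTTP::Request->new(GET => $url, $headers);
}
```

Anything other than POST is treated as GET here, which matches the two request types the article covers.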

Next, send the request to the chosen real form destination. You create an LWP::UserAgent object to open the low-level networking connection over which the request is sent, and then send the request headers (plus any POST content) in a single line of code and wait for a response. The original request from the real user (and this CGI script) will remain suspended until the new request is complete. That might seem like an unnecessary delay, but this CGI program has done nothing so far except process a few headers, and the original request would have been suspended at any hard-coded server anyway. So only a trivial time overhead is incurred.
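
The sending step can be sketched as follows. The function name forward_request and the 30-second default timeout are assumptions for illustration; pick a timeout that suits your back-end servers.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Send a prepared HTTP::Request and return the HTTP::Response.
# Hypothetical helper; the timeout is given in seconds.
sub forward_request {
    my ($request, $timeout) = @_;

    my $ua = LWP::UserAgent->new(timeout => $timeout || 30);

    # This call blocks until the back end answers or the timeout expires.
    # A connection failure comes back as an error response object,
    # not as an exception.
    return $ua->request($request);
}
```

Because the browser waits while this call blocks, a bounded timeout keeps a stalled back end from tying up the gateway indefinitely.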

Finally, you get the response back. Unless there’s been a disaster between this Perl script and the ultimate form destination, the data that comes back with the response will just be HTML content to be displayed in the user’s browser. Because this script is a CGI program and is responsible for providing browser content, you simply echo the content sent back from the second Web server.
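
Relaying the response might look like the sketch below. The function name cgi_output_for, the text/html default, and the choice to report back-end failures as 502 Bad Gateway are assumptions for illustration, since the article's demo leaves error handling to the reader.

```perl
use strict;
use warnings;
use HTTP::Response;

# Turn the back-end HTTP::Response into the text a CGI program prints.
# Hypothetical helper, not part of LWP.
sub cgi_output_for {
    my ($response) = @_;

    if ($response->is_success) {
        # Pass the back end's content type through, defaulting to HTML.
        my $type = $response->header('Content-Type') || 'text/html';
        return "Content-Type: $type\n\n" . $response->content;
    }

    # Hide back-end details from the user on failure.
    return "Status: 502 Bad Gateway\n"
         . "Content-Type: text/plain\n\n"
         . "The requested service is temporarily unavailable.\n";
}

# In the CGI program itself:
#   print cgi_output_for($response);
```

Keeping the error message generic means the user never sees the internal host names, which is consistent with hiding the other servers behind the gateway.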

Look before you leap
To use this example, you'll need to modify only step [2] in the listing. But for several reasons, we recommend that you thoroughly test before going to production.

First, the example is only a demo. It needs better error handling and integration with your own standards. Second, to be frank, we tested it more with POST requests than with GET requests. Most important, there are variations in the CGI interface. There’s a CGI standard, and Apache is pretty widespread (we used it), but your system could be sensitive to which HTTP headers are needed. This example doesn’t use Expires headers, for instance, so your mileage might vary. Also, we used Perl 5.6, which may not match your installation.

Download the code used in this article
routedata.cgi

Put it to good use
Using data-based routing, you can customize transaction processing and the user experience based on the data a user submits or on a customer-tracking cookie. Not only does this provide a central point for Web transaction logging and performance analysis, but it also keeps your more private services away from users’ unwanted experimentation.
