
The architecture of a flexible .NET file processing system -- Part 3


This blog post is also available in PDF form as a TechRepublic download.

In Part 1 of this series, Zach Smith described the overall architecture of a dynamic and scalable file import system. In Part 2, he got into the details of designing a highly flexible and scalable file processing system using .NET Framework technologies and showed you how the file listener components are implemented. In this blog entry (Part 3), he moves on to the next portion of the system and explains how files are routed through the system via the router component.

As files are dropped into the system, they are picked up by the file listener components, and messages representing those files are sent to the processing queue. Once the messages are in the processing queue, the system must determine which task-specific processing queue the message should be sent to. (For more information on this, see Part 1 of this series.) This functionality is implemented in the router component.

The basic task of the router component is to pick messages up from the processing queue, run logic on the message properties (for more information on those properties, see Part 2 of this series), and send each message on to the correct task-specific processing queue. This makes the router component the brain of our processing system: it determines where everything is sent. Other processes may do things such as zip a file, but those processes are highly specialized and, to a point, inflexible. The router component, on the other hand, can dynamically route any type of message anywhere in the system.

Router requirements

While I wouldn't call the router a highly complex piece of software, its task is complex, and the design of this particular component is critical to the performance of the file processing system as a whole. The complexity of the router component comes to light when we look at its functional requirements:

  • Must be able to examine any field of a file message and route the message based on that data
  • Must be highly dynamic and easily modified
  • Must be lightweight and able to handle a very high volume of files (every file imported will pass through the router!)
  • The logic used to route messages must be easily updated

Given those requirements, you can easily see that the router component isn't a piece of the system that should be taken lightly. As mentioned earlier, the router is the brain of our system.

The following is a list of the actions performed on a message as it is picked up and routed by the router:

  • Message is picked up from processing queue by the router
  • Router component loads message into memory
  • Router component examines message and applies routing logic to determine the message's destination
  • Router component sends the message to the destination
  • Router component returns to the beginning and looks for other messages to route
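The five steps above map onto a simple loop. The following is a minimal sketch in Python, assuming that the processing queue and the task-specific queues are modeled with Python's in-process queue.Queue (a stand-in for MSMQ), that a message is a dict of its properties, and that a None sentinel (a hypothetical convention, not part of the original design) stops the loop so the example can end.

```python
import queue
from typing import Callable, Dict, Optional

# A file message is modeled as a dict of its properties (FileName, Source, ...).
Message = Dict[str, str]


def run_router(processing_queue: "queue.Queue[Optional[Message]]",
               destinations: Dict[str, "queue.Queue[Message]"],
               choose_destination: Callable[[Message], Optional[str]]) -> int:
    """Route messages until a None sentinel arrives; return how many were routed."""
    routed = 0
    while True:
        # Steps 1-2: pick the next message up from the processing queue into memory.
        message = processing_queue.get()
        if message is None:  # hypothetical shutdown sentinel, used only to end this demo
            return routed
        # Step 3: apply the routing logic to decide where the message goes.
        destination = choose_destination(message)
        # Step 4: send it on; unmatched messages are simply dropped in this sketch.
        if destination is not None:
            destinations[destination].put(message)
            routed += 1
        # Step 5: loop back and look for the next message.
```

In a real deployment, unmatched messages would more likely go to an error queue than be dropped, and `choose_destination` would be backed by the routing logic described next.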

The routing logic

The routing logic for the router should be stored in either a database or some type of file that is easily accessed and modified. This allows the router to be updated to meet changing business requirements. It is also important that this logic be stored in an easy-to-understand format. For example, it might not be the best idea to store your routing logic as bits of dynamically compiled C# code, because that makes the logic harder to modify for employees who are not familiar with the router.

Instead, try to come up with a simple format for the routing logic. An example of this is shown below, in XML:

<RoutingLogic>
   <Route Name="DirectImport" Destination="some-server\ImportQueue">
      <Keys>
         <Field Name="FileName" Value="^[0-9]+\.txt$" />
         <Field Name="Source" Value="FTP" />
      </Keys>
   </Route>
   <Route Name="Unzip" Destination="some-server\UnzipQueue">
      <Keys>
         <Field Name="FileName" Value="^[a-zA-Z0-9]+\.zip$" />
      </Keys>
   </Route>
</RoutingLogic>

In this example, two routes are configured. The first route, "DirectImport," takes any message whose filename consists entirely of digits followed by the extension .txt, or any file that was sent via FTP, and sends it to a message queue called "ImportQueue" on the server "some-server." The second route takes any message whose FileName property consists of letters and digits followed by .zip and sends it to "UnzipQueue" on some-server. You can probably figure out what is going on simply by looking at the XML, which is why I suggest taking this approach to storing your logic: it's just easier to read than line upon line of C# or VB.NET code.
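To make the format concrete, here is a minimal Python sketch that parses routing logic of this shape and evaluates a message against it. It assumes each key is written as a well-formed `<Field Name="..." Value="..."/>` element with regexes that match the prose descriptions above, that a route matches when any of its keys matches (the "or" reading of DirectImport), and that the first matching route in document order wins, as the author explains in the comments below. The names `load_routes` and `find_destination` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Routing logic in the article's format, assuming <Field Name="..." Value="..."/> keys.
ROUTING_XML = r"""
<RoutingLogic>
   <Route Name="DirectImport" Destination="some-server\ImportQueue">
      <Keys>
         <Field Name="FileName" Value="^[0-9]+\.txt$" />
         <Field Name="Source" Value="FTP" />
      </Keys>
   </Route>
   <Route Name="Unzip" Destination="some-server\UnzipQueue">
      <Keys>
         <Field Name="FileName" Value="^[a-zA-Z0-9]+\.zip$" />
      </Keys>
   </Route>
</RoutingLogic>
"""


@dataclass
class Route:
    name: str
    destination: str
    keys: List[Tuple[str, "re.Pattern[str]"]]  # (message field, compiled regex)


def load_routes(xml_text: str) -> List[Route]:
    """Parse the routing XML; routes keep their document order."""
    routes = []
    for el in ET.fromstring(xml_text.strip()).iter("Route"):
        keys = [(f.get("Name"), re.compile(f.get("Value"))) for f in el.iter("Field")]
        routes.append(Route(el.get("Name"), el.get("Destination"), keys))
    return routes


def find_destination(routes: List[Route], message: Dict[str, str]) -> Optional[str]:
    """Return the destination of the first route with ANY matching key, else None."""
    for route in routes:
        if any(pattern.search(message.get(field, "")) for field, pattern in route.keys):
            return route.destination
    return None
```

Because `search` honors the `^` and `$` anchors written into the patterns, an unanchored value such as `FTP` would also match `SFTP`; anchor it if that matters. In .NET, the same parsing could be done with `XmlDocument` or LINQ to XML.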

The XML layout shown above could easily be converted to a set of SQL tables. You also don't have to use regular expressions for your field matches, but I recommend that you do because they give your routing logic much more flexibility.
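One possible table layout for the same two routes is sketched below, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table and column names (Route, RouteKey, Priority) are illustrative assumptions; Priority takes over the role that document order plays in the XML file.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Route (
    RouteId     INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    Destination TEXT NOT NULL,
    Priority    INTEGER NOT NULL      -- replaces the XML's document order
);
CREATE TABLE RouteKey (
    RouteId INTEGER NOT NULL REFERENCES Route(RouteId),
    Field   TEXT NOT NULL,
    Pattern TEXT NOT NULL             -- regular expression applied to Field
);
"""


def create_routing_db() -> sqlite3.Connection:
    """Build an in-memory database holding the two example routes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Route VALUES (?, ?, ?, ?)", [
        (1, "DirectImport", r"some-server\ImportQueue", 1),
        (2, "Unzip", r"some-server\UnzipQueue", 2),
    ])
    conn.executemany("INSERT INTO RouteKey VALUES (?, ?, ?)", [
        (1, "FileName", r"^[0-9]+\.txt$"),
        (1, "Source", "FTP"),
        (2, "FileName", r"^[a-zA-Z0-9]+\.zip$"),
    ])
    return conn


def load_route_rows(conn: sqlite3.Connection):
    """Return (name, destination, field, pattern) rows in evaluation order."""
    return conn.execute(
        "SELECT r.Name, r.Destination, k.Field, k.Pattern "
        "FROM Route r JOIN RouteKey k ON k.RouteId = r.RouteId "
        "ORDER BY r.Priority, k.rowid").fetchall()
```

The router would read these rows once, group them by route, and evaluate them exactly as it would the XML version.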

Another interesting tidbit to note about the XML logic shown above is that it allows you to route messages of different types to queues on different servers. This is important for the scalability of the processing system, and I highly recommend that you include this functionality. There is really no reason not to, even if all of your task-specific processing queues currently live on the same server.
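Because each Destination carries its own server name, moving a task-specific queue to another machine becomes a one-attribute change in the logic file. A small helper that splits a destination of the form server\queue into its parts might look like this; treating a bare queue name as local (MSMQ uses "." as shorthand for the local computer) is an assumption of this sketch.

```python
from typing import Tuple

LOCAL_MACHINE = "."  # assumption: bare queue names are treated as local queues


def parse_destination(destination: str) -> Tuple[str, str]:
    """Split 'server\\queue' at the first backslash into (server, queue)."""
    server, sep, queue_name = destination.partition("\\")
    if not sep:  # no server given, e.g. "ImportQueue"
        return LOCAL_MACHINE, destination
    return server, queue_name
```

Splitting only at the first backslash leaves longer queue paths such as `private$\UnzipQueue` intact.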

Loading the logic

The way your router component consumes the logic information is up to you. However, a solution that has worked in the past is to load the logic into the router when the router first starts up. This way, the logic is loaded into the process only once, and processing time isn't wasted reloading the information for each message. The drawback is that changes to the logic file won't take effect until the router reloads it, which can cause maintenance headaches if you're not careful.
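A minimal sketch of the load-once approach follows. It assumes a simplified rule format of (field, pattern, destination) tuples supplied by a loader function; `load_rules` is a hypothetical stand-in for reading your XML file or database. The rules are read and compiled once in the constructor, and `route()` only touches the cached copy. The `load_count` attribute exists only to make that visible.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical loader: returns (field, regex pattern, destination) tuples.
RuleLoader = Callable[[], List[Tuple[str, str, str]]]


class Router:
    def __init__(self, load_rules: RuleLoader) -> None:
        self._load_rules = load_rules
        self.load_count = 0
        self._rules = self._compile()  # read and compiled once, at startup

    def _compile(self):
        self.load_count += 1
        return [(field, re.compile(pattern), dest)
                for field, pattern, dest in self._load_rules()]

    def route(self, message: Dict[str, str]) -> Optional[str]:
        # Uses the cached, precompiled rules; nothing is re-read per message.
        for field, pattern, dest in self._rules:
            if pattern.search(message.get(field, "")):
                return dest
        return None
```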

If you take the approach above, I recommend that you also program your router component with a hook so that you can force it to reload the logic file by sending it some sort of control message. If you don't do this, you will find yourself restarting the router component after each change to the logic file. When I say "control message," this could simply be a specially formatted message sent to the processing queue -- in fact, that would be the easiest way to handle the issue.
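The control-message hook can be sketched in a similar way. Here a message whose `Label` property equals `RELOAD_ROUTES` (both names are hypothetical) tells the router to re-read its logic, and every other message is routed with the cached rules. As before, `load_rules` stands in for reading your logic file, and rules are simplified to (field, pattern, destination) tuples.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

RELOAD_LABEL = "RELOAD_ROUTES"  # hypothetical label marking a control message


class ReloadableRouter:
    def __init__(self, load_rules: Callable[[], List[Tuple[str, str, str]]]) -> None:
        self._load_rules = load_rules
        self.reload()

    def reload(self) -> None:
        # Re-read and recompile the routing logic without restarting the process.
        self._rules = [(f, re.compile(p), d) for f, p, d in self._load_rules()]

    def handle(self, message: Dict[str, str]) -> Optional[str]:
        """Return the destination for a file message; control messages return None."""
        if message.get("Label") == RELOAD_LABEL:
            self.reload()
            return None
        for field, pattern, dest in self._rules:
            if pattern.search(message.get(field, "")):
                return dest
        return None
```

Because the control message arrives on the same processing queue as file messages, the reload happens between messages and no in-flight routing decision sees a half-loaded rule set.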

Developing the router

With this document, you should have a good idea of what is involved in developing the router. If you plan on having another developer work on this portion of the file processing system, make sure he or she is proficient in the following skills:

  • Message queuing
  • XML (or ADO.NET if you're using a database to store the logic)
  • File IO
  • The overall architecture of the file processing system

In my opinion, it is a good idea to give this project to the lead developer, who will be more likely to have the skills necessary and will have a good idea about how the router fits into the architecture.

Final thoughts

As mentioned earlier, make sure the router is lightweight. It has only one job: to route messages. It shouldn't be tasked with anything else, except perhaps logging statistics about how many messages have been routed. If you put any extra logic into the router (such as having it move a file), you will slow the whole processing system down.
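To keep the router lightweight, the per-message work can be limited to choosing a destination, handing the message off, and incrementing a counter. In this sketch, `choose_destination` and `send` are hypothetical callables standing in for your routing logic and your queue-send call; no file is opened or moved.

```python
from collections import Counter
from typing import Callable, Dict, Optional

Message = Dict[str, str]


class RoutingStats:
    """The router's only side job: counting how many messages went where."""

    def __init__(self) -> None:
        self.by_destination: Counter = Counter()

    def record(self, destination: Optional[str]) -> None:
        self.by_destination[destination or "(unrouted)"] += 1


def route_one(message: Message,
              choose_destination: Callable[[Message], Optional[str]],
              send: Callable[[str, Message], None],
              stats: RoutingStats) -> Optional[str]:
    destination = choose_destination(message)
    if destination is not None:
        send(destination, message)  # hand off the message only; the file itself is untouched
    stats.record(destination)
    return destination
```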

In the next part of this series, I will discuss the task-specific processes and how they fit into the architecture. This is where we will get into the logic of how we will move files through the system.

6 comments
rwoodruf

Zach, I am following this series to write a reference implementation that I intend to make open source. When will you write the next installment? Thanks

jeff

Where is the source code?

bluemoonsailor

You need to clearly specify whether the router stops at the first destination match or processes all destination rules that match. This can have a significant impact on how you design the logic and rules. If the router only seeks to the first destination match, then for performance reasons you may want to allow the user to specify the order of rules processing so that "easy" tests are done first and complex tests are done later. You'll also need to specify that an earlier rule will potentially "mask" a later rule. On the other hand, if you decide to process all rules in case of multiple matches, you'll need to clearly specify that one incoming file may end up at multiple destinations. While there is no one right answer - either processing method is perfectly valid - there is definitely one wrong answer: not clearly stating how the rules will be processed. Steve G.

zs_box

The next installment should be out soon! I have the text, I just need to submit it to the editor.

zs_box

Source code is not included with this series. If you need help/tips on how to do a certain component I'd be more than happy to answer via email. Thanks! Zach Smith

zs_box

This is a very important point that I unfortunately overlooked. In my system the router stops at first match - since these are files we are working with we don't want more than one process touching the file at any one time. Also, order is important as you pointed out. In my implementation the router logic is in XML and the order is determined by the order the entries are put into the XML. Thanks for the great and informative comment on the article!!
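The two rule-processing strategies discussed in these comments can be contrasted in a few lines. This is a sketch with a hypothetical flat rule format of (destination, field, pattern) tuples kept in file order. `first_match_only=True` mirrors the first-match behavior Zach describes, where an earlier rule masks later ones, while `False` lets one file end up at every matching destination.

```python
import re
from typing import Dict, List, Tuple

# (destination, message field, regex pattern), kept in the order they appear in the logic file
Rule = Tuple[str, str, str]


def match_destinations(rules: List[Rule], message: Dict[str, str],
                       first_match_only: bool = True) -> List[str]:
    matches: List[str] = []
    for destination, field, pattern in rules:
        if re.search(pattern, message.get(field, "")):
            if first_match_only:
                return [destination]  # stop here: later rules are never evaluated
            matches.append(destination)
    return matches
```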