Document management solutions are big
business. The hard part is finding a system that provides all the
functionality you need without so many extra features that it
breaks your budget. To get all the features you need, you can go
with a vendor such as IBM, Microsoft, or Documentum, or you can
create your own document management solution. I’ll explain how you
can create the foundation for a document workflow/archiving
solution on a budget.

Documents are normally computer files or
digitized paper documents stored as computer files. Storing files
is simple—the operating system provides the mechanism for storing
files in a logical folder/file structure. This is probably the
least expensive document management/archiving solution. However,
most documents have a purpose, which is where document workflow
comes into play.

Sending and receiving files across the Internet
(or intranet, if applicable) is handled, in a broad sense, by
placing the file’s bytes in the body of an HTTP request and sending
that request to its destination; no base64 encoding is involved.
The browser handles this through the use of the file <INPUT> HTML
tag. This data is sent to the Web
server as multipart POST data, and the request handler on the Web
server must extract this file data in order to use it. You can use
HTML form elements to provide the associated data (index data);
then, it’s the job of the script or executable running on the Web
server to parse the data and store the information. This entails
extracting the file and the index values, storing the file in some
form of archive such as a folder on the Web server or a remote
server, and adding index values and file ID information in a
database.
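
To make the extraction step concrete, here is a minimal Python sketch of splitting a multipart POST body into index values and file data. It is an illustration only, not the addDocument.asp handler: the function name parse_multipart, the boundary string, and the sample field values are assumptions, and a real Web framework would normally do this parsing for you.

```python
from email.parser import BytesParser


def parse_multipart(content_type, body):
    """Split a multipart/form-data body into (index fields, uploaded files)."""
    # The MIME parser expects headers first, so prepend the Content-Type header.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser().parsebytes(prefix + body)

    fields, files = {}, {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        payload = part.get_payload(decode=True)  # raw bytes of this part
        if filename is None:
            fields[name] = payload.decode("utf-8")  # plain form field
        else:
            files[name] = (filename, payload)  # uploaded file
    return fields, files


# Hypothetical request body, shaped like what a browser would send.
sample_body = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="customer_id"\r\n'
    b"\r\n"
    b"C-1001\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="filename"; filename="invoice.pdf"\r\n'
    b"Content-Type: application/pdf\r\n"
    b"\r\n"
    b"%PDF-1.4 sample\r\n"
    b"--XYZ--\r\n"
)

fields, files = parse_multipart("multipart/form-data; boundary=XYZ", sample_body)
print(fields)
print(files["filename"][0])
```

From here, the file bytes would go to the archive and the field values to the database as index data.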

The fun part is when the file is received—it
must go into some form of document workflow, which is a logical set
of steps that must occur before the workflow reaches
reconciliation. What will make your workflow solution extensible is
its ability to provide a template for custom workflows. Workflows
can be defined using XML. A workflow manager can use the XML
template to direct the workflow from one queue to another, all
along the way performing actions and gathering data that will move
the document from receipt to reconciliation.

I’ll use accounts payable/receivable as an
example of this type of scenario. Company X receives an invoice
from Company Y. The invoice and related pages are scanned and sent
to archive. Information such as the vendor ID number, the invoice
amount, and the date are stored as index values with the document.
Once it’s sent to archive, a document workflow process begins. This
might be as simple as directing the document to a queue where an
employee responsible for writing the check to the vendor will see
the document. The employee opens the document, reviews it, writes a
check, records the check number and date, etc., and sends the
document on to reconciliation. Reconciliation is the endpoint for
the workflow, so no other operations can occur on this
document.

Create a starter page

The starter page is a simple page that allows
one file to be added to the workflow along with index data, which
you’ll utilize later to search for documents. Here’s the starter
page’s HTML:

<html>
<head>
</head>
<body>
<form method="POST" enctype="multipart/form-data" action="addDocument.asp">
<input type="text" name="customer_id" maxlength="10"><br>
<input type="text" name="order_id" maxlength="12"><br>
<input type="text" name="invoice_no" maxlength="20"><br>
<input type="hidden" name="ts" value="2004-09-28T00:00:00">
<input type="file" name="filename"><br>
<input type="submit" value="Add document">
</form>
</body>
</html>

The workflow XML defines the document workflow
from reception to reconciliation. The XML also defines the document
properties associated with the workflow, including the index data.
Here’s an example of a workflow XML:

<workflow type="INVOICE PAYMENT">
    <document type="invoice">
        <index>
            <datum name="customer_id" datatype="varchar" length="10"/>
            <datum name="order_id" datatype="varchar" length="12"/>
        </index>
        <data/>
    </document>
    <queue name="receiving">
        <roles>
            <role>receiver</role>
        </roles>
        <data>
            <datum name="invoice_no" datatype="varchar" length="20"/>
            <datum name="ts" datatype="datetime"/>
        </data>
    </queue>
    <queue name="payment">
        <roles>
            <role>payer</role>
        </roles>
        <data>
            <datum name="check_no" datatype="int"/>
            <datum name="amount" datatype="decimal" length="20" precision="2"/>
            <datum name="ts" datatype="datetime"/>
        </data>
    </queue>
</workflow>

The workflow XML provides the necessary
information for creating a workflow. The queue manager uses the
workflow XML to manage the workflow from the start queue to
reconciliation. This workflow is only meant to contain one document
from beginning to end. Look at the first sample HTML, and you’ll
see that the page reflects all the data necessary for a receiving
queue. When this page is submitted, the data from the page is
extracted and added to the workflow XML. This XML data is passed to
the queue manager, which checks that all the data for the current
queue is filled in before passing the work on to the next queue.
When there are no more queues, the workflow reaches reconciliation.
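
The queue manager's routing decision can be sketched in a few lines of Python. This is one possible reading of the behavior described above, not the article's implementation: the function names missing_data and advance and the "reconciliation" label are assumptions, and the XML is a trimmed copy of the sample workflow.

```python
import xml.etree.ElementTree as ET

WORKFLOW_XML = """
<workflow type="INVOICE PAYMENT">
    <queue name="receiving">
        <data>
            <datum name="invoice_no" datatype="varchar" length="20"/>
            <datum name="ts" datatype="datetime"/>
        </data>
    </queue>
    <queue name="payment">
        <data>
            <datum name="check_no" datatype="int"/>
            <datum name="amount" datatype="decimal" length="20" precision="2"/>
            <datum name="ts" datatype="datetime"/>
        </data>
    </queue>
</workflow>
"""

RECONCILIATION = "reconciliation"  # assumed label for the end of the workflow


def missing_data(queue, values):
    """Names of this queue's datum elements that have no value yet."""
    return [d.get("name") for d in queue.findall("./data/datum")
            if not values.get(d.get("name"))]


def advance(workflow, current, values):
    """Return the queue the work should be in after this submission."""
    queues = workflow.findall("queue")
    names = [q.get("name") for q in queues]
    index = names.index(current)
    if missing_data(queues[index], values):
        return current  # incomplete: the work stays where it is
    if index + 1 < len(queues):
        return names[index + 1]  # complete: move to the next queue
    return RECONCILIATION  # no more queues


workflow = ET.fromstring(WORKFLOW_XML)
partial = {"invoice_no": "INV-77"}
complete = {"invoice_no": "INV-77", "ts": "2004-09-28T00:00:00"}
print(advance(workflow, "receiving", partial))
print(advance(workflow, "receiving", complete))
```

Because the queue order comes from the XML, adding a queue to the template changes the routing without touching the code.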

A database stores all of this data. The
database module should be capable of storing the data in a logical
format, as well as working with XML. Most file storage databases
store files on a separate storage system and save a reference to
the file in a table, such as a path and filename. This is a good
approach. To make your solution extensible, store each datum as a
row in a table with these fields: “queue_id”, “datum_name”,
“datum_value”, “datum_type”, “datum_length”, and “datum_prec”. This
stores your datum values vertically (one row per value) instead of
in a fixed horizontal structure (one column per value), so a new
workflow can introduce new data without a schema change. You can
apply the same logic to index data.
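
As a rough illustration of that vertical layout, the Python sketch below uses an in-memory SQLite database. The column names come from the text; the table name queue_datum, the column types, the primary key, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE queue_datum (
        queue_id     INTEGER NOT NULL,
        datum_name   TEXT    NOT NULL,
        datum_value  TEXT,
        datum_type   TEXT    NOT NULL,
        datum_length INTEGER,
        datum_prec   INTEGER,
        PRIMARY KEY (queue_id, datum_name)
    )
""")

# One row per datum: a new kind of datum needs a new row, not a new column.
rows = [
    (1, "check_no", "10452", "int", None, None),
    (1, "amount", "1250.00", "decimal", 20, 2),
    (1, "ts", "2004-09-28T00:00:00", "datetime", None, None),
]
conn.executemany("INSERT INTO queue_datum VALUES (?, ?, ?, ?, ?, ?)", rows)


def load_queue_data(conn, queue_id):
    """Return all datum values stored for one queue as a name -> value dict."""
    cursor = conn.execute(
        "SELECT datum_name, datum_value FROM queue_datum WHERE queue_id = ?",
        (queue_id,),
    )
    return dict(cursor.fetchall())


print(load_queue_data(conn, 1))
```

The trade-off is that every value is stored as text, so the code reading the rows has to apply datum_type itself.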

When the Web server receives the starter page
request, the script (which in this case is addDocument.asp)
extracts the index values and queue data and populates the workflow
XML. The file data is extracted and stored in the <data> node
of the document XML. Then, the XML is submitted to the queue
manager, where the queue manager writes the data to the database
and stores the file. If all the appropriate information is
available, the queue manager progresses the work to the next
queue.
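
A hedged Python sketch of the populate step follows. The text does not say how values are attached to the XML, so the "value" attribute and the base64 text inside the document's <data> node are assumptions made for illustration, as are the function name populate and the sample values.

```python
import base64
import xml.etree.ElementTree as ET

TEMPLATE = """<workflow type="INVOICE PAYMENT">
    <document type="invoice">
        <index>
            <datum name="customer_id" datatype="varchar" length="10"/>
            <datum name="order_id" datatype="varchar" length="12"/>
        </index>
        <data/>
    </document>
    <queue name="receiving">
        <data>
            <datum name="invoice_no" datatype="varchar" length="20"/>
            <datum name="ts" datatype="datetime"/>
        </data>
    </queue>
</workflow>"""


def populate(template, queue_name, fields, file_bytes):
    """Copy submitted values and file data into a fresh workflow document."""
    workflow = ET.fromstring(template)
    targets = workflow.findall("./document/index/datum")
    targets += workflow.findall(f"./queue[@name='{queue_name}']/data/datum")
    for datum in targets:
        name = datum.get("name")
        if name in fields:
            datum.set("value", fields[name])  # assumed attribute name
    # XML cannot hold raw bytes, so the file is base64-encoded here (assumption).
    encoded = base64.b64encode(file_bytes).decode("ascii")
    workflow.find("./document/data").text = encoded
    return workflow


submitted = {
    "customer_id": "C-1001",
    "order_id": "ORD-2004-15",
    "invoice_no": "INV-77",
    "ts": "2004-09-28T00:00:00",
}
doc = populate(TEMPLATE, "receiving", submitted, b"scanned invoice bytes")
print(ET.tostring(doc, encoding="unicode"))
```

The resulting document is what would be handed to the queue manager.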

In my workflow XML, you see elements for
“roles”. These roles are groups whose members have access to the
particular queues. When a user logs in to the system, he gets a
work list from which he can manipulate the data. The work is
delivered in XML format, where the appropriate data can be bound to
HTML elements for editing. Once edited, the work is submitted to
the queue manager and the cycle begins again.
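
Role-based access to queues could be resolved along these lines. The Python sketch assumes a user's roles are available as a set of strings after login; the function name queues_for and the "auditor" role are hypothetical.

```python
import xml.etree.ElementTree as ET

WORKFLOW_XML = """<workflow type="INVOICE PAYMENT">
    <queue name="receiving"><roles><role>receiver</role></roles></queue>
    <queue name="payment"><roles><role>payer</role></roles></queue>
</workflow>"""


def queues_for(workflow, user_roles):
    """Names of the queues that at least one of the user's roles may access."""
    allowed = []
    for queue in workflow.findall("queue"):
        queue_roles = {role.text for role in queue.findall("./roles/role")}
        if user_roles & queue_roles:  # any role in common grants access
            allowed.append(queue.get("name"))
    return allowed


workflow = ET.fromstring(WORKFLOW_XML)
print(queues_for(workflow, {"payer"}))
print(queues_for(workflow, {"auditor"}))
```

The work list for a user is then the set of documents currently sitting in those queues.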

I didn’t include the code responsible for
handling the submission of work, etc., because each scripting
environment on the server side handles file submission differently.
Also, the database that you use determines how you manage the data.
For instance, with Microsoft SQL Server, you can store files as
BLOBs and specify that BLOB data is stored in a separate filegroup,
and so on separate storage. However, you won’t get the same
functionality out of MySQL.

You can view this easy example as a PHP solution with MySQL. Also,
I encourage you to download the source code.
