Data Management

Combine disparate data systems with XML

If you need to combine two or more data sources into one logical data source, you can do it easily with XML. Phillip Perkins presents an example that puts this concept into action.

One of the coolest things about XML is that it's a platform-independent method of representing data. What's even neater is that if you need to combine two or more data sources into one logical data source, you can do it easily with XML.

An example I encounter frequently is combining DB2 data and Microsoft SQL data. Your new application will gather data and store support data on SQL 2000. However, any user-specific data (such as human resource data) is stored in DB2. This can cause a problem if you need to collect information from both data sources. But, with XML, all you need to do is create identical XML structures from both data sources and combine them. This is particularly helpful if you're going to run this information through an XSLT transformation.

For example, say you're collecting user information that cannot be stored in another database because this information may change on the other database. But you need to access it in order to provide a report for information that you collected in your application database, and the data has to be up to date. This could be the user's address, which is stored in a DB2 database. The information you collected in your application is stored on a SQL 2000 database. Your report needs to collect the data from both sources and combine them to create the report.

Within a single database, you normally relate information through a unique identifier. You can use this unique identifier to query for information within another database. So, instead of storing address information again on another database, keep the unique identifier on your secondary database and use it to get the information from your primary database.

This unique identifier can be an employee ID. You know that you can extract the user's correct address from the DB2 database with the employee ID, so you only store the employee's ID in the SQL 2000 database. With a little planning, you can create a system that will allow you to combine the data and transform it into your final report.

Let's say that we have a DB2 table called LOCATIONS that stores employees' work addresses. This address information is related to a LOCATION_ID on the EMPLOYEES table (where the vital employee information is stored). On our SQL 2000 database, we collect the user's favorite color in a table called FAV_COLOR. The FAV_COLOR table has three fields: name, emp_id, and fav_color (we'll assume that storing the name is safe). To access these disparate database servers, we have two objects called DB2 and SQL. Both objects are the same type, except with different connections. The class for these objects contains two methods: RunSPReturnXML and RunSQLReturnXML. The first method runs a stored procedure and returns XML data; the second method runs a SQL query string and returns XML data.

In order to avoid using a particular scripting technology for this method, I'm going to compose this class with pseudocode.

Class Data

Method RunSPReturnXML(stored_procedure, params)
    Send stored_procedure to database with params.
    If database doesn't support XML output, take recordset and
        convert it to XML.
    Return XML.
End Method

Method RunSQLReturnXML(sql, params)
    Send sql to database with params.
    If database doesn't support XML output, take recordset and
        convert it to XML.
    Return XML.
End Method

Class_Initialize
    Instantiate needed objects such as database connection tools.
    Connect to database.
End Class_Initialize

Class_Terminate
    Disconnect from database.
    Clean up instantiated objects and end.
End Class_Terminate

End Class

In order to utilize this class to combine the data for the report, we'll provide one report per employee by passing the employee ID to DB2 and SQL 2000 (again, I'm using pseudocode):

SQL = new Data
DB2 = new Data
DOMSQL = new DOMDocument
DOMDB2 = new DOMDocument
employee_id = 999999

DOMSQL.loadXML(SQL.RunSPReturnXML("get_fav_color", (employee_id)))
DOMDB2.loadXML(DB2.RunSQLReturnXML("SELECT LOCATIONS.ADDRESS1,
LOCATIONS.ADDRESS2, LOCATIONS.CITY, LOCATIONS.STATE, LOCATIONS.ZIP FROM
EMPLOYEES INNER JOIN LOCATIONS ON EMPLOYEES.LOCATION_ID = LOCATIONS.LOCATION_ID
WHERE EMPLOYEES.EMP_ID = ?", (employee_id)))

We'll assume that your XML from SQL will be in the following format:

<FAV_COLOR>
    <name>John Q. Public</name>
    <emp_id>999999</emp_id>
    <fav_color>blue</fav_color>
</FAV_COLOR>

We'll also assume that the DB2 XML data will look like this:

<LOCATION>
    <ADDRESS1>. . .</ADDRESS1>
    <ADDRESS2>. . .</ADDRESS2>
    <CITY>. . .</CITY>
    <STATE>. . .</STATE>
    <ZIP>. . .</ZIP>
</LOCATION>

To combine these two, use this line of code:

DOMSQL.appendChild(DOMDB2.selectSingleNode("//LOCATION"))

This creates the following final structure:

<FAV_COLOR>
    <name/>
    <emp_id/>
    <fav_color/>
    <LOCATION>
        <ADDRESS1/>
        <ADDRESS2/>
        <CITY/>
        <STATE/>
        <ZIP/>
    </LOCATION>
</FAV_COLOR>

The complete DOMDocument can be run through an XSLT transformation. This solution allows you to combine the data to get the absolute result you want. Also, you'll find that the DOM methods allow for easy manipulation of the data. The methods in DOM are what make combining data from disparate systems so simple.

Keep your developer skills sharp by automatically signing up for TechRepublic's free Web Development Zone newsletter, delivered each Tuesday.

0 comments

Editor's Picks