It's possible to store and retrieve XML documents in a relational database even if that database does not boast XML features. The secret lies in careful use of Java (the database does have to be JDBC-compliant) and a third-party middleware framework named XML-DBMS, which acts as the glue between the relational database and the XML document flow.
In general, there are two common approaches to mapping XML to databases: a table-based mapping and an object-relational (object-based) mapping. Both are commonly used as the basis for software that transfers data between XML documents and databases, especially relational databases. An important characteristic in this respect is that both methods are bidirectional, allowing XML documents to be both stored and retrieved.
Table-based mapping is the most obvious mapping. It views the document as a single table or a set of tables, and column data can be represented either as PCDATA-only elements or as attributes. Table-based mapping uses a document structure similar to this:
<Row ColumnName0="Value0" ColumnNameN-1="ValueN-1" >
. . .
. . .
. . .
This approach models pure data without a document and is similar to the relational model. However, it works only with a limited subset of XML documents.
With object-based mapping, XML-DBMS maps the XML document to the database according to an object-relational scheme in which element types are generally viewed as classes, and attributes and PCDATA are treated as properties of those classes. This models the XML document as a tree of objects that are specific to the data in the document and then maps these objects to the database. The document structure can be similar to this:
Various mapping approaches are beyond the scope of this article, so I will use the above example as the basis for the XML-to-DBMS transferring application.
Writing the map file
The map file is a special XML file in which the user specifies how XML elements, attributes, and PCDATA are mapped to the database tables and columns. Listing A contains a map file for this XML example, and Listing B contains a SQL schema for the database. The Options element in Listing A contains some system-specific parameters. In this case, you set the format for converting data from DATE type database fields to XML data and vice versa. Note that the Pattern attribute must conform to the java.text.SimpleDateFormat pattern specification.
The Databases element in Listing A contains a relational schema for a database. This schema is needed by XML-DBMS to correctly map the data and compile the map file. (Actually, when the same database schema is used in many map-files, it is a good idea to include schema as an XML external entity in each file instead of rewriting it several times.)
The Default database name means that the database must be explicitly indicated in the JDBC connection URL. In each table, you must provide primary key information, and each key must be present in the database. Otherwise, you will immediately get a map compiler exception.
In the KeyGenerator, you provide the name of the primary key generator, but you must implement this generator in a separate Java class. The KeyGenerator is simply an implementation of the Java interface that generates a unique key value every time you perform an INSERT operation. You can also use a simple, ready-to-use generator.
The XMLDBMSKey table in Listing B contains values used by the KeyGenerator. The UseColumn element points to an element acting as a primary key and shows the program where to write a generated key value. Unless a name is indicated by the Name attribute, the primary key implicitly gets the name PrimaryKey. There can be several columns used in the primary key, and the KeyGenerator implementation must know how many keys it should generate.
The ForeignKey element in Listing A is used for describing primary-foreign key relationships between element classes.
The main interest is the Maps section of a map file. Element types with element content are usually viewed as classes and mapped to a table. This is done by a ClassMap element and its child element, ElementType, which contains the name of the element being mapped and the ToClassTable element, which contains the name of the table the element is being mapped to.
The PropertyMap element controls "simple" (PCDATA in a child node) element or attribute mapping, though in this case, the application maps only elements. The RelatedClass element allows us to encapsulate another element class here, but that class must be defined in another ClassMap element.
Map documents are compiled into XMLDBMap Java objects, and the resulting object is passed to DBMSToDOM, DOMToDBMS, or DBMSDelete objects. These are special objects that do all the work regarding selecting, inserting, updating, or deleting data.
Writing filter and action files
A filter document consists of a set of filters to be applied to the values in the database. This makes it possible to filter the rows retrieved by SELECT operations or deleted by DELETE operations. The filter language is layered on the top of the mapping language. That is, the filter document allows you to specify conditions under which data is retrieved, while the mapping language provides the structural information (i.e., joins). You can think of the combination of mapping language and filter language as providing a simple query language over the database, with results returned as XML. Filter documents are compiled into FilterSet Java objects, and the resulting object is passed to DBMSToDOM or DBMSDelete objects.
An action document consists of a set of actions to be applied to the values in an XML document. Actions are specified for element types that are mapped as classes. This translates into inserting, updating, and deleting rows. An action document that specifies that rows are to be inserted or updated cannot also specify that rows are to be deleted and vice versa. Action documents that specify that rows are to be inserted or updated are used by DOMToDBMS objects. Action documents that specify that rows are to be deleted are used by DBMSDelete objects.
Listing C demonstrates insert/update actions. This code will update a row with a known primary key (if it exists) or generate a new primary key where one doesn't exist. Actually, you need insert elements for two classes, order and cargo, because the cargo element is nested in the order element.
For selecting or deleting data, you will need the filter file contained in Listing D. The Options element contains a special Wrapper element, which is used to wrap the results of the query. The Wrapper element is needed when more than one element can be retrieved from the database. In such cases, you will produce multirooted XML structures, which are not supported by many standards.
RootFilterelements are used to specify the values retrieved from the root tables.
Writing Java code
Now you are ready to write your own application that stores and retrieves DOM documents from a relational database. Listing E contains sample Java code, which bears closer examination.
First of all, you must instantiate a data source and a data handler. The JDBC1DataSource class is an implementation of JDBC 2.0 DataSource for a JDBC1-compliant driver and supports connection pooling. Datahandler is an interface for abstracting database access. You also tell the JVM to run finalizers on exit with System.runFinalizersOnExit(true). This is necessary to ensure that database connections are properly closed. ParserUtils is an interface for a class that implements parser-specific methods. It is needed because such operations as XML parsing and serializing needs to be a little customized for each parser.
You must compile and instantiate a Map object, Actions object, and FilterSet object, all of which will be used in the document transfer process. This is accomplished using appropriate methods of MapCompiler, ActionCompiler, and FilterCompiler objects. A TransferInfo object contains information needed to transfer data between an XML document and the database. It encapsulates the mapping metadata and DataHandler objects needed to transfer data between an XML document and the database according to a particular map. It contains a single XMLDBMSMap object and one DataHandler object per database. You must also create a KeyGenerator-implementing object for generating primary keys when new objects will be inserted. Keys are used to join tables (class table to class table or class table to property table) and also to retrieve data from root tables.
Finally, you can create a DOMToDBMS object, which will transfer data to the database from a DOM tree. In a similar fashion, you can create a DBMSToDOM object for getting relational data back into DOM.
You have demonstrated that storing XML data is not very difficult with XML-DBMS. This approach is recommended when you already have a relational database infrastructure or want to establish database vendor independence. Since XML-DBMS does not need any specific database, it must simply understand standard SQL and have a JDBC driver (or ODBC driver with a bridge). This framework can be used in applications that want to search across or merge information from diverse sources, because relational database schemas can be easily built by XML DTDs and XML Schemas, and there are tools for converting DTDs and Schemas to map-files. It is also possible to build data-driven applications such as a CMS or CRM system. It is a good example of an XML-to-DBMS middleware product that can be useful in integration of XML-featured and non-XML-featured systems.