
Resolve custom XML entities with SAX and Java

You may need to describe the vernacular of a document in an abstract and reusable way. One solution is to use custom XML entities to represent common, reusable XML components. Here's how.

This article originally appeared as an XML e-newsletter.

By Brian Schaffner

You may find that your XML documents and their various grammars take on a vernacular of their own. As this vernacular finds its way into your documents, you might need to describe it in an abstract and reusable way. One solution is to use custom XML entities to represent common, reusable XML components.

XML entities

Entity is a rather ambiguous term, and it's used loosely to refer to a specific category of artifacts found in XML documents.

An entity is a piece of named data found within an XML document. There are several entities you may already be familiar with, such as &amp; (which represents an ampersand). The ampersand causes all sorts of problems when it isn't escaped properly, because a bare & in element content means the document is no longer well-formed.

Parsing entities

There are three ways to categorize XML entities, as shown in Table A. Entities are referred to within XML documents using "entity references," each of which is an ampersand (&), followed by the entity name, followed by a semicolon (;). In the above example, we cited &amp; as an entity: the name of the entity is amp, and the data it refers to is an ampersand character.
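As a quick illustration of how an entity reference is replaced during parsing, here is a minimal sketch that feeds a one-line document to the JDK's default SAX parser and collects the character data it reports. The class name EntityReferenceDemo and the helper textOf are made up for this example and are not part of the article's listings.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not one of the article's listings.
public class EntityReferenceDemo {

  // Parses the XML string and returns all character data the parser reports.
  static String textOf(String xml) {
    StringBuilder text = new StringBuilder();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse: " + xml, e);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    // The reference &amp; reaches the handler as a plain '&' character.
    System.out.println(textOf("<title>Tom &amp; Jerry</title>"));
    // prints: Tom & Jerry
  }
}
```

The handler never sees the reference itself, only its replacement text. Passing the same title with an unescaped & should make the parser reject the document instead.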

The XML parser will try to resolve entity references to their replacement text. Part of this process uses the DTD to find the definitions for internal parsed entities. It's also possible that an entity is declared in the DTD with only a public or system identifier rather than a literal value; in that case, its replacement text lives outside the DTD and it is an external entity. As SAX engines parse your documents, they may need to resolve these external entities, and you can intercept this process using the EntityResolver interface.
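To make the internal/external distinction concrete, the sketch below declares one entity of each kind in an internal DTD subset and records which system identifiers the parser hands to the resolver. It relies on the fact that DefaultHandler implements EntityResolver and that SAXParser.parse registers the handler in that role. The class name, the entity names, and the example.invalid URL are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; entity names and URL are illustrative only.
public class InternalVsExternalDemo {

  // System identifiers the parser asked the resolver to resolve.
  static final List<String> requested = new ArrayList<>();

  static String run() {
    requested.clear();
    String xml =
        "<!DOCTYPE doc [\n"
      + "  <!ENTITY internal \"inline text\">\n"
      + "  <!ENTITY external SYSTEM \"http://example.invalid/external.txt\">\n"
      + "]>\n"
      + "<doc>&internal; / &external;</doc>";
    StringBuilder text = new StringBuilder();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // Supply the replacement text ourselves instead of fetching the URL.
        return new InputSource(new StringReader("resolved text"));
      }
      @Override
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Parse failed", e);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    System.out.println(run());
    System.out.println("Resolver was asked for: " + requested);
  }
}
```

With the JDK's bundled parser and default settings, the internal entity should be expanded straight from its declaration, and only the external one should show up in the requested list.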

EntityResolver

The EntityResolver interface is remarkably simple and easy to use. The interface consists of a single method called resolveEntity, which takes two parameters: the public identifier and the system identifier for the entity. These identifiers are supplied by the entity definition in the DTD, as shown in our example in Listing A.

Listing A: entity.dtd
<!ENTITY MyCustomEntity
         PUBLIC "-//Builder.com//TEXT MyCustomEntity//EN"
         "http://www.builder.com/xml/entities/MyCustomEntity">

<!ELEMENT CustomEntity (#PCDATA)>
<!ELEMENT Entity (CustomEntity)>

Our sample XML document is shown in Listing B. This document illustrates the use of the DTD for validation and shows the MyCustomEntity being used as the value for the CustomEntity element.

Listing B: entity.xml
<?xml version="1.0" ?>
<!DOCTYPE Entity SYSTEM "entity.dtd">
<Entity>
  <CustomEntity>&MyCustomEntity;</CustomEntity>
</Entity>

In order to process this entity using our custom resolver, we'll need to code a SAX parser, a handler for the SAX parser, and an EntityResolver. The EntityResolver class is shown in Listing C.

Listing C: CustomResolver.java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class CustomResolver implements EntityResolver {
  // System identifier declared for MyCustomEntity in entity.dtd
  private static final String ENTITY_SYSTEM_ID =
      "http://www.builder.com/xml/entities/MyCustomEntity";

  public InputSource resolveEntity(String publicId, String systemId) {
    if (ENTITY_SYSTEM_ID.equals(systemId)) {
      System.out.println("Resolving entity: " + publicId);
      // Supply the replacement text from an in-memory string
      return new InputSource(new StringReader("This is a custom entity"));
    }
    // Returning null tells the parser to use its default resolution
    return null;
  }
}

The resolveEntity method looks at the supplied public and system identifiers and returns an InputSource that supplies the replacement text for the entity; returning null tells the parser to fall back to its default resolution. Using an InputSource allows you to provide a simple string value via StringReader (as we've done), or to use something more sophisticated, such as a byte stream read from a file or a network connection.
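One possible step toward "something more sophisticated" is a resolver that looks up replacement text in a table keyed by system identifier, so several custom entities can share one class. This is only a sketch under that assumption: MapEntityResolver is a hypothetical name, and the map could just as well be loaded from a file or a database.

```java
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver; not part of the article's listings.
public class MapEntityResolver implements EntityResolver {

  // Replacement text for each known entity, keyed by system identifier.
  private final Map<String, String> textBySystemId;

  public MapEntityResolver(Map<String, String> textBySystemId) {
    this.textBySystemId = textBySystemId;
  }

  @Override
  public InputSource resolveEntity(String publicId, String systemId) {
    String text = (systemId == null) ? null : textBySystemId.get(systemId);
    if (text == null) {
      // Unknown entity (or the DTD itself): use the parser's default resolution.
      return null;
    }
    InputSource source = new InputSource(new StringReader(text));
    // Carry the identifiers along so error messages can name the entity.
    source.setPublicId(publicId);
    source.setSystemId(systemId);
    return source;
  }

  public static void main(String[] args) {
    MapEntityResolver resolver = new MapEntityResolver(Map.of(
        "http://www.builder.com/xml/entities/MyCustomEntity",
        "This is a custom entity"));
    InputSource hit = resolver.resolveEntity(
        "-//Builder.com//TEXT MyCustomEntity//EN",
        "http://www.builder.com/xml/entities/MyCustomEntity");
    System.out.println("Known entity resolved: " + (hit != null));
    System.out.println("Unknown entity resolved: "
        + (resolver.resolveEntity(null, "entity.dtd") != null));
  }
}
```

An instance can be passed to setEntityResolver in place of CustomResolver. Because it returns null for anything it doesn't recognize, the parser still loads entity.dtd the usual way.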

Our handler is called MySAXHandler and is shown in Listing D. Listing E shows our example run program called EntityResolverExample, which also creates our SAX parser through the XMLReader interface. We've dramatically simplified the SAX handler; it contains a bare-bones implementation of the ContentHandler interface that will only show the start and end of each element and the associated character data.

Listing D: MySAXHandler.java
import org.xml.sax.*;

public class MySAXHandler implements ContentHandler {

  public void setDocumentLocator(Locator locator) {}

  public void startDocument() throws SAXException {}
  public void endDocument() throws SAXException {}

  public void startPrefixMapping(String prefix, String uri)
      throws SAXException {}

  public void endPrefixMapping(String prefix) throws SAXException {}

  public void startElement(String namespaceURI, String localName,
      String qualifiedName, Attributes atts) throws SAXException {
    System.out.println("Starting element: " + localName);
  }

  public void endElement(String namespaceURI, String localName,
      String qualifiedName) throws SAXException {
    System.out.println("Ending element: " + localName);
  }

  public void characters(char[] text, int start, int length)
      throws SAXException {
    // Print only the slice of the buffer that belongs to this event
    System.out.println(new String(text, start, length));
  }

  public void ignorableWhitespace(char[] text, int start, int length)
      throws SAXException {}

  public void processingInstruction(String target, String data)
      throws SAXException {}

  public void skippedEntity(String name) throws SAXException {}
}

Listing E: EntityResolverExample.java
import org.xml.sax.*;
import org.xml.sax.helpers.*;

public class EntityResolverExample {
  public static void main (String[] args) {
    XMLReader parser;
    MySAXHandler msh;
    CustomResolver myResolver = new CustomResolver();
    try {
      parser = XMLReaderFactory.createXMLReader();
      msh = new MySAXHandler();
      parser.setContentHandler(msh);
      parser.setEntityResolver(myResolver);
      parser.parse("entity.xml");
    } catch (Exception e) {
      System.out.println (e);
    }
  }
}

The EntityResolverExample class uses the XMLReaderFactory to create a new SAX parser using the XMLReader interface. We then set the content handler to our custom content handler and the entity resolver to our custom entity resolver. Finally, we parse the XML document and see the names of the elements and the value for our externally resolved entity, as shown below:

Starting element: Entity
Starting element: CustomEntity
Resolving entity: -//Builder.com//TEXT MyCustomEntity//EN
This is a custom entity
Ending element: CustomEntity
Ending element: Entity
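XMLReaderFactory has been deprecated since Java 9 in favor of SAXParserFactory, so on a current JDK the same wiring might look like the sketch below. It is an in-memory variant rather than a drop-in replacement for Listing E: the class name FactoryWiringSketch is hypothetical, the entity is declared in an internal DTD subset so no files are needed, and the resolver is written as a lambda purely for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of Listing E; names other than the SAX/JAXP types are assumptions.
public class FactoryWiringSketch {

  static String run() {
    String xml =
        "<!DOCTYPE Entity [\n"
      + "  <!ENTITY MyCustomEntity\n"
      + "    PUBLIC \"-//Builder.com//TEXT MyCustomEntity//EN\"\n"
      + "    \"http://www.builder.com/xml/entities/MyCustomEntity\">\n"
      + "]>\n"
      + "<Entity><CustomEntity>&MyCustomEntity;</CustomEntity></Entity>";
    StringBuilder text = new StringBuilder();
    try {
      SAXParserFactory factory = SAXParserFactory.newInstance();
      factory.setNamespaceAware(true); // so handlers receive local names
      XMLReader reader = factory.newSAXParser().getXMLReader();
      reader.setContentHandler(new DefaultHandler() {
        @Override
        public void characters(char[] ch, int start, int length) {
          text.append(ch, start, length);
        }
      });
      // EntityResolver has a single method, so a lambda can implement it.
      reader.setEntityResolver((publicId, systemId) ->
          "http://www.builder.com/xml/entities/MyCustomEntity".equals(systemId)
              ? new InputSource(new StringReader("This is a custom entity"))
              : null);
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalStateException("Parse failed", e);
    }
    return text.toString();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The calls to setContentHandler, setEntityResolver, and parse are the same XMLReader methods used in Listing E; only the way the reader is obtained differs.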

Brian Schaffner is an associate director for Fujitsu Consulting. He provides architecture, design, and development support for Fujitsu's Technology Consulting practice.
