Processing XML Streams With Deterministic Automata
Source: University of Washington
This paper considers the problem of evaluating a large number of XPath expressions on an XML stream. The main contribution consists in showing that Deterministic Finite Automata (DFA) can be used effectively for this problem: in the experiments the author achieves a throughput of about 5.4MB/s, independent of the number of XPath expressions (up to 1,000,000 in the tests). The major problem the paper faces is that of the size of the DFA. Since the number of states grows exponentially with the number of XPath expressions, it was previously believed that DFAs cannot be used to process large sets of expressions.