A Decomposition-Based Probabilistic Framework for Estimating the Selectivity of XML Twig Queries
In this paper, the authors present a novel approach for estimating the selectivity of XML twig queries. Such estimates are useful for approximate query answering and for choosing an optimal query plan for complex queries. The approach relies on a summary structure that stores occurrence statistics of small twigs. The authors then present a novel probabilistic method for decomposing larger twig queries into smaller ones, and show how this decomposition, in conjunction with the summary information, can be used to estimate the selectivity of the larger query. They present and evaluate two decomposition approaches and compare their work against a state-of-the-art selectivity estimation approach on synthetic and real datasets.
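To make the decomposition idea concrete, the following is a minimal sketch rather than the authors' estimator. It assumes that two sub-twigs sharing a common part occur independently of each other given that shared part, so the count of the combined twig is approximated by count(T1) * count(T2) / count(overlap). The summary contents, the string keys, and the function names `estimate_join` and `estimate_star_twig` are all hypothetical and chosen only for illustration.

```python
# Hypothetical summary: occurrence counts of small twigs, keyed by a simple
# string form ("book" = a single node, "book/title" = a parent-child pair).
# All numbers here are made up for illustration.
summary = {
    "book": 100,
    "book/title": 90,
    "book/author": 80,
}


def estimate_join(count_t1, count_t2, count_overlap):
    """Estimate the count of a twig built from two sub-twigs that share a part.

    Assumes the two sub-twigs are conditionally independent given the
    shared part (an illustrative assumption, not taken from the paper).
    """
    if count_overlap == 0:
        # The shared part never occurs, so the larger twig cannot occur either.
        return 0.0
    return count_t1 * count_t2 / count_overlap


def estimate_star_twig(root, children, summary):
    """Estimate the count of a twig with one root and several child labels.

    The twig is decomposed into root/child pairs that overlap on the root,
    and the pairs are folded together one at a time with estimate_join.
    """
    root_count = summary.get(root, 0)
    estimate = float(root_count)
    for child in children:
        pair_count = summary.get(f"{root}/{child}", 0)
        estimate = estimate_join(estimate, pair_count, root_count)
    return estimate


# Selectivity estimate for the twig query book[title][author].
estimate = estimate_star_twig("book", ["title", "author"], summary)
print(estimate)
```

With these made-up counts, the estimate for book[title][author] is 90 * 80 / 100. Errors in such an estimate come from the independence assumption, so a larger summary (statistics for bigger twigs) would generally leave less to approximate.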