Generating, Sampling and Counting Subclasses of Regular Tree Languages
To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), the authors need algorithms to generate uniformly at random a corpus of XSDs as well as a similarity measure to compare how close the generated XSD resembles the target schema. In this paper, they provide the formal foundation for such a testbed. They adopt similarity measures based on counting the number of common and different trees in the two languages, and they develop the necessary machinery for computing them. They use the formalism of Extended DTDs (EDTDs) to represent the unranked regular tree languages.