Provided by: University of Trás-os-Montes and Alto Douro
Topic: Big Data
Date Added: Aug 2013
Semi-structured data such as XML are popular for data interchange and storage. However, many XML documents have improper nesting where open- and close-tags are unmatched. Since some semi-structured data (e.g., Latex) have a flexible grammar and since many XML documents lack an accompanying DTD or XSD, the authors focus on computing a syntactic repair via the edit distance. To solve this problem, they propose a dynamic programming algorithm which takes cubic time. While this algorithm is not scalable, well-formed substrings of the data can be pruned to enable faster computation.