Many web sites contain genealogical data that is meaningful only to humans. The authors ask how this data can be easily extracted and made available so that users can intelligently query genealogical pages in many different formats across thousands of web sites. Most web pages have structures that are easily described by common patterns, in which case data extraction can be automated. The authors present a language, PATtern Markup Language (PatML), and a set of tools that allow a user to specify a set of regular expressions, which are then used to extract and store meaningful data for machine use. They have successfully used PatML to automate the extraction of genealogical data from several web sites in different formats.
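As a rough illustration of the general idea of pattern-based extraction, the sketch below uses a single Python regular expression to pull structured records out of a page fragment. It is not PatML, whose syntax is not shown here; the page layout, the pattern, and the names `PERSON_PATTERN`, `extract_people`, and `SAMPLE_PAGE` are all assumptions made up for this example.

```python
import re

# Hypothetical page layout (an assumption for illustration only):
#   <li>John Smith (b. 1820, d. 1891)</li>
# The death year is optional.
PERSON_PATTERN = re.compile(
    r"<li>\s*(?P<name>[^<(]+?)\s*"
    r"\(b\.\s*(?P<birth>\d{4})"
    r"(?:,\s*d\.\s*(?P<death>\d{4}))?\)\s*</li>"
)


def extract_people(html):
    """Return one dict per list item that matches PERSON_PATTERN."""
    records = []
    for match in PERSON_PATTERN.finditer(html):
        death = match.group("death")
        records.append({
            "name": match.group("name"),
            "birth": int(match.group("birth")),
            "death": int(death) if death else None,
        })
    return records


# Made-up sample data, not taken from any real site.
SAMPLE_PAGE = """
<ul>
  <li>John Smith (b. 1820, d. 1891)</li>
  <li>Mary Smith (b. 1824)</li>
</ul>
"""

if __name__ == "__main__":
    for person in extract_people(SAMPLE_PAGE):
        print(person)
```

A pattern like this only fits pages that share the assumed layout, which is presumably why a tool in this space would let the user supply a separate set of expressions for each site format.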
- Format: PDF
- Size: 350.9 KB