From XML to XML: The Why and How of Making the Biodiversity Literature Accessible to Researchers

This paper presents the ABLE document collection, which consists of a set of annotated volumes of the Bulletin of the British Museum (Natural History). These were developed during the ongoing work on automating the markup of scanned copies of the biodiversity literature. Such automation is required if historic literature is to be used to inform contemporary issues in biodiversity research. This paper considers an enhanced TEI XML markup language, which is used as an intermediate stage in translating from the initial XML obtained from Optical Character Recognition to taXMLit, the target annotation schema. The intermediate representation allows additional information from external sources such as a taxonomic thesaurus to be incorporated before the final translation into taXMLit.

Provided by: Open University Topic: Software Date Added: May 2010 Format: PDF

Download Now

Find By Topic