Unstructured Data Integration through Automata-Driven Information Extraction

Provided by: International Journal of Computer Science and Business Informatics
Topic: Big Data
Format: PDF
Extracting information from plain text and restructuring it into relational databases raises the challenge of how to locate relevant information and update database records accordingly. In this paper, the authors propose a wrapper that efficiently extracts information from unstructured documents containing plain text written in a natural-like language. Their extraction approach uses the automata formalism to describe the wrapping process that runs from text documents to databases. As usual, relevant information in the text document is delimited by regular expressions, which define the extracting automaton.
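
To make the idea concrete, the following is a minimal sketch of a regex-driven extracting automaton feeding a relational table. It is not the authors' wrapper: the state names, the `Product:`/`Price:` delimiters, the `product` table, and the sample text are all hypothetical and chosen only for illustration. Each state waits for one delimiting regular expression; a match captures a field and moves to the next state, and returning to the start state completes one record.

```python
import re
import sqlite3

# Hypothetical transition table: state -> (delimiting regex, field name, next state).
TRANSITIONS = {
    "SEEK_NAME": (re.compile(r"Product:\s*(.+)"), "name", "SEEK_PRICE"),
    "SEEK_PRICE": (re.compile(r"Price:\s*(\d+(?:\.\d+)?)"), "price", "SEEK_NAME"),
}
START = "SEEK_NAME"


def extract_records(text):
    """Run the automaton line by line and return the completed records."""
    state, current, records = START, {}, []
    for line in text.splitlines():
        pattern, field, next_state = TRANSITIONS[state]
        match = pattern.search(line)
        if match is None:
            continue  # irrelevant line: stay in the same state
        current[field] = match.group(1).strip()
        state = next_state
        if state == START:  # back at the start state: one record is complete
            records.append(current)
            current = {}
    return records


def store(records, conn):
    """Insert or update one row per extracted record, keyed by name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS product (name TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO product (name, price) VALUES (?, ?)",
        [(r["name"], float(r["price"])) for r in records],
    )
    conn.commit()


# Hypothetical input document (assumption, not taken from the paper).
SAMPLE = """Product: Widget A
This line is noise and is skipped.
Price: 12.50 USD
Product: Gadget B
Price: 7 USD"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(extract_records(SAMPLE), conn)
    print(conn.execute("SELECT name, price FROM product ORDER BY name").fetchall())
```

Keying the table on `name` and using `INSERT OR REPLACE` is one simple way to model "updating database records accordingly": re-running the extraction on a revised document overwrites the matching rows instead of duplicating them.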
