Web Information Extraction Using Data Extraction and Label Assignment
Information extraction (IE) is the process of recovering structured data from formatted text. It includes identifying fields (e.g., named entity recognition, NER) and understanding relations between fields (e.g., record association). Users obtain specific information from websites by querying them, extracting the data, and integrating it across different web pages. The unsupervised IE system used here is DeLa (Data extraction and Label assignment), which targets web databases. It is designed to solve the record-level extraction task and handles nested object extraction. To integrate data from different websites, some information extraction systems, i.e., wrappers and agents, have been developed to extract data objects from web pages based on their HTML-tag structures.
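To make the idea of extracting data objects from HTML-tag structures concrete, the following is a minimal sketch of a hand-written wrapper in Python. It is not DeLa's algorithm, which is unsupervised and induces the structure itself. This sketch simply assumes that each record is a <tr> element and each field is a <td> element. The class name RowWrapper, the function extract_records, and the sample page are hypothetical and made up for illustration.

```python
from html.parser import HTMLParser


class RowWrapper(HTMLParser):
    """Hypothetical wrapper: groups the text of <td> cells by enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.records = []   # one list of field strings per <tr>
        self._row = None    # fields of the row being parsed, or None outside a row
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text that sits inside a <td> is treated as field content.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no <td> cells
                self.records.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Return a list of records, each a list of field strings."""
    parser = RowWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


# Made-up sample page standing in for a web database result page.
SAMPLE_PAGE = """
<table>
  <tr><td>Title One</td><td>Author A</td><td>10.00</td></tr>
  <tr><td>Title Two</td><td>Author B</td><td>12.50</td></tr>
</table>
"""

records = extract_records(SAMPLE_PAGE)
for record in records:
    print(record)
```

On the sample page this yields two records of three fields each. The fields carry no labels, and the tag layout is hard-coded, which is the kind of manual effort that an unsupervised system with label assignment aims to avoid.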