Towards Automatic Data Extraction Using Tag and Value Similarity Based on Structural-Semantic Entropy
Automatic web record extraction identifies a set of objects in heterogeneous web pages by measuring the similarity among those objects, without manual intervention. It classifies each region of a page according to the similar data objects that occur frequently within it. This transforms unstructured data into structured data that can be stored and analyzed in a central local database. The existing system uses a data extraction and alignment method known as Combining Tag and Value Similarity (CTVS). CTVS identifies Query Result Records (QRRs) by extracting the data from a query result page and segmenting it into records. The segmented QRRs are then aligned into a table in which data values belonging to the same attribute are placed in the same column.
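The following minimal Python sketch illustrates the general idea of combining tag and value similarity to segment records and then align them into columns. It is not the CTVS algorithm itself. It assumes that each candidate record has already been parsed into a list of (tag path, text value) pairs. Tag similarity and value similarity are both measured with difflib's SequenceMatcher, applied to tag paths and to coarse value types respectively, and the two scores are combined with a weight `alpha`. The function names, the number-vs-text typing rule, `alpha = 0.5`, and the threshold `0.8` are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical record representation: a list of (tag_path, text_value) pairs.
Record = list[tuple[str, str]]


def tag_similarity(a: Record, b: Record) -> float:
    """Similarity (0..1) of the two records' tag-path sequences."""
    return SequenceMatcher(None, [t for t, _ in a], [t for t, _ in b]).ratio()


def value_type(v: str) -> str:
    """Coarse data type of a text value (an illustrative assumption)."""
    s = v.replace(",", "").replace("$", "")
    try:
        float(s)
        return "number"
    except ValueError:
        return "text"


def value_similarity(a: Record, b: Record) -> float:
    """Similarity (0..1) of the two records' value-type sequences."""
    return SequenceMatcher(
        None, [value_type(v) for _, v in a], [value_type(v) for _, v in b]
    ).ratio()


def combined_similarity(a: Record, b: Record, alpha: float = 0.5) -> float:
    """Weighted combination of tag similarity and value similarity."""
    return alpha * tag_similarity(a, b) + (1 - alpha) * value_similarity(a, b)


def segment(candidates: list[Record], threshold: float = 0.8) -> list[list[Record]]:
    """Group consecutive candidate records whose combined similarity >= threshold."""
    groups: list[list[Record]] = []
    for rec in candidates:
        if groups and combined_similarity(groups[-1][-1], rec) >= threshold:
            groups[-1].append(rec)
        else:
            groups.append([rec])
    return groups


def align(records: list[Record]) -> tuple[list[str], list[list[str]]]:
    """Place values that share a tag path in the same column.

    Columns are ordered by first appearance. If a record repeats a tag
    path, only its last value is kept (a simplifying assumption).
    """
    columns: list[str] = []
    for rec in records:
        for path, _ in rec:
            if path not in columns:
                columns.append(path)
    table = [[dict(rec).get(c, "") for c in columns] for rec in records]
    return columns, table


if __name__ == "__main__":
    r1 = [("div/a", "Book A"), ("div/span", "$12.50")]
    r2 = [("div/a", "Book B"), ("div/span", "$8")]
    r3 = [("div/a", "Book C")]
    print(segment([r1, r2, r3]))
    print(align([r1, r2, r3]))
```

In this sketch, `r1` and `r2` share both their tag structure and their value types, so they fall into the same segment. `r3` lacks the price field and starts a new segment. Alignment then fills the missing attribute with an empty cell, so each column holds values of a single attribute.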