Date Added: Dec 2012
Web harvesting (Web data extraction) is a software technique for extracting information from websites. Such programs usually simulate human exploration of the Web, either by implementing the low-level Hypertext Transfer Protocol (HTTP) directly or by embedding a full-fledged Web browser. Web harvesting is closely related to Web indexing, in which a bot indexes information on the Web. In contrast, Web harvesting focuses on transforming unstructured data on the Web, typically in HTML format, into structured data that can be stored and analyzed in a central, local database or spreadsheet. Web harvesting is also related to Web automation, which simulates human Web browsing using computer software.
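The transformation from HTML into structured data can be illustrated with a minimal sketch. It assumes the page has already been retrieved (the `SAMPLE_HTML` string stands in for an HTTP response body) and that the data of interest sits in a simple table. The names `TableExtractor`, `extract_rows`, and `rows_to_csv` are hypothetical helpers introduced for this example, and only the Python standard library is used.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real harvester would obtain this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collects the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Turn unstructured HTML into a list of rows (lists of strings)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, ready for a spreadsheet or database import."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Real pages are rarely this regular, so a production harvester would typically add error handling, a more tolerant parser, and logic for nested tables; this sketch only shows the unstructured-to-structured step.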