Web Data Commons - Extracting Structured Data from Two Large Web Corpora

Provided by: RWTH Aachen University
Topic: Data Management
Format: PDF
More and more websites embed structured data describing, for instance, products, people, organizations, places, events, resumes, and cooking recipes into their HTML (HyperText Markup Language) pages using encoding standards such as microformats, microdata, and RDFa (Resource Description Framework in Attributes). The Web Data Commons project extracts all microformat, microdata, and RDFa data from the Common Crawl web corpus, the largest and most up-to-date web corpus currently available to the public, and provides the extracted data for download in the form of RDF quads.
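To make the output format more concrete, the following is a minimal sketch of how one line of RDF quad data could be split into its four terms. It assumes the quads are serialized one per line in an N-Quads-like syntax and that the fourth term identifies the page the data came from; the sample line, the `parse_quad` helper, and the simplified pattern are illustrative assumptions, not part of the project's own tooling.

```python
import re

# Simplified term pattern (an assumption, not the full N-Quads grammar):
# an IRI in angle brackets, a blank node, or a quoted literal with an
# optional datatype or language tag.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
QUAD_RE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_quad(line):
    """Split one quad line into (subject, predicate, object, graph)."""
    match = QUAD_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognizable quad: {line!r}")
    return match.groups()


# Hypothetical sample line: a product name found on an example page.
sample = (
    '_:node1 <http://schema.org/name> "Example Product"@en '
    '<http://example.com/product.html> .'
)

subject, predicate, obj, graph = parse_quad(sample)
print(subject, predicate, obj, graph, sep="\n")
```

For real workloads, an established RDF library would be a safer choice than a hand-written pattern, since it handles escaping and the complete grammar.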