Efficient Social Website Crawling Using Cluster Graph

Executive Summary

Online social communities have become very popular in recent years and are now an active area of research. Crawling is a fundamental task for collecting and mining data from large-scale online social communities. Compared with general websites or well-structured Web forums, user-centered social websites make crawling harder in several ways:

  • Social websites have more complex link structures and much higher indegree and outdegree, which produces a large number of duplicate links.
  • Social websites contain large amounts of duplicate content, often listed under different URLs.
  • Social websites are interactive by nature and contain many action or uninformative pages, such as login, tell-a-friend, or commenting pages.
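To make these three challenges concrete, here is a minimal sketch of the kind of naive frontier filtering a crawler might apply. It is not the cluster-graph method described in the paper. The keyword list `ACTION_KEYWORDS`, the ignored query parameters `IGNORED_PARAMS`, and the helper names `normalize_url`, `is_action_page`, `content_fingerprint`, and `filter_frontier` are all hypothetical and chosen for illustration. URL canonicalization collapses trivially different duplicate links. A keyword check skips action pages. An exact content hash catches duplicate content served under different URLs.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical substrings assumed to mark action/uninformative pages (assumption).
ACTION_KEYWORDS = ("login", "signin", "tell-a-friend", "tellafriend", "comment", "share")

# Hypothetical query parameters assumed not to change page content (assumption).
IGNORED_PARAMS = {"sid", "sessionid", "ref", "utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different links map to one key."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Keep only content-relevant parameters, in a stable sorted order.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    # Lowercase scheme and host; drop the fragment, which the server never sees.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def is_action_page(url: str) -> bool:
    """Crude keyword heuristic for login, tell-a-friend, or comment pages."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in ACTION_KEYWORDS)


def content_fingerprint(page_text: str) -> str:
    """Hash of whitespace- and case-normalized text; catches exact duplicates only."""
    normalized = " ".join(page_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_frontier(links, seen):
    """Return canonical links that are new and not action pages; updates `seen` in place."""
    kept = []
    for link in links:
        key = normalize_url(link)
        if key in seen or is_action_page(key):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

For example, given `http://Example.com/user/42/`, `http://example.com/user/42?sid=abc`, `http://example.com/user/42#top`, `http://example.com/login?next=/user/42`, and `http://example.com/user/7`, `filter_frontier` keeps only `http://example.com/user/42` and `http://example.com/user/7`. Per-URL heuristics like these only scale so far. They do not detect near-duplicate content, which would need a technique such as shingling or SimHash. They also require hand-tuned keyword lists for every site, which is part of why the social-website setting calls for a more structural approach.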

  • Format: PDF
  • Size: 825.4 KB