Efficient Social Website Crawling Using Cluster Graph
Source: University of Colorado
Online social communities have gained significant popularity in recent years and have become an area of active research. Crawling is a fundamental task for data collection and data mining of large-scale online social communities. Compared with general websites or well-structured Web forums, user-centered social websites pose several unique challenges for crawling. Social websites have more complex link structures and much higher indegree and outdegree, resulting in a large number of duplicate links. They contain large amounts of duplicate content, usually listed under different URLs. They are also interactive in nature, containing a large number of action or uninformative webpages such as login, tell-a-friend, or commenting pages.
Date: Nov 2009
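
The three crawling challenges named in the abstract (duplicate links, duplicate content under different URLs, and action pages) can be illustrated with a small filter. This is only a minimal sketch under stated assumptions, not the cluster-graph method the paper proposes: the class `CrawlFilter`, the helper names, and the `ACTION_MARKERS` list are all hypothetical, and real social websites would need site-specific rules.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical markers for action or uninformative pages (assumption, not from the paper).
ACTION_MARKERS = ("login", "tell-a-friend", "comment")


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different links compare equal."""
    parts = urlsplit(url)
    # Sort query parameters and drop the fragment.
    query = urlencode(sorted(parse_qsl(parts.query)))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def is_action_url(url: str) -> bool:
    """Guess whether a URL points at an action page, based on its path."""
    path = urlsplit(url).path.lower()
    return any(marker in path for marker in ACTION_MARKERS)


class CrawlFilter:
    """Tracks seen URLs and seen page bodies to skip duplicates."""

    def __init__(self) -> None:
        self.seen_urls = set()
        self.seen_content = set()

    def should_fetch(self, url: str) -> bool:
        """Return True only for non-action URLs not seen before."""
        if is_action_url(url):
            return False
        key = normalize_url(url)
        if key in self.seen_urls:
            return False
        self.seen_urls.add(key)
        return True

    def is_new_content(self, html: str) -> bool:
        """Return True if this page body (whitespace-collapsed) is new."""
        collapsed = " ".join(html.split())
        digest = hashlib.sha256(collapsed.encode("utf-8")).hexdigest()
        if digest in self.seen_content:
            return False
        self.seen_content.add(digest)
        return True
```

A crawler loop would call `should_fetch` before downloading a link and `is_new_content` after downloading it. Exact hashing only catches identical bodies; near-duplicate pages would need a fuzzier comparison.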



