Identifying Spam Web Pages Based on Content Similarity
Source: Springer Science+Business Media
The Web provides its users with abundant information. Unfortunately, when a Web search is performed, both users and search engines face an annoying problem: misleading Web pages, i.e., spam Web pages, are ranked among legitimate Web pages. These mixed results degrade the performance of search engines and frustrate users, who must filter out the useless information themselves. To improve the quality of Web searches, the number of spam pages on the Web must be reduced, even if they cannot be eradicated entirely.