Query-Log Mining for Detecting Polysemy and Spam
Source: University of Pisa
Through their interaction with search engines, users provide implicit feedback that can be used to extract useful knowledge and improve the quality of the search process. This feedback is recorded in a query log: a sequence of search actions, each containing information about the query submitted, the documents viewed, and the documents clicked by the user. This paper proposes characterizing documents and queries using the information available in a query log, with the goal of detecting query polysemy as well as spam-hosts and spam-queries, i.e., queries whose result lists contain a higher rate of spam pages than those of other queries.
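To make the notion of a spam-query concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical `SearchAction` record holding a query, the viewed result URLs, and the clicked URLs, together with an already known set of spam hosts. A query is flagged when the share of spam pages among its viewed results exceeds the share over the whole log; that baseline is an illustrative assumption, as are all names and the toy data.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlsplit


@dataclass
class SearchAction:
    """One hypothetical query-log entry (field names are assumptions)."""
    query: str
    viewed: List[str]                      # result URLs shown to the user
    clicked: List[str] = field(default_factory=list)


def spam_rates(log: Iterable[SearchAction],
               spam_hosts: Set[str]) -> Tuple[Dict[str, float], float]:
    """Return (per-query spam rate, overall spam rate) over viewed results."""
    shown: Dict[str, int] = defaultdict(int)
    spam: Dict[str, int] = defaultdict(int)
    for action in log:
        for url in action.viewed:
            shown[action.query] += 1
            if urlsplit(url).hostname in spam_hosts:
                spam[action.query] += 1
    total_shown = sum(shown.values())
    overall = sum(spam.values()) / total_shown if total_shown else 0.0
    rates = {q: spam[q] / shown[q] for q in shown}
    return rates, overall


def flag_spam_queries(rates: Dict[str, float], overall: float) -> List[str]:
    """Queries whose spam rate is above the log-wide rate (assumed baseline)."""
    return sorted(q for q, r in rates.items() if r > overall)


# Toy data, invented purely for illustration.
log = [
    SearchAction("cheap pills",
                 ["http://spam.example/a", "http://spam.example/b",
                  "http://ok.example/c"],
                 clicked=["http://ok.example/c"]),
    SearchAction("jaguar",
                 ["http://ok.example/cars", "http://zoo.example/cats"]),
]
rates, overall = spam_rates(log, spam_hosts={"spam.example"})
print(flag_spam_queries(rates, overall))
```

In this sketch the spam hosts are given as input. Detecting the spam hosts themselves from the log, which the abstract also names as a goal, is outside what the example shows.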