Provided by: International Journal of Computer & Organization Trends (IJCOT)
Topic: Data Management
In this paper, the authors propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Their Combined Component Approach (CCA) combines several term-weighting components (i.e., term frequency, collection frequency, and normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in their CCA are not based on simple statistical information about a document collection, but on meaningful, effective, and proven components. Experimental results show that their approach outperformed standard TF-IDF, BM25, and another GP-based approach on two different collections.
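To make the idea of component-level GP terminals concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not list which components or operators CCA uses, so the terminal set (raw and BM25-style term frequency, classic and BM25-style inverse document frequency, a length-normalization ratio), the operator set, and all names below are assumptions chosen for illustration. A candidate ranking function is represented as a small expression tree whose leaves are whole components rather than raw collection statistics.

```python
import math
import operator

# Hypothetical terminals: each is a whole term-weighting component taken from
# a well-known ranking function, computed from per-term statistics `s`.
def tf_raw(s):
    return s["tf"]

def tf_bm25(s, k1=1.2, b=0.75):
    # BM25-style saturated term frequency with document-length normalization.
    norm = 1 - b + b * s["dl"] / s["avgdl"]
    return s["tf"] * (k1 + 1) / (s["tf"] + k1 * norm)

def idf_classic(s):
    return math.log(s["N"] / s["df"])

def idf_bm25(s):
    return math.log((s["N"] - s["df"] + 0.5) / (s["df"] + 0.5))

def len_norm(s):
    return s["avgdl"] / s["dl"]

def protected_div(a, b):
    # Common GP convention (assumed here): avoid division by zero.
    return a / b if abs(b) > 1e-9 else 1.0

TERMINALS = {
    "tf_raw": tf_raw,
    "tf_bm25": tf_bm25,
    "idf_classic": idf_classic,
    "idf_bm25": idf_bm25,
    "len_norm": len_norm,
}
FUNCTIONS = {"+": operator.add, "*": operator.mul, "/": protected_div}

def evaluate(tree, stats):
    """Evaluate an expression tree for one (query term, document) pair.

    A tree is either a terminal name or a tuple (operator, left, right).
    """
    if isinstance(tree, str):
        return TERMINALS[tree](stats)
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, stats), evaluate(right, stats))

def score(tree, per_term_stats):
    """Score a document as the sum of term weights over the query terms."""
    return sum(evaluate(tree, s) for s in per_term_stats)

# One candidate individual: BM25-style tf combined with classic idf.
candidate = ("*", "tf_bm25", "idf_classic")
stats = [{"tf": 3, "df": 10, "N": 1000, "dl": 100, "avgdl": 100}]
print(score(candidate, stats))
```

In a full GP run, a population of such trees would be evolved by crossover and mutation, with fitness measured by a retrieval-quality metric on training queries for the target collection; that loop is omitted here.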