A Novel Parallel Architecture Design of Information Retrieval System for Scientific Papers
Indexing converts a raw document collection into an easily searchable representation. Larger-scale indexing poses challenges, such as how to distribute the indexing computation efficiently across a cluster of nodes. The MapReduce framework can be an effective tool for parallelizing tasks such as inverted index construction. When a search covers the full contents of a document collection, scanning the documents one by one at query time is inefficient because it leads to considerable response times. Larger collections are therefore usually scanned, analyzed, and indexed before any query is made against them. This approach greatly reduces search response time.
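
As an illustration of how inverted index construction fits the MapReduce model, the following is a minimal single-process sketch in Python. It only simulates the map, shuffle, and reduce phases in memory and is not the architecture proposed in this paper. The function names (map_phase, shuffle, reduce_phase, build_inverted_index, search), the simple alphanumeric tokenizer, and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")  # assumed tokenizer: lowercase alphanumeric runs


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per token in a document."""
    for term in TOKEN.findall(text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle step: group emitted doc ids by term (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: turn the doc ids for one term into a sorted, duplicate-free postings list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} collection."""
    pairs = (pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample collection, for illustration only.
docs = {
    1: "MapReduce builds an inverted index",
    2: "Inverted index speeds up search",
    3: "Parallel search on a cluster",
}
index = build_inverted_index(docs)
print(search(index, "inverted index"))
```

Once the index is built, a query only touches the postings lists of its own terms instead of rescanning every document, which is the source of the reduced response time described above. On a real cluster, each map call would run on the node holding its share of the documents, and the grouping by term would be handled by the framework.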