A Novel Parallel Architecture Design of Information Retrieval System for Scientific Papers

Date Added: Apr 2012
Format: PDF

Indexing converts a raw document collection into an easily searchable representation. Indexing at larger scale poses challenges, such as how to distribute the indexing computation efficiently across a cluster of nodes. The MapReduce framework can be an effective tool for parallelizing tasks such as inverted index construction. When searching the full contents of a document collection, scanning the documents one by one is inefficient, because every query must read the entire collection and response time becomes considerable. Larger collections are therefore usually scanned, analyzed, and indexed before any query is run against them, which greatly reduces search response time.
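To make the idea concrete, the following is a minimal single-process sketch of inverted index construction written in the MapReduce style. It is not the architecture proposed in the paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`, `search`), the simple lowercase tokenizer, and the sample documents are all assumptions chosen for illustration. In a real cluster the map and reduce calls would run on different nodes, and the framework would perform the shuffle.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    return [(term, doc_id) for term in set(tokenize(text))]


def shuffle(pairs):
    """Group emitted pairs by term, as a MapReduce framework would between phases."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: turn the doc ids collected for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce sequentially over a dict of {doc_id: text}."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents containing every query term, using only the index."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample collection, for illustration only.
docs = {
    1: "Parallel indexing with MapReduce",
    2: "Inverted index construction",
    3: "MapReduce for inverted index construction",
}
index = build_inverted_index(docs)
print(index["mapreduce"])               # [1, 3]
print(search(index, "inverted index"))  # [2, 3]
```

Once the index is built, a query touches only the posting lists of its terms instead of rescanning every document, which is where the reduction in response time comes from.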