A Novel Parallel Architecture Design of Information Retrieval System for Scientific Papers

Provided by: Science and Development Network (SciDev.Net)
Topic: Big Data
Format: PDF
Indexing converts a raw document collection into an easily searchable representation. Large-scale indexing poses challenges, such as how to distribute the indexing computation efficiently across a cluster of nodes. The MapReduce framework can be an effective tool for parallelizing tasks such as inverted index construction. When searching the full contents of a document collection, scanning the documents one by one is inefficient because the response time becomes considerable. Larger collections are therefore usually scanned, analyzed, and indexed before any queries are made against them, which greatly reduces search response time.
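The MapReduce approach to inverted index construction can be illustrated with a minimal, single-process sketch. It is not the paper's architecture. The map phase emits (term, document id) pairs, a shuffle step groups them by term (a job the framework performs between phases), and the reduce phase turns each group into a sorted postings list. Queries then intersect postings lists rather than scanning every document. The function names, the lowercase alphanumeric tokenizer, and the AND-only query semantics are all illustrative assumptions. On a real cluster, map and reduce tasks would run in parallel on separate nodes.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical tokenizer: lowercase alphanumeric runs, no stemming or stop words.
TOKEN = re.compile(r"[a-z0-9]+")


def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for every token in one document."""
    for term in TOKEN.findall(text.lower()):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted doc ids by term (done by the framework in MapReduce)."""
    groups = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: collapse duplicate doc ids into a sorted postings list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map -> shuffle -> reduce sequentially over a {doc_id: text} dict."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


def search(index, query):
    """AND query: intersect postings lists instead of scanning every document."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "MapReduce parallelizes indexing",
    2: "Inverted index construction with MapReduce",
    3: "Scanning documents one by one is slow",
}
index = build_inverted_index(docs)
print(index["mapreduce"])                # [1, 2]
print(search(index, "mapreduce index"))  # [2]
```

Because the index is built once ahead of time, each query touches only the postings lists of its own terms. This is the response-time benefit described above. Distributing `map_phase` and `reduce_phase` across nodes is what MapReduce adds on top.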
