A Hybrid Approach for Estimating Document Frequencies in Unstructured P2P Networks
Scalable search and retrieval over numerous web document collections distributed across different sites can be achieved by adopting a peer-to-peer (P2P) communication model. Terms and their document frequencies are the main components of text information retrieval and therefore need to be computed, aggregated, and distributed throughout the system. This is a challenging problem in unstructured P2P networks, because skews in the distribution of documents to peers can cause a peer's local document collection to misrepresent the global collection.
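To make the skew problem concrete, the following minimal sketch compares the document frequency of a term at a single peer with its frequency over the whole network. It is an illustration of the problem only, not the hybrid estimation approach itself. The peer names, the toy documents, and the helper functions `local_df`, `global_df`, and `relative_df` are hypothetical, and the sketch assumes that no document is stored at more than one peer and that a central view of all collections is available, which an unstructured P2P network does not provide.

```python
from collections import Counter


def local_df(docs):
    """Count, for each term, how many documents in `docs` contain it."""
    df = Counter()
    for doc in docs:
        # Use a set so a term repeated inside one document counts once.
        df.update(set(doc.split()))
    return df


def global_df(peer_collections):
    """Sum per-peer document frequencies.

    Assumption: documents are disjoint across peers, so summing does
    not count any document twice.
    """
    total = Counter()
    for docs in peer_collections.values():
        total.update(local_df(docs))
    return total


def relative_df(df, n_docs):
    """Convert raw document frequencies to fractions of the collection."""
    return {term: count / n_docs for term, count in df.items()}


# Hypothetical, deliberately skewed distribution: peer_a holds mostly
# documents about one topic.
peers = {
    "peer_a": ["jazz piano", "jazz trumpet", "jazz guitar"],
    "peer_b": ["soccer league", "tennis open", "jazz festival"],
}

n_total = sum(len(docs) for docs in peers.values())
local_a = relative_df(local_df(peers["peer_a"]), len(peers["peer_a"]))
overall = relative_df(global_df(peers), n_total)

print(f"jazz at peer_a: {local_a['jazz']:.2f}")
print(f"jazz globally:  {overall['jazz']:.2f}")
```

In this toy setup the term appears in every document at `peer_a` but in only four of the six documents overall, so a weighting scheme based on the local statistics of `peer_a` would treat the term as far less discriminative than it is globally.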