Towards Data Mining in Large and Fully Distributed Peer-to-Peer Overlay Networks
Source: Vrije Universiteit
The Internet, an increasingly dynamic and extremely heterogeneous network, has recently become a platform for huge, fully distributed peer-to-peer overlay networks containing millions of nodes, typically for the purpose of information dissemination and file sharing. This paper targets the problem of analyzing data that are scattered over such a huge and dynamic set of nodes, where each node may store very little data but the total amount of data is immense because of the large number of nodes. The paper presents distributed algorithms for effectively calculating basic statistics of data using the recently introduced newscast model of computation, and it demonstrates how to implement basic data mining algorithms based on these techniques.
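To give a feel for the kind of fully distributed statistic the abstract refers to, the following is a minimal sketch of pairwise gossip averaging, simulated in a single process. It is not taken from the paper: the function name `gossip_average`, the parameters `rounds` and `seed`, and the use of uniformly random peer selection in place of the newscast peer-sampling mechanism are all assumptions made for illustration. Each node holds one local value; in every round, each node contacts one random peer and both replace their estimates with the pair's mean, so the global sum is preserved while the estimates drift toward the global average.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Simulate pairwise gossip averaging (illustrative, not the paper's code).

    values: one local number per simulated node.
    rounds: number of cycles; in each cycle every node initiates one exchange.
    seed:   seed for the random peer choice, so runs are repeatable.
    Returns the list of per-node estimates after the last cycle.
    """
    estimates = [float(v) for v in values]
    n = len(estimates)
    if n < 2:
        return estimates  # nothing to exchange with fewer than two nodes

    rng = random.Random(seed)
    for _ in range(rounds):
        for i in range(n):
            # Assumption: peers are drawn uniformly at random from all other
            # nodes; a real overlay would use its own peer-sampling service.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Both nodes adopt the mean of their two current estimates.
            # The sum over all nodes is unchanged by this step.
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean
            estimates[j] = mean
    return estimates


if __name__ == "__main__":
    local_values = list(range(1, 101))  # hypothetical data, one value per node
    result = gossip_average(local_values)
    print("true average:", sum(local_values) / len(local_values))
    print("estimate range:", min(result), max(result))
```

In this sketch no node ever sees more than one peer's value at a time, yet every node ends up with an estimate close to the global average. Other basic statistics could in principle be built on the same exchange pattern, for example a variance from the averages of the values and of their squares.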