Date Added: Jan 2013
A distributed scenario can be of two types: homogeneous, where every site observes all of the features but only a fraction of the records, or heterogeneous, where every site observes only some of the features. In either scenario, centralizing all the data to build a global model is not an appropriate solution, because of the high cost of moving the data and the storage required at the central node. Distributed algorithms are therefore needed to solve most data mining problems in peer-to-peer (P2P) networks. In general, a distributed algorithm in this setting should not require global synchronization, should be communication efficient, and should be resilient to moderate changes in the network topology.
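
As a rough illustration of these requirements, the sketch below uses randomized pairwise gossip averaging, a standard technique chosen here for illustration and not described in the text above. Each peer holds one local value and repeatedly averages its estimate with a single neighbor, so no peer waits on the whole network and each step exchanges only one number over one link. The ring topology, the sample values, and the names `gossip_average`, `local_values`, and `ring_edges` are all assumptions made for this example.

```python
import random


def gossip_average(values, edges, rounds=2000, seed=0):
    """Estimate the global mean by randomized pairwise gossip.

    values: one local number per peer (hypothetical local statistics).
    edges:  list of (i, j) pairs naming peers that can talk directly.
    Each round, one randomly chosen link is activated and its two peers
    replace their estimates with the average of the pair. No global
    synchronization or central node is involved.
    """
    rng = random.Random(seed)
    estimates = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        pair_mean = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = pair_mean
        estimates[j] = pair_mean
    return estimates


# Assumed setup: six peers arranged in a ring, each holding one local value.
local_values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
ring_edges = [(k, (k + 1) % len(local_values)) for k in range(len(local_values))]

estimates = gossip_average(local_values, ring_edges)
true_mean = sum(local_values) / len(local_values)
max_error = max(abs(e - true_mean) for e in estimates)
print(f"true mean = {true_mean}, largest peer error = {max_error:.2e}")
```

Because each exchange preserves the sum of the two estimates, every peer's estimate drifts toward the global mean as long as the network stays connected. A link that disappears simply stops being chosen, which is one simple way to tolerate moderate topology changes.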