dFault: Fault Localization in Large-Scale Peer-to-Peer Systems

Distributed Hash Tables (DHTs) have been widely adopted as a building block for large-scale distributed systems. As mission-critical applications begin to be layered on them, their robust operation becomes even more important. Although DHTs can detect and heal around unresponsive hosts and disconnected links, several hidden faults and performance bottlenecks go undetected, resulting in unanswered queries and delayed responses. In this paper, the authors propose dFault, a system that helps large-scale DHTs localize such faults.

Provided by: Purdue University
Topic: Collaboration
Date Added: Sep 2010
Format: PDF
