Confluence: A System for Lossless Multi-Source Single-Sink Data Collection

Executive Summary

Distributed environments often require collecting large amounts of critical and raw data from multiple locations at a central clearinghouse: for example, task results or large datasets from multiple clouds, logs from multiple PlanetLab nodes, or video transcripts in tele-immersive settings. The authors present the design, implementation, and evaluation of Confluence, a system for rapid and lossless transfer of unique files from multiple source nodes to a single sink node. First, they formally model the multi-source single-sink data collection problem for a static network and present a solution that is optimal in terms of total transfer time. Second, they build in mechanisms to make the system workable in dynamic networks.

  • Format: PDF
  • Size: 284.9 KB