Provided by: Oak Ridge National Laboratory
Date Added: Jun 2010
The authors present a monitoring system for large-scale parallel and distributed computing environments that allows accuracy to be traded off in a tunable fashion to gain scalability without compromising fidelity. The approach relies on classifying each gathered monitoring metric based on individual needs and on aggregating messages containing classes of individual monitoring metrics using a tree-based overlay network. The MRNet-based prototype significantly reduces the amount of gathered and stored monitoring data, e.g., by a factor of 56 in comparison to the Ganglia distributed monitoring system.
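The classify-then-aggregate idea described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the MRNet API; the `Node` type, the `classify` and `aggregate` functions, and the threshold values are all hypothetical names chosen for illustration. The sketch assumes each leaf node holds one numeric reading, maps it to a coarse class index, and that every inner node of the tree forwards only per-class counts to its parent.

```python
import bisect
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


def classify(value: float, thresholds: List[float]) -> int:
    """Map a raw reading to a class index (hypothetical scheme).

    thresholds must be sorted ascending; a value equal to a threshold
    falls into the higher class.
    """
    return bisect.bisect_right(thresholds, value)


@dataclass
class Node:
    """A node of the overlay tree; leaves carry a reading, inner nodes do not."""
    name: str
    children: List["Node"] = field(default_factory=list)
    reading: Optional[float] = None


def aggregate(node: Node, thresholds: List[float]) -> Counter:
    """Return per-class counts for the subtree rooted at node.

    Only class counts travel up the tree, so the message size at each
    level depends on the number of classes, not the number of leaves.
    """
    if not node.children:
        return Counter({classify(node.reading, thresholds): 1})
    total: Counter = Counter()
    for child in node.children:
        total.update(aggregate(child, thresholds))
    return total


if __name__ == "__main__":
    # Assumed example: four leaves reporting a load percentage.
    leaves = [Node(f"n{i}", reading=r) for i, r in enumerate([10.0, 55.0, 90.0, 45.0])]
    root = Node("root", children=[
        Node("a", children=leaves[:2]),
        Node("b", children=leaves[2:]),
    ])
    print(aggregate(root, thresholds=[50.0, 80.0]))
```

In this sketch, choosing fewer or coarser thresholds shrinks the data sent upward at the cost of resolution, which is one way to read the tunable accuracy trade-off the abstract mentions.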