Fault Tolerance for HPC With OpenVZ Virtualization by Lite Migration Toolkit
Source: National Center for High-Performance Computing (NCHC)
The reliability of large-scale parallel jobs within a cluster, or even across multiple clusters in a Grid or distributed computing environment, is a long-standing issue because of the difficulty of monitoring and managing a large number of compute nodes. To address this issue, the Distributed Computing Team at the National Center for High-Performance Computing (NCHC) has developed a Lite Migration toolkit with a fault tolerance feature. The proposed approach relies on virtualization techniques, exemplified by OpenVZ, an open source virtualization implementation. It provides fault tolerance to parallel HPC applications automatically and transparently.
| Size: | 2285.60 |
| Date: | Dec 2010 |



