CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems

Source: Indiana University

Considerable work has been done on providing fault-tolerance capabilities for different software components on large-scale high-end computing systems. Thus far, however, these fault-tolerant components have worked in isolation, and information about faults is rarely shared among them. This lack of system-wide fault tolerance is emerging as one of the biggest problems on leadership-class systems. This paper proposes a coordinated infrastructure, named CIFTS, that enables system software components to share fault information with each other and adapt to faults in a holistic manner.
Format: PDF Size: 237.50
Date: Jun 2009