Supporting Component-Based Failover Units in Middleware for Distributed Real-Time and Embedded Systems

Executive Summary

Although component middleware is increasingly used to develop distributed real-time and embedded (DRE) systems, it poses new fault-tolerance challenges: internal component state must be synchronized efficiently, failures must be correlated across groups of components, and fault-tolerance properties must be configurable at component granularity. This paper makes three contributions to R&D on component-based fault tolerance. It describes the COmponent Replication based on Failover Units (CORFU) component middleware, which provides fail-stop behavior and fault correlation across groups of components that are treated as an atomic unit in DRE systems.

