Resiliency-Aware Data Management
Computing architectures are shifting toward massively parallel environments with increasing numbers of heterogeneous components. Large scale, combined with decreasing feature sizes, leads to dramatically increasing error rates, and heterogeneity introduces new error types. Techniques for ensuring resiliency, that is, robustness against these errors, are typically applied at the hardware abstraction and operating system levels. However, as errors become the normal case, the authors observe that the computational overhead of ensuring robustness at these levels keeps growing.