Logo der Universität Wien

REPEAL: Resilience vs. Performance in linear Algebra


Abstract

Algorithmic fault tolerance is an important open problem for complex numerical algorithms. Substantial advances will have high impact in a wide spectrum of applications – ranging from sensor networks over P2P networks to high performance computing (HPC), since future HPC systems are expected to exhibit much higher fault rates than current systems do. It is an open question how much resilience can be achieved at the algorithmic level and how this influences sustained performance.

The REPEAL project (REsilience vs. PErformance in linear ALgebra) investigates resilient/fault-tolerant parallel algorithms for a range of numerical linear algebra problems. The objectives are to design algorithms which provably produce accurate results (within the limitations of floating-point arithmetic) in the presence of faults, and to gain a better understanding of the resilience-performance trade-off.

We address the following questions: How can existing approaches be improved to handle more general temporal and spatial fault distributions? Which numerical accuracy and sustained performance do resilient algorithms achieve in real computations? Which resilience improvements can be achieved by combining deterministic with randomized approaches? What is the "price" of resilience, i.e., which slow-down has to be expected compared to non-resilient high performance algorithms? 

Further details
Type
R&D projects, public funding
Duration
2016 - 2019
Executive
Research Group Theory and Applications of Algorithms
Links
https://repeal.taa.univie.ac.at/
Contact us
Faculty of Computer Science
University of Vienna

Währinger Straße 29
A-1090 Vienna