REPEAL: Resilience vs. Performance in linear Algebra

Further details
R&D projects, public funding
2016 - 2019
Research Group Theory and Applications of Algorithms

Algorithmic fault tolerance is an important open problem for complex numerical algorithms. Substantial advances would have a high impact across a wide spectrum of applications, ranging from sensor networks and P2P networks to high-performance computing (HPC), where future systems are expected to exhibit much higher fault rates than current ones. It remains an open question how much resilience can be achieved at the algorithmic level and how this affects sustained performance.

The REPEAL project (REsilience vs. PErformance in linear ALgebra) investigates resilient/fault-tolerant parallel algorithms for a range of numerical linear algebra problems. The objectives are to design algorithms which provably produce accurate results (within the limitations of floating-point arithmetic) in the presence of faults, and to gain a better understanding of the resilience-performance trade-off.

We address the following questions: How can existing approaches be improved to handle more general temporal and spatial fault distributions? What numerical accuracy and sustained performance do resilient algorithms achieve in real computations? What resilience improvements can be achieved by combining deterministic with randomized approaches? What is the "price" of resilience, i.e., how much slowdown must be expected compared to non-resilient high-performance algorithms?

Theory and Applications of Algorithms

Faculty of Computer Science
University of Vienna
Währinger Straße 29
A-1090 Vienna

Fax: +43-1-4277-9 783
