An extensive study on iterative solver resilience: characterization, detection and prediction

Burcu Mutlu

Authors: Burcu Mutlu
Thesis supervisors: Osman Unsal (supervisor), Gokcen Kestor (co-supervisor)
Defense: At the Universitat Politècnica de Catalunya (UPC) (Spain) in 2019
Language: Spanish
Thesis examination committee: Oriol Arcas Abella (chair), Marc Casas Guix (secretary), Roberto Gioiosa (member)
Doctoral program: Doctoral Program in Computer Architecture, Universidad Politécnica de Catalunya
Subjects:
- Mathematics
  - Numerical analysis
    - Iterative methods
- Technological sciences
  - Computer technology
    - Computer architecture
    - Computer reliability
Full text not available
Abstract
- Soft errors caused by transient bit flips have the potential to significantly impact an application's behavior. This has motivated the design of an array of techniques to detect, isolate, and correct soft errors using microarchitectural, architectural, compilation-based, or application-level techniques to minimize their impact on the executing application. The first step toward the design of good error detection/correction techniques involves an understanding of an application's vulnerability to soft errors. This work focuses on silent data corruption's effects on iterative solvers and efforts to mitigate those effects.

  In this thesis, we present the first comprehensive characterization of the impact of soft errors on the convergence characteristics of six iterative methods using application-level fault injection. We analyze the impact of soft errors in terms of the type of error (single- vs. multi-bit), the distribution and location of bits affected, the data structure and statement impacted, and variation with time. We create a public access database with more than 1.5 million fault injection results. We then analyze the performance of soft error detection mechanisms and present the comparative results. Motivated by our observations, we evaluate a machine-learning-based detector that takes as its features the runtime features observed by the individual detectors to arrive at their conclusions. Our evaluation demonstrates improved results over individual detectors. We then propose a machine-learning-based method to predict a program's error behavior to make fault injection studies more efficient. We demonstrate this method on assessing the performance of soft error detectors. We show that our method maintains 84% accuracy on average with up to 53% less cost. We also show that, once a model is trained, further fault injection tests would cost 10% of the expected full fault injection runs.
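
As an illustration of what application-level fault injection into an iterative solver can look like, the following is a minimal sketch and not the thesis's actual framework. It assumes a plain Jacobi iteration (the abstract does not name the six methods studied) on a small, hypothetical test system; the names `flip_bit`, `jacobi`, and the `fault` tuple are introduced here for illustration only. A single bit of one solution-vector entry is flipped at a chosen iteration, and the iteration count is compared with a fault-free run.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding flipped.

    Bit 0 is the least significant mantissa bit, bits 52-62 are the
    exponent, and bit 63 is the sign.
    """
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def jacobi(A, b, tol=1e-10, max_iter=500, fault=None):
    """Solve Ax = b with Jacobi iteration (illustrative solver choice).

    `fault` is an optional (iteration, index, bit) tuple: at the start of
    that iteration, one bit of x[index] is flipped to emulate a soft error.
    Returns (x, number of iterations performed).
    """
    n = len(b)
    x = [0.0] * n
    for k in range(max_iter):
        if fault is not None and fault[0] == k:
            x[fault[1]] = flip_bit(x[fault[1]], fault[2])
        # Jacobi update: every entry uses only values from the previous iterate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Infinity norm of the residual A x_new - b as the stopping criterion.
        residual = max(
            abs(sum(A[i][j] * x_new[j] for j in range(n)) - b[i])
            for i in range(n)
        )
        x = x_new
        if residual < tol:
            return x, k + 1
    return x, max_iter


# Hypothetical diagonally dominant system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]

x_clean, iters_clean = jacobi(A, b)
# Flip the lowest exponent bit of x[1] at the start of iteration 3.
x_fault, iters_fault = jacobi(A, b, fault=(3, 1, 52))
print("fault-free iterations:", iters_clean)
print("iterations with injected fault:", iters_fault)
```

Varying the `bit` and `iteration` fields of the fault tuple gives a rough feel for the dimensions the abstract lists (bit location and variation with time). Flips in high exponent bits can produce infinities or NaNs, in which case this sketch simply runs until `max_iter`.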
