Energy-efficient architectures for recurrent neural networks

Franyell Antonio Silfa Feliz

Ayuda

Energy-efficient architectures for recurrent neural networks

Autores: Franyell Antonio Silfa Feliz
Directores de la Tesis: José María Arnau Montañés (dir. tes.), Antonio González Colás (codir. tes.)
Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2021
Idioma: español
Tribunal Calificador de la Tesis: Tushar Krishna (presid.), Enrique Pastor Llorens (secret.), José Cano Reyes (voc.)
Programa de doctorado: Programa de Doctorado en Arquitectura de Computadores por la Universidad Politécnica de Catalunya
Materias:
- Ciencias tecnológicas
  - Tecnología de los ordenadores
    - Arquitectura de ordenadores
Texto completo no disponible (Saber más ...)
Resumen
- Deep Learning algorithms have been remarkably successful in applications such as Automatic Speech Recognition and Machine Translation. Thus, these kinds of applications are ubiquitous in our lives and are found in a plethora of devices. These algorithms are composed of Deep Neural Networks (DNNs), such as Convolutional Neural Networks and Recurrent Neural Networks (RNNs), which have a large number of parameters and require a large amount of computations. Hence, the evaluation of DNNs is challenging due to their large memory and power requirements.
  
  RNNs are employed to solve sequence to sequence problems such as Machine Translation. They contain data dependencies among the executions of time-steps hence the amount of parallelism is severely limited. Thus, evaluating them in an energy-efficient manner is more challenging than evaluating other DNN algorithms.
  
  This thesis studies applications using RNNs to improve their energy efficiency on specialized architectures. Specifically, we propose novel energy-saving techniques and highly efficient architectures tailored to the evaluation of RNNs. We focus on the most successful RNN topologies which are the Long Short Term memory and the Gated Recurrent Unit.
  
  First, we characterize a set of RNNs running on a modern SoC. We identify that accessing the memory to fetch the model weights is the main source of energy consumption. Thus, we propose E-PUR: an energy-efficient processing unit for RNN inference. E-PUR achieves 6.8x speedup and improves energy consumption by 88x compared to the SoC. These benefits are obtained by improving the temporal locality of the model weights.
  
  In E-PUR, fetching the parameters is the main source of energy consumption. Thus, we strive to reduce memory accesses and propose a scheme to reuse previous computations. Our observation is that when evaluating the input sequences of an RNN model, the output of a given neuron tends to change lightly between consecutive evaluations.Thus, we develop a scheme that caches the neurons' outputs and reuses them whenever it detects that the change between the current and previously computed output value for a given neuron is small avoiding to fetch the weights. In order to decide when to reuse a previous value we employ a Binary Neural Network (BNN) as a predictor of reusability. The low-cost BNN can be employed in this context since its output is highly correlated to the output of RNNs. We show that our proposal avoids more than 24.2% of computations. Hence, on average, energy consumption is reduced by 18.5% for a speedup of 1.35x.
  
  RNN models’ memory footprint is usually reduced by using low precision for evaluation and storage. In this case, the minimum precision used is identified offline and it is set such that the model maintains its accuracy. This method utilizes the same precision to compute all time-steps.Yet, we observe that some time-steps can be evaluated with a lower precision while preserving the accuracy. Thus, we propose a technique that dynamically selects the precision used to compute each time-step. A challenge of our proposal is choosing a lower bit-width. We address this issue by recognizing that information from a previous evaluation can be employed to determine the precision required in the current time-step. Our scheme evaluates 57% of the computations on a bit-width lower than the fixed precision employed by static methods. We implement it on E-PUR and it provides 1.46x speedup and 19.2% energy savings on average.

Acceso de usuarios registrados

¿Olvidó su contraseña?

¿Es nuevo? Regístrese

Ventajas de registrarse

Dialnet Plus

Opciones de compartir

Opciones de entorno

Sugerencia / Errata

Coordinado por: