Ayuda
Ir al contenido

Dialnet


High-performance and energy-efficient irregular graph processing on gpu architectures

  • Autores: Albert Segura Salvador
  • Directores de la Tesis: José María Arnau Montañés (dir. tes.), Antonio González Colás (codir. tes.)
  • Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2021
  • Idioma: español
  • Tribunal Calificador de la Tesis: Tor Michael Aamodt (presid.), Jordi Tubella Murgadas (secret.), Julio Sahuquillo Borras (voc.)
  • Programa de doctorado: Programa de Doctorado en Arquitectura de Computadores por la Universidad Politécnica de Catalunya
  • Materias:
  • Texto completo no disponible (Saber más ...)
  • Resumen
    • Graph processing is an established and prominent domain that is the foundation of new emerging applications in areas such as Data Analytics and Machine Learning, empowering applications such as road navigation, social networks and automatic speech recognition. The large amount of data employed in these domains requires high throughput architectures such as GPGPU. Although the processing of large graph-based workloads exhibits a high degree of parallelism, memory access patterns tend to be highly irregular, leading to poor efficiency due to memory divergence.In order to ameliorate these issues, GPGPU graph applications perform stream compaction operations which process active nodes/edges so subsequent steps work on a compacted dataset. We propose to offload this task to the Stream Compaction Unit (SCU) hardware extension tailored to the requirements of these operations, which additionally performs pre-processing by filtering and reordering elements processed.We show that memory divergence inefficiencies prevail in GPGPU irregular graph-based applications, yet we find that it is possible to relax the strict relationship between thread and processed data to empower new optimizations. As such, we propose the Irregular accesses Reorder Unit (IRU), a novel hardware extension integrated in the GPU pipeline that reorders and filters data processed by the threads on irregular accesses improving memory coalescing.Finally, we leverage the strengths of both previous approaches to achieve synergistic improvements. We do so by proposing the IRU-enhanced SCU (ISCU), which employs the efficient pre-processing mechanisms of the IRU to improve SCU stream compaction efficiency and NoC throughput limitations due to SCU pre-processing operations. We evaluate the ISCU with state-of-the-art graph-based applications achieving a 2.2x performance improvement and 10x energy-efficiency.


Fundación Dialnet

Dialnet Plus

  • Más información sobre Dialnet Plus

Opciones de compartir

Opciones de entorno