OmpSs-OpenCL programming model for heterogeneous systems

Vinoth Krishnan Elangovan

Authors: Vinoth Krishnan Elangovan
Thesis advisors: Eduard Ayguadé Parra (advisor), Rosa M Badia (co-advisor)
Defense: At the Universitat Politècnica de Catalunya (UPC) (Spain) in 2015
Language: Spanish
Thesis committee: Enrique Salvador Quintana Ortí (chair), Xavier Martorell Bofill (secretary), Paul Matthew Carpenter (member)
Subjects:
- Mathematics
  - Computer science
    - Informatics
    - Programming languages
- Technological sciences
  - Computer technology
    - Computer architecture
Abstract
- CPU architects have progressed from single-core processors to multi-core and many-core processors in order to keep up with Moore's law, since CPU vendors have not been able to provide performance benefits by increasing the operating frequency, primarily due to the power, ILP and memory walls. In order to tackle these issues, designers have started to pack more homogeneous cores onto a single die. Today, in the era of multi-core processors, energy-efficient high-performance computing has become critical, and the use of GPUs as accelerators has been the trend. However, this transformation of single-core computing systems into heterogeneous environments has come with a huge programming challenge. To harness this immense computing power, hardware vendors have built platform-specific programming APIs such as CUDA for Nvidia GPUs. However, these APIs are quite demanding to program and hence involve long software development times. Addressing this, the Khronos Group consortium came up with OpenCL, an open-standard parallel computing API for cross-platform computations. OpenCL offers a platform-independent programming API which allows the developer to write applications that are portable across different platforms. OpenCL solves the problem of using heterogeneous systems with a single programming model, but the model itself is quite challenging to code in, with low-level programmability and inadequate performance portability.
  
  In this doctoral thesis, we focus on addressing these shortcomings in OpenCL. We propose OmpSs-OpenCL, a task-based programming model that presents users with a sequential style of programming, with added pragmas for the parallel regions of the application. This enables the programmer to skip cumbersome OpenCL constructs and instead write a sequential program annotated with pragmas. Our contribution mainly focuses on providing a complete abstraction of the OpenCL programming API and on exploiting the best of the underlying hardware platform. OmpSs-OpenCL includes the source-to-source Mercurium compiler and the Nanos++ runtime system. We have evaluated the OmpSs-OpenCL model with different benchmarks and have observed a substantial ease in programming, with performance comparable to the original OpenCL benchmarks.
  
  With OpenCL supporting code portability across different devices, we extend the OmpSs-OpenCL programming model to support parallel execution of tasks across heterogeneous platforms. We discuss the enhancements in the design of the programming model and investigate the scalability of OmpSs-OpenCL benchmarks along with task pre-fetching. Further, we present static and work-stealing scheduling techniques. We show results of parallel execution of applications using the OmpSs-OpenCL model and use heterogeneous workloads to evaluate our scheduling techniques on a heterogeneous CPU-GPU platform.
  
  Although OpenCL offers code portability, it often fails to provide performance portability across GPU generations. The current approach to ensuring portability requires considerable programming effort and also burdens programmers by forcing them to be aware of architectural changes. In order to tackle this situation, we propose an Auto-Tune tool for OmpSs-OpenCL kernels to ease the process of porting applications across different generations of GPUs. The proposed tool focuses on modifying the OpenCL kernel execution configuration based on GPU specifications and does not require any user intervention. With our experiments, we show that the kernel execution configuration greatly influences application performance and that tuning it based on GPU characteristics is crucial.
  
  With OmpSs-OpenCL, we offer application developers a single high-level programming platform that follows a sequential programming style and addresses both the low-level programmability and the performance portability issues of OpenCL. With OpenCL gaining popularity across wider sections of HPC users, the OmpSs-OpenCL approach can provide a way to realize productive supercomputing.
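
The abstract says that the kernel execution configuration should be tuned to GPU characteristics, but it does not describe how the Auto-Tune tool does this. As a rough illustration only, the following C sketch shows one common way to derive a work-group (local) size and a padded global size from two device limits. The `device_limits` struct and both function names are hypothetical and are not taken from the thesis; in real OpenCL code the two limits would typically be queried with `CL_DEVICE_MAX_WORK_GROUP_SIZE` and `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the device limits a tuner would query
 * (assumption: filled in elsewhere, e.g. from clGetDeviceInfo and
 * clGetKernelWorkGroupInfo in a real OpenCL host program). */
typedef struct {
    size_t max_work_group_size; /* upper bound on work-items per group */
    size_t preferred_multiple;  /* hardware scheduling granularity */
} device_limits;

/* Pick the largest multiple of the preferred granularity that still fits
 * within the device's work-group limit. If the limit is smaller than the
 * granularity, fall back to the limit itself. */
size_t pick_local_size(device_limits dev)
{
    assert(dev.max_work_group_size > 0);
    assert(dev.preferred_multiple > 0);

    size_t local = (dev.max_work_group_size / dev.preferred_multiple)
                   * dev.preferred_multiple;
    return local > 0 ? local : dev.max_work_group_size;
}

/* Round the problem size up to a whole number of work-groups. The kernel
 * is then expected to ignore work-items whose index is >= problem_size. */
size_t round_up_global(size_t problem_size, size_t local)
{
    assert(local > 0);
    return ((problem_size + local - 1) / local) * local;
}
```

Under these assumptions, a device reporting a limit of 1024 and a granularity of 32 would get a local size of 1024, while a device with a limit of 100 would get 96. The same source can then run with a different configuration on each GPU generation without any change by the programmer, which is the kind of adjustment the abstract attributes to tuning.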
