Dynamic load balancing for hybrid applications

Marta Garcia Gasulla

Author: Marta Garcia Gasulla
Thesis supervisors: Julita Corbalán González (supervisor), Jesús José Labarta Mancho (co-supervisor)
Defense: Universitat Politècnica de Catalunya (UPC), Spain, 2017
Language: Spanish
Thesis committee: Matthias Müller (chair), José R. Herrero (secretary), Juan José Gracia Calvo (member)
Subjects:
- Technological sciences
  - Computer technology
    - Computer architecture
Links
- Open-access thesis at: TDX
Abstract
- In this thesis, we present a framework to improve the use of the computational resources of a node. This framework, called DLB (Dynamic Load Balancing Library), includes a novel load balancing algorithm: LeWI (Lend When Idle).
  
  LeWI is based on the idea that the computational resources of an MPI process waiting in a blocking MPI call are idle. Therefore, these resources can be used to speed up the execution of another process running on the same node.
  
  DLB intercepts the MPI calls and changes the number of OpenMP threads when necessary. When a process reaches a blocking MPI call, it lends its CPUs to the system, and another MPI process running on the same node can use them. When the process finishes the blocking call, it retrieves its resources.
  
  We have implemented the LeWI algorithm within DLB and evaluated its performance. This evaluation demonstrates that DLB and LeWI can improve the performance of hybrid applications. They can correct both regular and irregular imbalances without modifying the application. Moreover, when used with well-balanced applications, they do not introduce significant overhead.
  
  We found that the malleability of the programming model or of the application can affect the performance of the load balancing algorithm. Although OpenMP is malleable, it has a limitation: the number of threads can only be changed outside a parallel region. The OmpSs programming model has higher malleability than OpenMP because its number of threads can be changed at any point. The performance evaluation showed that the malleability of the programming model has a substantial impact on the performance of the load balancing algorithm.
  
  The default placement of MPI processes among nodes is sequential, but in scientific applications the most loaded processes are often consecutive. Because LeWI only shares CPUs among processes on the same node, sequential placement tends to group these heavily loaded processes together, leaving little idle capacity to lend on their node. For this reason, distributing MPI processes among nodes in Round Robin fashion gives the load balancing mechanism more potential. The evaluation showed that, depending on the MPI application and its load distribution, Round Robin placement is a good option when using LeWI.
  
  Finally, we observed that the binding of threads to cores also affects the performance of the application, especially when using the load balancing mechanism. To manage threads bound to cores, we modified LeWI to work with CPU masks. We evaluated the impact of binding threads to cores on nodes of different sizes (from 4 to 16 CPUs per node). We showed that binding threads to cores has a substantial impact on performance, but also that this impact is tightly related to the size of the node and its memory structure.
  
  We have integrated DLB and LeWI with the OmpSs programming model. This integration showed the potential of collaboration between runtimes. It allowed us to identify the key points of coordination and to demonstrate that DLB is ready to be integrated with parallel runtimes. The performance evaluation showed the potential of this integration to improve application performance. Moreover, it allows resources to be shared between two different applications running on the same node without using MPI.
  
  We carried out an in-depth evaluation of the framework and LeWI with a production code: Alya. We show that the execution time can be reduced by 40% in situations with high load imbalance, but also that the performance of the OpenMP parallelization has a substantial impact on the achieved performance. In situations with good load balance, the use of DLB and LeWI does not affect performance negatively. We proved that DLB and LeWI are ready to be used in production codes. We ran strong scalability tests of Alya with LeWI using up to 16k cores of MareNostrum 3. These experiments showed not only that LeWI scales to thousands of cores but also that applying load balancing within the computational node can speed up executions on thousands of nodes.
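The lend/retrieve cycle described in the abstract (a process lends its CPUs when it enters a blocking MPI call, a process on the same node uses them, and the lender retrieves them on exit) can be illustrated with a toy Python model. This is a minimal sketch, not the DLB implementation or API: the class and method names (`Node`, `enter_blocking_call`, `borrow`, `exit_blocking_call`) are hypothetical, CPUs are counted rather than identified, and retrieval is assumed to be immediate, ignoring the malleability limits discussed in the abstract.

```python
class Node:
    """Toy model of LeWI-style CPU lending on one node (hypothetical, not the DLB API)."""

    def __init__(self, owned):
        self.owned = dict(owned)                 # rank -> CPUs assigned at startup
        self.threads = dict(owned)               # rank -> threads (CPUs) currently in use
        self.borrowed = {r: 0 for r in owned}    # rank -> CPUs taken from the pool
        self.pool = 0                            # lent CPUs that nobody is using yet

    def enter_blocking_call(self, rank):
        # A process waiting in a blocking MPI call does not compute,
        # so every CPU it holds (own and borrowed) is lent to the node.
        self.pool += self.threads[rank]
        self.threads[rank] = 0
        self.borrowed[rank] = 0

    def borrow(self, rank, wanted):
        # A running process grows its thread count with idle CPUs, if any.
        got = min(wanted, self.pool)
        self.pool -= got
        self.borrowed[rank] += got
        self.threads[rank] += got
        return got

    def exit_blocking_call(self, rank):
        # The process retrieves its own CPUs: first from the idle pool,
        # then by shrinking the processes that borrowed them.
        missing = self.owned[rank]
        take = min(missing, self.pool)
        self.pool -= take
        missing -= take
        for other in self.threads:
            if missing == 0:
                break
            give = min(missing, self.borrowed[other])
            self.borrowed[other] -= give
            self.threads[other] -= give
            missing -= give
        self.threads[rank] = self.owned[rank]


# Two MPI processes with 4 CPUs each on the same node.
node = Node({0: 4, 1: 4})
node.enter_blocking_call(0)   # rank 0 waits in a blocking MPI call
node.borrow(1, 4)             # rank 1 now runs with 8 threads
node.exit_blocking_call(0)    # rank 0 retrieves its 4 CPUs
print(node.threads)           # {0: 4, 1: 4}
```

Taking CPUs back from the pool before shrinking borrowers is one simple design choice for the sketch; a real runtime must also wait until the borrower is able to release its threads.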
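The malleability point in the abstract (OpenMP can only change its thread count outside a parallel region, while OmpSs can change it at any point) can be made concrete with a small timing model. This is a hedged sketch under strong assumptions: ideal linear speedup, identical parallel regions, and lent CPUs that become available at a single time `t_avail`; the function and parameter names are hypothetical and are not taken from the thesis.

```python
def finish_time(regions, work, base, extra, t_avail, change_anywhere):
    """Time to run `regions` parallel regions of `work` units each.

    base:            threads the process owns
    extra:           lent CPUs that become available at time t_avail
    change_anywhere: True  -> new threads can join at any point (OmpSs-like)
                     False -> a new thread count applies only from the next
                              parallel region (OpenMP-like)
    Assumes ideal linear speedup: a region of `work` units on n threads takes work / n.
    """
    t = 0.0
    for _ in range(regions):
        threads = base + (extra if t >= t_avail else 0)
        if change_anywhere and threads == base and t + work / base > t_avail:
            # Lent CPUs arrive in the middle of this region and join at once.
            done = (t_avail - t) * base
            t = t_avail + (work - done) / (base + extra)
        else:
            t += work / threads
    return t


# One region of 8 work units on 2 threads; 2 extra CPUs are lent at t = 1.
print(finish_time(1, 8, 2, 2, 1, change_anywhere=False))  # 4.0
print(finish_time(1, 8, 2, 2, 1, change_anywhere=True))   # 2.5
```

In this model the OpenMP-like process cannot use the lent CPUs until its current region ends, so with a single long region it gains nothing, while the OmpSs-like process uses them as soon as they are lent.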
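The placement argument in the abstract (sequential placement keeps consecutive, heavily loaded processes on the same node, while Round Robin spreads them) reduces to simple arithmetic over rank-to-node maps. The following sketch assumes an illustrative load vector in which the first half of the ranks carry four times the work of the rest; the function names and numbers are hypothetical and are not measurements from the thesis.

```python
def sequential_placement(nranks, ranks_per_node):
    # Default block placement: ranks 0..k-1 on node 0, k..2k-1 on node 1, ...
    return [r // ranks_per_node for r in range(nranks)]


def round_robin_placement(nranks, nnodes):
    # Cyclic placement: rank r goes to node r mod nnodes.
    return [r % nnodes for r in range(nranks)]


def node_loads(placement, loads, nnodes):
    # Total work assigned to each node for a given rank -> node placement.
    totals = [0] * nnodes
    for node, load in zip(placement, loads):
        totals[node] += load
    return totals


# 8 ranks on 2 nodes; the consecutive ranks 0-3 are the most loaded.
loads = [4, 4, 4, 4, 1, 1, 1, 1]
print(node_loads(sequential_placement(8, 4), loads, 2))   # [16, 4]
print(node_loads(round_robin_placement(8, 2), loads, 2))  # [10, 10]
```

Since LeWI only moves CPUs within a node, even perfect intra-node balancing leaves the run bounded by the busiest node: 16 units of work per node with sequential placement versus 10 with Round Robin in this idealized example.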
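The abstract notes that LeWI was modified to work with CPU masks so that threads bound to cores can be managed. One way to picture this is to replace CPU counts with bitmasks, so that a lender gets back exactly the cores it owned and its thread binding is preserved. This is a hypothetical sketch using Python integers as masks; `MaskNode` and `cpus` are illustrative names, not DLB functions, and the retrieval policy is an assumption.

```python
def cpus(mask):
    # List the CPU ids whose bits are set in the mask.
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


class MaskNode:
    """Toy mask-based lending on one node (hypothetical, not the DLB API)."""

    def __init__(self, owned):
        self.owned = dict(owned)     # rank -> mask of cores owned at startup
        self.current = dict(owned)   # rank -> mask of cores it currently runs on
        self.pool = 0                # mask of lent, unused cores

    def enter_blocking_call(self, rank):
        # Lend every core the process is running on.
        self.pool |= self.current[rank]
        self.current[rank] = 0

    def borrow(self, rank):
        # Take all idle cores; threads can be bound to exactly these cores.
        got = self.pool
        self.current[rank] |= got
        self.pool = 0
        return got

    def exit_blocking_call(self, rank):
        # Retrieve exactly the owned cores, wherever they currently are.
        own = self.owned[rank]
        self.pool &= ~own
        for other in self.current:
            if other != rank:
                self.current[other] &= ~own
        self.current[rank] = own


# 8-core node: rank 0 owns cores 0-3, rank 1 owns cores 4-7.
node = MaskNode({0: 0b00001111, 1: 0b11110000})
node.enter_blocking_call(0)
node.borrow(1)
print(cpus(node.current[1]))  # [0, 1, 2, 3, 4, 5, 6, 7]
node.exit_blocking_call(0)
print(cpus(node.current[0]))  # [0, 1, 2, 3]
```

Compared with a plain CPU count, the mask records which cores moved, so the lender's threads can be rebound to the same cores (and the same memory domain) after the blocking call.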
