Job-guided scheduling strategies for multisite HPC infrastructures

Francesc Guim Bernat

Author: Francesc Guim Bernat
Thesis supervisor: Julita Corbalán González
Defended: at the Universitat Politècnica de Catalunya (UPC), Spain, in 2008
Language: English
Thesis examination committee: Josep Casanovas Garcia (chair), Xavier Martorell Bofill (secretary), Ramin Yahyapour (member), Christine Morin (member), Toni Cortés Rosselló (member)
Subjects:
- Mathematics
  - Computer science
    - Simulation
- Technological sciences
  - Computer technology
    - Computer architecture
Full text not available
Abstract
- Since the early eighties, HPC architectures have evolved from single-processor machines to very sophisticated architectures such as multi-cluster systems composed of heterogeneous nodes. Commonly, access to these systems has been controlled by batch systems, which schedule and manage the jobs that users submit. These scheduling systems are composed of two different scheduling elements: the local scheduler, which decides which jobs have to run and when, and the local resource manager, which decides where the jobs have to run. This thesis targets local scenarios with characteristics that are representative of most HPC centers. As far as the architecture is concerned, we assume clustered architectures, which can be used to model a wide variety of systems. We consider scheduling strategies that are space-sharing (the system is partitioned and jobs run in dedicated partitions) and run-to-completion (jobs are not pre-empted during their execution). Furthermore, we assume that jobs submitted to the systems are rigid jobs (the number of processors used is fixed at submission time). In the first part of the thesis, we focus on improving the performance of local scheduler scenarios from two perspectives: (1) we analyze the use of job runtime prediction models by local schedulers rather than user runtime estimates, and (2) we propose new models, mechanisms, and policies that involve both the local scheduler and the local resource manager. When users want to submit their HPC applications, they have to provide the scheduler with a description of the job that they want to be executed and a list of resource requirements. Usually, these requirements include the requested number of processors, the runtime estimate, and other resource requirements, such as the amount of disk or memory required by the application.
The job runtime estimate is required by job backfilling policies, a set of policies that base their scheduling decisions on the estimated runtime of the job and the number of requested processors. In many situations the user does not have enough experience or knowledge to estimate how long the jobs will run. Since backfilling policies are used in most HPC systems, we use them as our starting point as far as job scheduling is concerned. In the first part of the thesis we present a set of models that characterize the behaviour that prediction techniques have shown in HPC centers. We define a set of prediction models that capture several properties inherent to predictor estimates and that affect the performance of the system. The models are designed to evaluate scheduling policies that use predictions rather than user estimates. This study provides ways to evaluate the impact of prediction errors on the performance of prediction-based scheduling policies. Moreover, our models are evaluated using the most representative backfilling policies, and the study demonstrates that these techniques could be used in real systems, with the evaluated variants, without any negative effect on the performance of the system. We also identify the types of prediction errors that must be avoided when using predictions in these systems. There is a large body of research on evaluating the performance of backfilling strategies. Usually, researchers use simulation models that evaluate how a given scheduling policy behaves with different workload inputs. A key point in studying the performance of these scheduling policies is the resource model used in these evaluations. All the papers that have been presented in this area are based on one-dimensional resource models. Thus, they only model the amount of resources that are free and in use during the simulation: they do not model where the different jobs are allocated.
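
As an illustration of how a backfilling policy combines the runtime estimate with the requested number of processors, the following is a minimal Python sketch of one well-known variant, EASY backfilling, over a one-dimensional resource model (only a count of free processors, with no notion of where jobs are placed). The abstract does not name a specific variant, so the choice of EASY is an assumption, as are the names `Job` and `easy_backfill`; this is not the thesis's own implementation. The `estimate` field may hold either a user estimate or a predicted runtime.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int       # requested processors (rigid job)
    estimate: float  # user estimate or predicted runtime (hypothetical field)


def easy_backfill(now, total_procs, running, queue):
    """Return the names of queued jobs that may start at time `now`.

    running: list of (estimated_end_time, procs) for executing jobs.
    queue:   list of Job in arrival (FCFS) order.
    One-dimensional model: only processor counts are tracked.
    """
    free = total_procs - sum(p for _, p in running)
    running = list(running)
    queue = list(queue)
    started = []

    # Start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append((now + job.estimate, job.procs))
        started.append(job.name)
    if not queue:
        return started

    # Reserve processors for the blocked head job: find the earliest
    # estimated time (the "shadow time") at which it could start.
    head = queue[0]
    avail = free
    shadow = None
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow = end
            break
    if shadow is None:  # head job asks for more than the machine has
        return started
    extra = avail - head.procs  # processors the head job will not need

    # Backfill later jobs only if they cannot delay the head job.
    for job in queue[1:]:
        if job.procs > free:
            continue
        ends_in_time = now + job.estimate <= shadow
        if ends_in_time or job.procs <= extra:
            free -= job.procs
            if not ends_in_time:
                extra -= job.procs
            started.append(job.name)
    return started
```

In this sketch an underestimated runtime can let a backfilled job overrun the shadow time and delay the head job, which is one way prediction errors feed back into system performance.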
