
Dialnet


Summary of Acceleration of bioinformatics workflows on shared clusters

Ferran Badosa Galí

  • Shared clusters with multi-socket, multi-core nodes have become common platforms for executing bioinformatics workflow applications. To harness the power that clusters offer and to meet time or cost constraints as far as possible, the resource management systems (RMS) on clusters must allocate applications to resources properly. To do so, they must account for the particularities of bioinformatics applications, which, unlike most other applications typically executed on clusters, can show different resource usage from one execution to the next, depending on the values of their configuration parameters and the characteristics of the biological dataset analyzed. Current RMS fail to account for these particularities and allocate bioinformatics applications to shared cluster resources regardless of the applications' actual needs. As a result, cluster resources are wasted or underutilized, and application performance drops. To tackle these issues, this thesis introduces a History-based Resource Manager (HBRM) for bioinformatics workflow applications on shared clusters. The goal of the HBRM is to determine the combinations of resources that bioinformatics workflow applications need, in order to improve on the performance they obtain in shared clusters under current RMS. The HBRM is composed of three blocks: an Application Characterization block, a Multivariate Regression Predictor, and a Scheduler Multicriteria. First, we conducted a series of experiments on a cluster to characterize bioinformatics applications, monitor their performance, and store it in a historical database. Second, we designed a Multivariate Regression Predictor which, based on historical performance information from applications' previous runs, estimates the resources that applications submitted to the cluster need.
Third, we developed a Scheduler Multicriteria which, based on performance predictions and the makespan slowdown of multiprogrammed nodes, schedules the bioinformatics applications queued on the cluster. Within the Scheduler Multicriteria, we designed three scheduling algorithms for bioinformatics applications: the SlowdownAware (SA) algorithm, the Biobackfill algorithm, and the SlowdownAware File-Placement (SAFP) algorithm. To validate the HBRM with all its scheduling algorithms, we processed a series of multiworkflows on different clusters, with resources managed by the HBRM and by other state-of-the-art scheduling algorithms. The experiments show that the HBRM can improve the makespan, resource utilization, and efficiency of a queue of bioinformatics workflow applications relative to those algorithms. In the first validation experiments, the HBRM with the SA algorithm improved the average workflow makespan by up to 34%, the average resource utilization by 86%, and the average resource efficiency by 96% relative to SLURM's FCFS on clusters. In the second validation experiments, the HBRM with the Biobackfill algorithm improved the average workflow makespan obtained with the Bestfit and Firstfit algorithms by 8.55%. In the third validation experiments, the HBRM with the SA algorithm improved the makespans of Epigenomics, Montage, and Sipht multiworkflows obtained with Min-Min and Max-Min on large clusters by 43%. On large clusters and with the same workloads, the SAFP algorithm improved the multiworkflow makespan of the SA algorithm by 88%, and that of Min-Min and Max-Min by 93%.
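
To make the history-based prediction step concrete, the following is a minimal sketch of a multivariate linear regression fitted on past runs and used to estimate the resource need of a newly submitted job. The summary does not specify the model form, the features, or the predicted quantities, so everything here is an assumption for illustration: the two features (dataset size and a thread-count parameter), the target (peak memory), the sample history, and the function names `fit` and `predict` are all hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: history does not determine the model")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    rows = [[1.0, *f] for f in features]  # leading 1.0 is the intercept term
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    """Evaluate intercept + sum(weight_i * feature_i)."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


# Hypothetical history (not from the thesis):
# (dataset size in GB, thread-count parameter) -> peak memory in GB.
history_features = [(1.0, 2.0), (2.0, 2.0), (2.0, 4.0), (4.0, 8.0), (3.0, 4.0)]
history_memory = [3.0, 5.0, 6.0, 12.0, 8.0]

weights = fit(history_features, history_memory)
# Estimated peak memory for a new job with a 5 GB dataset and 8 threads.
estimate = predict(weights, (5.0, 8.0))
```

A scheduler could compare such an estimate against the free resources of each node before placing a queued job. A production predictor would more likely rely on a numerical library and validate the model on held-out runs; the pure-Python solver above only keeps the sketch self-contained.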

