Efficient Scheduling of Scientific Workflows Using Hot Metadata in a Multisite Cloud
Résumé
Large-scale scientific applications are often expressed as scientific workflows (SWfs), which help define data processing jobs and the dependencies between the jobs' activities. Some SWfs require very large amounts of storage and computation, which can be obtained by exploiting multiple data centers in a cloud. In this context, metadata management and task scheduling across different data centers become critical to efficient SWf execution. In this article, we propose a hybrid distributed architecture and model that use hot (frequently accessed) metadata for efficient SWf scheduling in a multisite cloud. We use our model in a scientific workflow management system (SWfMS) to validate and tune its applicability to different real-life scientific workflows with different scheduling algorithms. We show that combining efficient hot metadata management with scheduling algorithms improves SWfMS performance. By avoiding unnecessary cold metadata operations, the execution time of jobs that run in parallel is reduced by up to 64.1% and that of the whole scientific workflows by up to 37.5%.
ABSTRACT
Large-scale scientific applications are often expressed as scientific workflows (SWfs) that help define data processing jobs and the dependencies between the jobs' activities. Several SWfs have huge storage and computation requirements, so they need to be processed in multiple (cloud-federated) data centers. It has been shown that efficient metadata handling plays a key role in the performance of computing systems. However, most of this evidence to date concerns only single-site HPC systems. In addition, the efficient scheduling of tasks among different data centers is critical to SWf execution. In this paper, we present a hybrid distributed model and architecture that use hot metadata (frequently accessed metadata) for efficient SWf scheduling in a multisite cloud. We couple our model with a scientific workflow management system (SWfMS) to validate and tune its applicability to different real-life scientific workflows with different scheduling algorithms. We show that combining efficient management of hot metadata with scheduling algorithms improves the performance of the SWfMS: by avoiding unnecessary cold metadata operations, it reduces the execution time of highly parallel jobs by up to 64.1% and that of the whole scientific workflows by up to 37.5%.
KEYWORDS
hot metadata, metadata management, multisite clouds, scientific workflows, geo-distributed applications.