Cloud computing has emerged as a promising platform for large-scale, data-intensive scientific research, i.e., processing tasks that use hundreds of hours of CPU time and petabytes of data storage. Although this is an active research topic, most efforts to perform such processing in clouds are based on MapReduce. This article describes the BioNimbus project, which aims to define an architecture and create a framework for easy, flexible integration and distributed execution of bioinformatics tools in a cloud environment, without being tied solely to the MapReduce paradigm. As a result, we leverage cloud elasticity and fault tolerance while significantly improving the storage capacity and execution time of bioinformatics tasks, particularly those of large-scale genome sequencing projects.
Task scheduling is difficult in federated cloud environments, since there are many cloud providers with distinct capabilities that must be taken into account. In bioinformatics, many tools and databases that require large resources for processing and storing enormous amounts of data are provided by physically separate institutions. This article addresses the problem of task scheduling in BioNimbus, a federated cloud infrastructure for bioinformatics applications. We propose a scheduling algorithm, DynamicAHP, based on the Analytic Hierarchy Process (AHP), which distributes tasks efficiently by finding the best resources to execute each required task. We ran experiments with real biological data on BioNimbus, deployed as three cloud providers running on Amazon EC2. The results show that DynamicAHP significantly reduces the makespan of bioinformatics applications executing in BioNimbus when compared to the Round Robin algorithm.
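The abstract does not state which criteria or weights DynamicAHP uses, so the following is only a generic sketch of how AHP can rank candidate resources for a task. It assumes hypothetical criteria (CPU speed, free memory, network latency), a hypothetical pairwise comparison matrix, and made-up provider measurements. Weights are derived with the common geometric-mean approximation of the principal eigenvector, checked with Saaty's consistency ratio, and each provider is scored as a weighted sum of normalized criteria.

```python
import math

# Saaty's random consistency index (RI) for matrices of order n = 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix):
    """Approximate the principal eigenvector of a pairwise comparison
    matrix by the normalized geometric mean of each row."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """CR = CI / RI; judgments are conventionally accepted when CR < 0.1."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n - 1]


def rank_resources(resources, criteria, weights, cost_criteria):
    """Score each resource as the weighted sum of its normalized criteria.
    Benefit criteria use value / max; cost criteria use min / value."""
    scores = {}
    for name, metrics in resources.items():
        score = 0.0
        for crit, w in zip(criteria, weights):
            column = [r[crit] for r in resources.values()]
            if crit in cost_criteria:
                norm = min(column) / metrics[crit]
            else:
                norm = metrics[crit] / max(column)
            score += w * norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical criteria: CPU speed matters most, then free memory, then latency.
criteria = ["cpu_ghz", "free_mem_gb", "latency_ms"]
comparison = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights = ahp_weights(comparison)
cr = consistency_ratio(comparison, weights)

# Hypothetical snapshot of three federated providers.
providers = {
    "provider-a": {"cpu_ghz": 2.4, "free_mem_gb": 8, "latency_ms": 40},
    "provider-b": {"cpu_ghz": 3.0, "free_mem_gb": 4, "latency_ms": 80},
    "provider-c": {"cpu_ghz": 2.0, "free_mem_gb": 16, "latency_ms": 20},
}
ranking = rank_resources(providers, criteria, weights, cost_criteria={"latency_ms"})
best = ranking[0][0]
print(f"weights={[round(w, 3) for w in weights]} CR={cr:.3f} best={best}")
```

Unlike Round Robin, which would hand tasks to providers in a fixed cyclic order regardless of their state, this ranking favors the provider whose current measurements best match the weighted criteria. A dynamic scheduler could plausibly recompute the ranking from fresh measurements before each dispatch; how DynamicAHP actually does so is not described in the abstract.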