Science Gateways provide portals for executing experiments, regardless of the users' computational background. Currently, their construction and performance need enhancement in terms of resource provisioning and task scheduling. We present the Modular Distributed Architecture to support the Protein Structure Prediction (MDAPSP), a Service-Oriented Architecture for the management and construction of Science Gateways, with resource provisioning in a heterogeneous environment. The Decision Maker, the central module of MDAPSP, defines the best computational environment according to the experiment parameters. The proof of concept for MDAPSP is presented in WorkflowSim, with two novel schedulers. Our results demonstrate good Quality of Service (QoS): the workload is distributed correctly, response times are fair, the load is balanced, and the overall system improves. The case study relies on PSP algorithms and the Galaxy framework, with monitoring experiments to show the bottlenecks and critical aspects.
KEYWORDS
optimization, protein structure prediction, science gateways, scientific workflow, service-oriented architecture, WorkflowSim

Softw: Pract Exper. 2020;50:899-924. wileyonlinelibrary.com/journal/spe © 2020 John Wiley & Sons, Ltd.

Another limitation of scientific gateways is resource provisioning for their machines, which affects usability and performance. The resource provisioning problem is a search for the best configuration: the one that requires the smallest investment and incurs minimal overheads. In general, inexperienced users cannot make good decisions about which machines to contract. Limited economic resources complicate the scenario.

The cloud computing paradigm is among the principal resources available, as it has many advantages: scalability, pay-as-you-go features, on-demand provisioning, and management by third parties. Multi-provider infrastructures and the decentralization of resources are expanding and are expected to have an impact on diverse areas. 4 Resource provisioning and management are among the challenging aspects of this expansion. 5 Despite its success, the cloud is not the only option for provisioning. 6 Computer clusters are reliable infrastructures for conducting experiments, mainly for High-Performance Computing (HPC) and concurrent algorithms. The advantages of cluster machines are local management, freedom of configuration, reliable storage, and fast network transfers. 3 For those reasons, we investigate the scientific gateway Galaxy, because it is an adaptable system, capable of operating over different infrastructures, including local workstations.

Galaxy is a well-known framework for scientific experiments, with workflow management and reproducibility support. Its tools offer practical solutions, and new algorithms can be added as the system evolves. Its focus is biomedical science, protein prediction, gene sequencing, and other bioinformatics-related areas.
Its advantages are a simple interface, reliability, powerful tools, and an active community. 7 However, as any other...