Summary
We present Hadoop‐based replica exchange (HaRE), an implementation of the replica exchange scheme on Hadoop, developed primarily for replica exchange statistical temperature molecular dynamics (RESTMD), an example of a large‐scale, advanced sampling molecular dynamics simulation. By using Hadoop as the framework and the MapReduce model to drive replica exchange, HaRE introduces efficient task‐level parallelism to RESTMD simulations. To demonstrate this, we investigate the performance of the application over various distributed cyberinfrastructures (DCI), including several high‐performance computing (HPC) systems, our Cyberinfrastructure for Reconfigurable Optical Networks (CRON) testbed, the Global Environment for Network Innovations (GENI) testbed, and the CloudLab testbed. We analyze scalability in terms of scale‐out and scale‐up over a single HPC cluster, EC2, and CloudLab, and in terms of scale‐across with CRON and GENI. The results demonstrate that HaRE is capable of efficient execution over both homogeneous and heterogeneous DCI of varying size and configuration. We discuss the factors that contribute to performance to provide insight into how the computing environment affects the execution of HaRE. Based on these results, we propose that similar loosely coupled scientific applications can also take advantage of the scalable, task‐level parallelism that Hadoop MapReduce provides over various DCI. Copyright © 2016 John Wiley & Sons, Ltd.