We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms to minimize the number of operations and to improve numerical stability, (iv) using OpenMP to parallelize those sections of the code for which distributed-memory parallelization involves unfavorable computing/communication time ratio, and (v) careful memory management to minimize simultaneous access of distant memory sections. The new code enables us to run molecular dynamics simulations of protein systems with size exceeding 100,000 amino-acid residues, reaching over 1 ns/day (1 μs/day in all-atom timescale) with 24 cores for proteins of this size.
The Clairvoyant algorithm proposed in “A novel MPI reduction algorithm resilient to imbalances in process arrival times” was analyzed, commented and improved. The comments concern handling certain edge cases in the original pseudocode and description, i.e., adding another state of a process, improved cache friendliness more precise complexity estimations and some other issues improving the robustness of the algorithm implementation. The proposed improvements include skipping of idle loop rounds, simplifying generation of the ready set and management of the state array and an about 90-fold reduction in memory usage. Finally an extension enabling process arrival times (PATs) prediction was added: an additional background thread used to exchange the data with the PAT estimations. The performed tests, with a dedicated mini-benchmark executed in an HPC environment, showed correctness and improved performance of the solution, with comparison to the original or other state-of-the-art algorithms.
Summary
The UNRES package for coarse-grained simulations, which has recently been optimized to treat large protein systems, has been implemented on Graphical Processor Units. An over 100-time speed up of the GPU code (run on an NVIDIA A100) with respect to the sequential code and an 8.5 speed-up with respect to the parallel (OpenMP) code (run on 32 cores of 2 AMD EPYC 7313 CPUs) has been achieved for large proteins (with size over 10,000 residues). Due to the averaging over the fine-grain degrees of freedom, 1 time unit of UNRES simulations is equivalent to about 1,000 time units of laboratory time, therefore millisecond time scale of large protein systems can be reached with the UNRES-GPU code.
Availability and implementation
The complete source code of UNRES-GPU can be downloaded from https://projects.task.gda.pl/eurohpcpl-public/unres
Supplementary information
Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.