Summary: Virtual Screening (VS) methods simulate molecular interactions in silico to search for the chemical compound that best interacts with a given molecular target. VS is becoming increasingly popular for accelerating the drug discovery process, but it constitutes a hard optimization problem with a huge computational cost. To address these two challenges, we have created METADOCK, an application that (1) enables a wide range of metaheuristics through a parametrized schema and (2) promotes the use of a multi-GPU environment within a heterogeneous cluster. Metaheuristics provide approximate solutions in a reasonable time frame, but, given the stochastic nature of real-life procedures, the energy budget goes hand in hand with acceleration when validating the proposed solution. This paper evaluates energy trade-offs and their correlations with performance for a set of metaheuristics derived from METADOCK. We establish a solid inference from minimal power to maximal performance on GPUs, and from there to optimal energy consumption. This way, ideal heuristics can be chosen according not only to best accuracy and performance but also to energy requirements. Our study starts with a preselection of parameterized metaheuristic functions, building blocks in which we find optimal patterns from power criteria while preserving parallelism through GPU execution. We then establish a methodology to identify the best instances of the parameterized kernels based on the energy patterns obtained, which are analyzed from different viewpoints, i.e., performance, average power, and total energy consumed. We also compare the best workload distributions for optimal performance and power efficiency between Pascal and Maxwell GPUs on popular Titan models.
Our experimental results demonstrate that the most power-efficient GPU can be overloaded to reduce the total energy required by as much as 20%. We find isolated scenarios where Maxwell performs better in execution time, but Pascal is always ahead in performance per watt, with gains peaking at 40%.
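The energy metrics this study relies on (total energy, average power, performance per watt) reduce to two simple relations. A minimal sketch follows; the power and runtime figures are hypothetical, not measurements from the paper:

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Total energy is average power integrated over the run: E = P_avg * t."""
    return avg_power_w * runtime_s

def perf_per_watt(work_units: float, avg_power_w: float, runtime_s: float) -> float:
    """Throughput per watt: (work / time) / average power."""
    return (work_units / runtime_s) / avg_power_w

# Hypothetical figures: the same docking workload run on two GPUs.
maxwell_energy = energy_joules(avg_power_w=250.0, runtime_s=120.0)  # 30,000 J
pascal_energy  = energy_joules(avg_power_w=220.0, runtime_s=100.0)  # 22,000 J
savings = 1.0 - pascal_energy / maxwell_energy  # fraction of energy saved
```

Note that a faster GPU can save energy even at a similar instantaneous power draw, because the shorter runtime dominates the product.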
Background: We present a performance-per-watt analysis of CUDAlign 4.0, a parallel strategy to obtain the optimal pairwise alignment of huge DNA sequences on multi-GPU platforms using the exact Smith-Waterman method.
Results: Our study includes acceleration factors, performance, scalability, power efficiency, and energy costs. We also quantify the influence of the contents of the compared sequences, identify potential scenarios for energy savings on speculative executions, and calculate performance and energy-usage differences among distinct GPU generations and models. For a sequence alignment on a chromosome-wide scale (around 2 Petacells), we are able to reduce execution times from 9.5 h on a Kepler GPU to just 2.5 h on a Pascal counterpart, with energy costs cut by 60%.
Conclusions: We find GPUs to be an order of magnitude ahead of Xeon Phis in performance per watt. Finally, versus typical low-power devices such as FPGAs, GPUs maintain similar GFLOPS/W ratios in 2017 on a five times faster execution.
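Smith-Waterman throughput is conventionally reported in GCUPS (giga cell updates per second), where the cell count is the product of the two sequence lengths. A small sketch of how the chromosome-wide figures above translate into throughput, assuming the stated 2 Petacells and the 9.5 h vs. 2.5 h runtimes:

```python
def gcups(cells: float, runtime_s: float) -> float:
    """Giga Cell Updates Per Second: alignment-matrix cells per second / 1e9."""
    return cells / runtime_s / 1e9

CELLS = 2e15  # ~2 Petacells, i.e., a chromosome-wide alignment matrix

kepler_gcups = gcups(CELLS, 9.5 * 3600)  # ~58.5 GCUPS on Kepler
pascal_gcups = gcups(CELLS, 2.5 * 3600)  # ~222.2 GCUPS on Pascal
speedup = pascal_gcups / kepler_gcups    # 9.5 h / 2.5 h = 3.8x
```

Since energy is power times runtime, a 3.8x speedup combined with the reported 60% energy cut implies the Pascal card draws only moderately more average power than its Kepler predecessor.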
Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most real scenarios, the roadmap to improve results relies on CNN settings involving brute-force computation, and researchers have lately proven Nvidia GPUs to be one of the best hardware counterparts for acceleration. Our work complements those findings with an energy study on critical parameters for the deployment of CNNs on flagship image and video applications, i.e., object recognition and people identification by gait, respectively. We evaluate energy consumption on four different networks based on the two most popular architectures (ResNet/AlexNet), i.e., a ResNet (167 layers), a 2D CNN (15 layers), a CaffeNet (25 layers), and a ResNetIm (94 layers), using batch sizes of 64, 128, and 256, and then correlate those results with speed-up and accuracy to determine optimal settings. Experimental results on a multi-GPU server equipped with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal may achieve gains of up to 40% versus Maxwell. Larger batch sizes extend performance gains and energy savings, but we have to keep an eye on accuracy, which sometimes favors small batches. We expect this work to provide preliminary guidance for a wide set of CNN and DL applications in modern HPC times, where the GFLOPS/W ratio constitutes the primary goal.
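The batch-size trade-off described above can be framed as work done per joule (equivalently, throughput per watt). The sketch below uses purely illustrative numbers, not measurements from the paper, to show why larger batches tend to save energy:

```python
def images_per_joule(images: int, avg_power_w: float, runtime_s: float) -> float:
    """Energy efficiency of a run: work done divided by energy consumed (J = W * s)."""
    return images / (avg_power_w * runtime_s)

# Hypothetical runs over the same 10,000-image workload: the larger batch
# shortens the run more than it raises average power, so efficiency improves.
batch_64_eff  = images_per_joule(10_000, avg_power_w=230.0, runtime_s=100.0)
batch_256_eff = images_per_joule(10_000, avg_power_w=240.0, runtime_s=70.0)
```

As the abstract cautions, this metric says nothing about accuracy, which must be checked separately and sometimes favors the smaller batch.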