ML4Chem is an open-source machine learning library for chemistry and materials science. It provides an extendable platform to develop and deploy machine learning models and pipelines, and it is targeted at both non-expert and expert users. ML4Chem follows user-experience design principles and offers the tools needed to go from data preparation to inference. Here we introduce its atomistic module for the implementation, deployment, and reproducibility of atom-centered models. This module is composed of six core building blocks: data, featurization, models, model optimization, inference, and visualization. We present their functionality and ease of use with demonstrations using neural network and kernel ridge regression algorithms.

ML4Chem also supports interfacing with external programs and exporting the output of any of its modules. Moreover, the library is still in its infancy, which opens the possibility of shaping its future directions based on current users' needs and ML paradigms. Here we introduce the atomistic module, in which ML algorithms learn the underlying relationships between molecules and their properties by treating atoms as the central objects. These models exploit the principle of locality in physics: a global quantity is defined as a sum over many localized contributions. The localized contributions usually account for the interactions of an atom with its nearest-neighbor atoms (many-body interactions). Atomistic models are very useful and have been successfully applied to accelerate molecular dynamics simulations [17-19], identify phase transitions in materials [20], determine energies and atomic forces with high accuracy [21, 22], search for saddle points [23], and predict atomic charges [24, 25].

This publication is organized as follows. In Section II we discuss the design and architecture of ML4Chem's atomistic module. Each of its core building blocks is introduced in Section III, and Section IV demonstrates the code's capabilities through a series of examples. Finally, conclusions and perspectives are drawn.

II. ATOMISTIC MODULE: DESIGN AND ARCHITECTURE

ML4Chem and its modules are written in Python in an object-oriented programming paradigm and are built on top of popular open-source projects to avoid duplication of effort. All deep learning computations are implemented with PyTorch [3]. Mathematical and linear algebra operations are carried out by NumPy [26] or SciPy [27, 28], which are widely used and recognized for this purpose. Parallelism is achieved with Dask [29], a flexible library for parallel computing. Dask enables computations to scale from a laptop to High-Performance Computing (HPC) clusters with little effort and offers a web dashboard for real-time monitoring. This is particularly valuable because it gives users a good estimate of the status of their calculations and helps in profiling computations. Good documentation is another important aspect, as its absence can harm usability. ML4Chem's source code is documented.
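The locality principle exploited by these atom-centered models can be stated compactly. The following is a minimal sketch, assuming the standard atom-centered decomposition in which the total energy E of a system of N atoms is the global quantity and G_i is a feature vector describing the local environment of atom i within some cutoff radius:

    E \approx \sum_{i=1}^{N} E_i(\mathbf{G}_i),

where each atomic contribution E_i is a learned function, for example a neural network or a kernel model, evaluated on the featurized neighborhood of atom i.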
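As an illustration of the parallelism layer described above, the short sketch below starts a local Dask cluster and prints the URL of its monitoring dashboard; the same client can instead be pointed at a scheduler running on an HPC system. This is generic Dask usage, not code taken from ML4Chem itself.

    from dask.distributed import Client, LocalCluster

    # Start a local scheduler with a few workers; on an HPC cluster the
    # Client would be connected to a remote scheduler address instead.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    # URL of the web dashboard used for real-time monitoring and profiling.
    print(client.dashboard_link)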
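To make the six building blocks concrete, the sketch below chains data, featurization, a model, optimization, and (through the resulting calculator) inference for an atom-centered neural network potential. The module paths, class names, and keyword arguments are assumptions made for illustration and should be checked against the ML4Chem documentation.

    from ase.io import Trajectory
    from dask.distributed import Client, LocalCluster

    # Assumed ML4Chem import paths; verify against the installed version.
    from ml4chem.atomistic import Potentials
    from ml4chem.atomistic.features import Gaussian
    from ml4chem.atomistic.models.neuralnetwork import NeuralNetwork

    # Dask provides the parallelism used during featurization and training.
    client = Client(LocalCluster())

    # Data: atomic configurations with reference energies/forces.
    images = Trajectory("training.traj")

    # Featurization + model: atom-centered Gaussian features feeding a
    # per-element neural network (hypothetical parameter values).
    calc = Potentials(
        features=Gaussian(cutoff=6.5, normalized=True),
        model=NeuralNetwork(hiddenlayers=(10, 10), activation="relu"),
        label="training_example",
    )

    # Model optimization (training); the trained calculator is then used
    # for inference on new structures.
    calc.train(training_set=images)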