Auto-associative neural networks ("autoencoders") offer a powerful nonlinear dimensionality reduction technique for mining data-driven collective variables from molecular simulation trajectories. The technique furnishes explicit and differentiable expressions for the nonlinear collective variables, making it ideally suited for integration with enhanced sampling methods for accelerated exploration of configurational space. In this work, we describe several refinements of the neural network architecture that improve and generalize the process of interleaved collective variable discovery and enhanced sampling. We employ circular network nodes to accommodate periodicities in the collective variables, hierarchical network architectures to rank-order the collective variables, and generalized encoder-decoder architectures to support bespoke error functions that incorporate prior knowledge into network training. We demonstrate our approach in blind collective variable discovery and enhanced sampling of the configurational free energy landscapes of alanine dipeptide and Trp-cage using an open-source plugin developed for the OpenMM molecular simulation package.
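To make the circular-node idea concrete, below is a minimal PyTorch sketch of an autoencoder whose bottleneck is a pair of outputs normalized onto the unit circle, so the learned collective variable is an angle. The `CircularAutoencoder` class and the random placeholder data are illustrative assumptions, not the authors' open-source plugin; trajectory frames are assumed to be pre-featurized (e.g., as sin/cos of backbone dihedrals).

```python
import torch
import torch.nn as nn

class CircularAutoencoder(nn.Module):
    """Sketch of an autoencoder with one 'circular' bottleneck node:
    an (x, y) pair projected onto the unit circle, so the learned
    collective variable is the angle theta = atan2(y, x)."""
    def __init__(self, n_in, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, 2),  # (x, y) pair for one periodic CV
        )
        self.decoder = nn.Sequential(
            nn.Linear(2, n_hidden), nn.Tanh(),
            nn.Linear(n_hidden, n_in),
        )

    def forward(self, x):
        xy = self.encoder(x)
        xy = xy / (xy.norm(dim=-1, keepdim=True) + 1e-8)  # project to unit circle
        theta = torch.atan2(xy[..., 1], xy[..., 0])       # differentiable periodic CV
        return self.decoder(xy), theta

# Reconstruction training on placeholder featurized frames.
model = CircularAutoencoder(n_in=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.rand(256, 4)  # stand-in for featurized trajectory frames
for _ in range(100):
    recon, theta = model(data)
    loss = ((recon - data) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Projecting the bottleneck pair onto the unit circle gives the latent space the same topology as a periodic coordinate, which is the motivation the abstract gives for circular nodes.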
Neural network (NN) interatomic potentials provide fast prediction of potential energy surfaces, closely matching the accuracy of the electronic structure methods used to produce the training data. However, NN predictions are reliable only within well-learned training domains and behave erratically when extrapolating. Uncertainty quantification methods can flag atomic configurations for which prediction confidence is low, but reaching such uncertain regions requires expensive sampling of the NN phase space, often using atomistic simulations. Here, we exploit automatic differentiation to drive atomistic systems toward high-likelihood, high-uncertainty configurations without the need for molecular dynamics simulations. By performing adversarial attacks on an uncertainty metric, we sample informative geometries that expand the training domain of the NN. When combined with an active learning loop, this approach bootstraps and improves NN potentials while decreasing the number of calls to the ground-truth method. We demonstrate this efficiency on the sampling of kinetic barriers and collective variables in molecules, and on supramolecular chemistry in zeolite-molecule interactions; the approach extends to any NN potential architecture and materials system.
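The adversarial-sampling idea can be sketched in a few lines of PyTorch: perform gradient ascent on an uncertainty metric (here, the ensemble variance of predicted energies) weighted by a Boltzmann-like likelihood term so that sampled geometries remain physically plausible. The untrained `ensemble` models, the flattened 6-dimensional coordinates, and the `kT` value are all toy placeholders, not the paper's actual implementation.

```python
import torch

# Toy ensemble of NN potentials (untrained placeholders for real models).
ensemble = [torch.nn.Sequential(torch.nn.Linear(6, 32), torch.nn.Tanh(),
                                torch.nn.Linear(32, 1)) for _ in range(4)]

def adversarial_loss(pos, kT=1.0):
    """Negative of (likelihood surrogate) x (ensemble variance):
    minimizing this drives pos toward high-likelihood, high-uncertainty
    regions of configuration space."""
    energies = torch.stack([m(pos) for m in ensemble])  # (n_models, 1)
    boltzmann = torch.exp(-energies.mean() / kT)        # plausibility weight
    return -(boltzmann * energies.var())

pos = torch.randn(6, requires_grad=True)  # flattened toy coordinates
opt = torch.optim.Adam([pos], lr=0.05)
for _ in range(200):
    loss = adversarial_loss(pos)
    opt.zero_grad(); loss.backward(); opt.step()
# `pos` now approximates an informative geometry to be labeled with the
# ground-truth method and appended to the training set (active learning).
```

Because the attack differentiates through the NN potential itself, no molecular dynamics is needed to reach the uncertain region, which is the efficiency gain the abstract describes.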
Rayleigh scattering caused by density fluctuations in silica glass is considered the intrinsic origin of optical loss in glass fibers. Minimizing these density fluctuations is therefore key to improving optical information and telecommunications networks. In this study, classical molecular dynamics (MD) simulations were employed to theoretically examine the effectiveness of codoping with boron and fluorine for improving the homogeneity of silica glass. For the MD simulations, a force-matching potential (FMP) of Buckingham form was developed by optimizing its parameters to reproduce the forces and energies calculated by density functional theory (DFT). The accuracy of the FMP was confirmed through comparisons with available experimental data, as well as with glass models constructed using a neural network potential, which reproduced the DFT forces and energies more accurately than the FMP. The simulations showed that small amounts of boron and fluorine added to silica glass do not worsen its density fluctuations. Rather, the additives reduce the viscosity of the glass, which leads to a lower fictive temperature and thus to better homogeneity. Consequently, codoping with boron and fluorine is suggested as a possible route to suppressing Rayleigh scattering in optical glass fibers.
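As an illustration of the fitting procedure, the following sketch implements a Buckingham pair potential and a least-squares force-matching objective with NumPy/SciPy. The reference forces `f_ref` are synthetic stand-ins for DFT data, and the single-pair, radial-force setup is a deliberate simplification of a real multi-species fit.

```python
import numpy as np
from scipy.optimize import least_squares

def buckingham(r, A, rho, C):
    """Buckingham pair potential: U(r) = A*exp(-r/rho) - C/r**6."""
    return A * np.exp(-r / rho) - C / r**6

def buckingham_force(r, A, rho, C):
    """Radial force F(r) = -dU/dr, the quantity matched against DFT forces."""
    return (A / rho) * np.exp(-r / rho) - 6.0 * C / r**7

# Synthetic reference data standing in for DFT pair distances and forces.
rng = np.random.default_rng(0)
r_ref = np.linspace(1.5, 5.0, 50)
f_ref = buckingham_force(r_ref, 1800.0, 0.30, 130.0) + 0.05 * rng.standard_normal(50)

def residuals(p):
    A, rho, C = p
    return buckingham_force(r_ref, A, rho, C) - f_ref

fit = least_squares(residuals, x0=[1000.0, 0.25, 100.0])
print("fitted A, rho, C:", fit.x)
```

In an actual force-matching fit, the residuals would run over all atoms and frames of the DFT training configurations (and typically include energies), but the least-squares structure is the same.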
High-throughput data generation methods and machine learning (ML) algorithms have given rise to a new era of computational materials science by learning the relations between composition, structure, and properties and by exploiting such relations for design. However, to build these connections, materials data must be translated into a numerical form, called a representation, that can be processed by an ML model. Data sets in materials science vary in format (ranging from images to spectra), size, and fidelity. Predictive models vary in scope and properties of interest. Here, we review context-dependent strategies for constructing representations that enable the use of materials as inputs or outputs for ML models. Furthermore, we discuss how modern ML techniques can learn representations from data and transfer chemical and physical information between tasks. Finally, we outline high-impact questions that have not been fully resolved and thus require further investigation.
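As a deliberately simple example of such a representation, the sketch below maps a chemical composition onto a fixed-length fractional-composition vector. The `ELEMENTS` vocabulary is an illustrative assumption; practical featurizations (learned embeddings, physics-informed descriptors, graph representations) are far richer.

```python
import numpy as np

# Minimal composition-based representation: each material becomes a
# fixed-length vector of element fractions over a chosen vocabulary.
ELEMENTS = ["Si", "O", "B", "F", "Al", "Na"]  # illustrative vocabulary

def composition_vector(formula: dict) -> np.ndarray:
    """Map {element: count} to fractional composition over ELEMENTS."""
    total = sum(formula.values())
    return np.array([formula.get(el, 0) / total for el in ELEMENTS])

print(composition_vector({"Si": 1, "O": 2}))  # SiO2 -> [0.333, 0.667, 0, 0, 0, 0]
```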