Abstract: Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations, that is, clusters, the number of clusters needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate empirically the characteristics of a representative set of internal clustering validation indices on many datasets. The prototype-based clustering framework includes multiple statistical estimates of cluster location, both classical and robust, so that the overall setting of the paper is novel. General observations on the quality of validation indices and on the behavior of different variants of clustering algorithms are given.
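As a minimal sketch of how an internal validation index can estimate the number of clusters, the example below runs a plain K-means (with the mean as the location estimate) for several candidate values of K and keeps the K that maximizes the Calinski-Harabasz index. The toy data, the deterministic farthest-point initialization, and the choice of this particular index are assumptions made for illustration. They are not taken from the article, which evaluates a broader set of indices and robust location estimates.

```python
import math


def farthest_point_init(points, k):
    """Deterministic maximin seeding (an assumption of this sketch):
    each new prototype is the point farthest from those chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest prototype for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with the mean as the cluster location estimate."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the old prototype if a cluster is empty
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def calinski_harabasz(points, centers, labels):
    """Ratio of between-cluster to within-cluster dispersion, each scaled by
    its degrees of freedom. Larger is better; defined for 2 <= k < n."""
    n, k = len(points), len(centers)
    grand_mean = tuple(sum(x) / n for x in zip(*points))
    within = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    between = sum(labels.count(j) * math.dist(centers[j], grand_mean) ** 2 for j in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Toy data (hypothetical): three well-separated groups of five points each.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]

scores = {}
for k in range(2, 6):
    centers, labels = kmeans(points, k)
    scores[k] = calinski_harabasz(points, centers, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the index peaks at K = 3, the number of generated groups. On real data, different internal indices can disagree, which is the kind of behavior the article sets out to compare.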
Hybrid metal nanoparticles, consisting of a nano-crystalline metal core and a protecting shell of organic ligand molecules, have applications in diverse areas such as biolabeling, catalysis, nanomedicine, and solar energy. Despite a rapidly growing database of experimentally determined atom-precise nanoparticle structures and their properties, there has been no successful, systematic way to predict the atomistic structure of the metal-ligand interface. Here, we devise and validate a general method to predict the structure of the metal-ligand interface of ligand-stabilized gold and silver nanoparticles, based on information about local chemical environments of atoms in experimental data. In addition to predicting realistic interface structures, our method is useful for investigations on the steric effects at the metal-ligand interface, as well as for predicting isomers and intermediate structures induced by thermal dynamics or interactions with the environment. Our method is applicable to other hybrid nanomaterials once a suitable set of reference structures is available.
We present an implementation of distance-based machine learning (ML) methods to create a realistic atomistic interaction potential to be used in Monte Carlo simulations of thermal dynamics of thiolate (SR) protected gold nanoclusters. The ML potential is trained for Au38(SR)24 by using previously published, density functional theory (DFT)-based, molecular dynamics (MD) simulation data on two experimentally characterised structural isomers of the cluster, and validated against independent DFT MD simulations. This method opens a door to efficient probing of the configuration space for further investigations of thermal-dependent electronic and optical properties of Au38(SR)24. Our ML implementation strategy allows for generalisation and accuracy control of distance-based ML models for complex nanostructures having several chemical elements and interactions of varying strength.
ligand such as halide or thiolate) ligands. The largest such known cluster was the phosphine-halide protected Au39, reported in 1992. 3 Considerable steps forward were taken when Brust and coworkers 4 reported a synthesis that produced all-thiolate protected gold clusters with an average size of two nanometers. Several new chemical compositions of both organo-soluble and water-soluble clusters were reported soon after, 5-8 culminating in the breakthroughs of the first crystal structure of a large water-soluble all-thiol protected cluster Au102(pMBA)44 (pMBA = para-mercaptobenzoic acid) by the Kornberg group in 2007 9 as well as the organo-soluble Au25(PET)18 10-12 in 2008 and Au38(PET)24 (PET = phenyl ethyl thiolate) 13,14 clusters in 2008-2010. To date, atomic structures of at least 150 different compounds are crystallographically known, which facilitates detailed theoretical computations and dynamical simulations of the properties of MPCs and greatly helps to correlate structures with the properties measured in experiments.
Density functional theory (DFT) methods are the cornerstone for all computations that need to deal with details of the electronic structure, such as studies of optical absorption, optical excitation, fluorescence, and magnetism. However, while giving the most accurate and detailed information, DFT methods are also numerically the most demanding. DFT computations of some of the largest structurally known MPCs like the thiolate protected Ag374 15,16 have to deal with up to 13 000 valence electrons, and even a single-point DFT energy calculation can take minutes and use hundreds or even thousands of CPU cores in a supercomputer. Force fields describing gold-thiolate MPCs have been developed to be used in molecular dynamics (MD) simulations, e.g., in the context of ReaxFF 17 and AMBER-GROMACS. 18 Efficient yet reliable methods to simulate the atomic dynamics of MPCs are needed, for instance, to study interactions of the clusters with the environment in the solvent phase, or with biomolecules and biological materials (viruses, proteins, lipid layers etc.). 19-21
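To illustrate why a cheap surrogate energy matters for Monte Carlo sampling, here is a minimal sketch of Metropolis sampling in which every trial move needs one energy evaluation. The function `toy_energy` is a hypothetical stand-in (a harmonic penalty on pair distances) and is not the distance-based ML potential of the paper. The step size, temperature, and three-atom geometry are likewise assumptions chosen only to keep the example small.

```python
import math
import random


def toy_energy(coords, r0=1.0, k_spring=1.0):
    """Hypothetical surrogate energy: harmonic penalty on all pair distances.
    A trained ML potential would take the place of this function."""
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            e += 0.5 * k_spring * (math.dist(coords[i], coords[j]) - r0) ** 2
    return e


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)


def run_mc(coords, energy_fn, kT=0.05, step=0.1, n_steps=2000, seed=0):
    """Single-atom displacement Monte Carlo; returns final coordinates,
    final energy, and the fraction of accepted moves."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    e = energy_fn(coords)
    accepted = 0
    for _ in range(n_steps):
        i = rng.randrange(len(coords))
        trial = list(coords)
        # Displace one randomly chosen atom by a uniform random vector.
        trial[i] = tuple(x + rng.uniform(-step, step) for x in coords[i])
        e_trial = energy_fn(trial)  # one energy call per trial move
        if metropolis_accept(e_trial - e, kT, rng):
            coords, e = trial, e_trial
            accepted += 1
    return coords, e, accepted / n_steps


# Hypothetical starting geometry: three atoms stretched along a line.
start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
final_coords, final_e, acc_rate = run_mc(start, toy_energy)
print(toy_energy(start), final_e, acc_rate)
```

Because the energy function is called once per trial move, a run of this kind with thousands of steps is only practical when each call is far cheaper than a single-point DFT calculation.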
Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like the random one in the very high-dimensional cases.
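For reference, the K-means++ baseline mentioned above can be sketched as follows. This is the standard D² seeding only, not either of the proposed divide-and-conquer methods. The function name, the seed handling, and the toy data are assumptions of this sketch.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Standard K-means++ seeding: the first center is drawn uniformly,
    each later center with probability proportional to the squared
    distance to the nearest center chosen so far (D^2 sampling)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen center.
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Only the new center can shrink a point's nearest-center distance.
        d2 = [min(old, math.dist(p, nxt) ** 2) for old, p in zip(d2, points)]
    return centers


# Toy data (hypothetical): three groups of five two-dimensional points.
offsets = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
data = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
seeds = kmeans_pp_init(data, k=3)
print(seeds)
```

An already chosen point has weight zero and is never drawn again, so the seeds are distinct as long as the data contain at least k distinct points. One plausible reading of the high-dimensional observation in the abstract is that, when pairwise distances become nearly equal, the D² weights approach a uniform distribution. That interpretation is an assumption here, not a result quoted from the paper.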