Since the 1920s, packing arguments have been used to rationalize crystal structures in systems ranging from atomic mixtures to colloidal crystals. Packing arguments have recently been applied to complex nanoparticle structures, where they often, but not always, work. We examine when, if ever, packing is a causal mechanism in hard particle approximations of colloidal crystals. We investigate three crystal structures composed of their ideal packing shapes. We show that, contrary to expectations, the ordering mechanism cannot be packing, even when the thermodynamically self-assembled structure is the same as that of the densest packing. We also show that the best particle shapes for hard particle colloidal crystals at any finite pressure are imperfect versions of the ideal packing shape.
Data analyses based on linear methods constitute the simplest, most robust, and transparent approaches to the automatic processing of large amounts of data for building supervised or unsupervised machine learning models. Principal covariates regression (PCovR) is an underappreciated method that interpolates between principal component analysis and linear regression and can be used conveniently to reveal structure-property relations in terms of simple-to-interpret, low-dimensional maps. Here we provide a pedagogic overview of these data analysis schemes, including the use of the kernel trick to introduce an element of non-linearity while maintaining most of the convenience and the simplicity of linear approaches. We then introduce a kernelized version of PCovR and a sparsified extension, and demonstrate the performance of this approach in revealing and predicting structure-property relations in chemistry and materials science, showing a variety of examples including elemental carbon, porous silicate frameworks, organic molecules, amino acid conformers, and molecular materials.
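As a hedged illustration of the interpolation mentioned above, PCovR is commonly written as a single loss with a mixing parameter. The symbols below are assumptions and do not appear in the abstract: $\mathbf{X}$ is the feature matrix, $\mathbf{Y}$ the property matrix (both assumed centered and scaled), $\mathbf{T}$ the low-dimensional latent projection, and $\alpha \in [0, 1]$ the mixing parameter.

```latex
\ell(\alpha) = \alpha \, \lVert \mathbf{X} - \mathbf{T}\mathbf{P}_{TX} \rVert^{2}
             + (1 - \alpha) \, \lVert \mathbf{Y} - \mathbf{T}\mathbf{P}_{TY} \rVert^{2},
\qquad \mathbf{T} = \mathbf{X}\mathbf{P}_{XT}
```

In this notation, $\alpha = 1$ keeps only the reconstruction term and recovers a PCA-like objective, while $\alpha = 0$ keeps only the regression term; intermediate values trade one off against the other.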
Many butterflies, birds, beetles, and chameleons owe their spectacular colors to the microscopic patterns within their wings, feathers, or skin. When these patterns, or photonic crystals, result in the omnidirectional reflection of commensurate wavelengths of light, it is due to a complete photonic band gap (PBG). The number of natural crystal structures known to have a PBG is relatively small, and those within the even smaller subset of notoriety, including diamond and inverse opal, have proven difficult to synthesize. Here, we report more than 150,000 photonic band calculations for thousands of natural crystal templates from which we predict 351 photonic crystal templates – including nearly 300 previously-unreported structures – that can potentially be realized for a multitude of applications and length scales, including several in the visible range via colloidal self-assembly. With this large variety of 3D photonic crystals, we also revisit and discuss oft-used primary design heuristics for PBG materials.
Selecting the most relevant features and samples out of a large set of candidates is a task that occurs very often in the context of automated data analysis, where it improves the computational performance and often the transferability of a model. Here we focus on two popular subselection schemes applied to this end: CUR decomposition, derived from a low-rank approximation of the feature matrix, and farthest point sampling (FPS), which relies on the iterative identification of the most diverse samples and discriminating features. We modify these unsupervised approaches, incorporating a supervised component following the same spirit as the principal covariates (PCov) regression method. We show how this results in selections that perform better in supervised tasks, demonstrating with models of increasing complexity, from ridge regression to kernel ridge regression and finally feed-forward neural networks. We also present adjustments to minimise the impact of any subselection when performing unsupervised tasks. We demonstrate the significant improvements associated with PCov-CUR and PCov-FPS selections for applications to chemistry and materials science, typically reducing by a factor of two the number of features and samples required to achieve a given level of regression accuracy.
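The iterative selection behind farthest point sampling can be sketched as follows. This is a minimal, unsupervised illustration under stated assumptions (Euclidean distance, a user-chosen starting index, and the hypothetical function name `farthest_point_sampling`); it is not the PCov-FPS variant from the paper, which adds a supervised component.

```python
def farthest_point_sampling(points, n_select, start=0):
    """Greedy FPS over a list of equal-length coordinate tuples.

    Returns the indices of the selected points, in selection order.
    Assumes Euclidean distance; `start` is an arbitrary first pick.
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [start]
    # Squared distance from every point to its nearest selected point.
    min_d2 = [sq_dist(p, points[start]) for p in points]

    while len(selected) < n_select:
        # Pick the point that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=min_d2.__getitem__)
        selected.append(nxt)
        # Update the nearest-selected distances with the new pick.
        min_d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(min_d2, points)]

    return selected


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
picked = farthest_point_sampling(points, 3)
```

The same routine selects features rather than samples if it is applied to the columns of the feature matrix instead of its rows.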
The number of materials or molecules that can be created by combining different chemical elements in various proportions and spatial arrangements is enormous. Computational chemistry can be used to generate databases containing billions of potential structures (Ruddigkeit, Deursen, Blum, & Reymond, 2012), and to predict some of the associated properties (Montavon et al., 2013; Ramakrishnan, Dral, Rupp, & Lilienfeld, 2014). Unfortunately, the very large number of structures makes exploring such databases, whether to understand structure-property relations or to find the best structure for a given application, a daunting task. In recent years, multiple molecular representations (Bartók, Kondor, & Csányi, 2013; Behler & Parrinello, 2007; Willatt, Musil, & Ceriotti, 2019) have been developed to compute structural similarities between materials or molecules, incorporating physically relevant information and symmetries. The features associated with these representations can be used for unsupervised machine learning applications, such as clustering or classification of the different structures, and for high-throughput screening of databases for specific properties (De, Musil, Ingram, Baldauf, & Ceriotti, 2017; Hautier, 2019; Maier, Stöwe, & Sieg, 2007). Unfortunately, the dimensionality of these features (as well as of most other descriptors used in chemical and materials informatics) is very high, which makes the resulting classifications, clusterings, or mappings very hard to visualize. Dimensionality reduction algorithms (