Fast and accurate crystal structure prediction (CSP) algorithms
and web servers are highly desirable for the exploration and discovery
of new materials in the vast chemical design space. However,
computationally expensive first-principles calculation-based
CSP algorithms are currently applicable only to relatively small systems and are
out of reach of most materials researchers. Several teams have used
an element substitution approach for generating or predicting new
structures, but usually in an ad hoc way. Here we develop a template-based
crystal structure prediction (TCSP) algorithm and its companion web
server, which makes this tool accessible to all materials researchers.
Our algorithm uses elemental/chemical similarity and oxidation states
to guide the selection of template structures, ranks them by
substitution compatibility, and returns multiple predictions
with ranking scores in a few minutes. A benchmark study on the 98290
formulas of the Materials Project database using leave-one-out evaluation
shows that our algorithm can achieve high accuracy (for 13145 target
structures, TCSP predicted their structures with root-mean-square
deviation < 0.1) for a large portion of the formulas. We have also
used TCSP to discover new materials of the Ga–B–N system,
showing its potential for high-throughput materials discovery. Our
user-friendly web app TCSP can be accessed freely on our MaterialsAtlas.org web app platform.
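The template selection and ranking idea described above can be illustrated with a minimal sketch. This is not the TCSP implementation: the similarity values in `TOY_SIMILARITY`, the function names, and the scoring rule (mean pairwise element similarity over a substitution that preserves stoichiometric counts and oxidation states) are all assumptions made for illustration.

```python
from itertools import permutations

# Made-up similarity values, for illustration only (not from the paper).
TOY_SIMILARITY = {
    frozenset(["Sr", "Ba"]): 0.9,
    frozenset(["Sr", "Ca"]): 0.8,
    frozenset(["Ti", "Zr"]): 0.85,
}


def similarity(a, b):
    """Toy element similarity: 1.0 for identical elements, low default otherwise."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset([a, b]), 0.1)


def score_template(target, template):
    """Best substitution score for mapping target elements onto a template.

    Both arguments map element -> (count, oxidation_state). A mapping is
    compatible only if every substituted pair shares count and oxidation
    state. Returns None when no compatible mapping exists.
    """
    if len(target) != len(template):
        return None
    targets = list(target)
    best = None
    for perm in permutations(template):
        if all(target[t] == template[p] for t, p in zip(targets, perm)):
            score = sum(similarity(t, p) for t, p in zip(targets, perm)) / len(targets)
            if best is None or score > best:
                best = score
    return best


def rank_templates(target, templates):
    """Return (template_name, score) pairs, best first; incompatible ones dropped."""
    scored = []
    for name, template in templates.items():
        score = score_template(target, template)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    target = {"Sr": (1, 2), "Ti": (1, 4), "O": (3, -2)}
    templates = {
        "BaTiO3": {"Ba": (1, 2), "Ti": (1, 4), "O": (3, -2)},
        "CaZrO3": {"Ca": (1, 2), "Zr": (1, 4), "O": (3, -2)},
        "LaAlO3": {"La": (1, 3), "Al": (1, 3), "O": (3, -2)},
    }
    for name, score in rank_templates(target, templates):
        print(f"{name}: {score:.3f}")
```

In this toy run, LaAlO3 is rejected because its cation oxidation states do not match those of the target, while the two remaining templates are ordered by how similar the substituted elements are.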
Transformer language models pre-trained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Project databases. Six different datasets, with or without non-charge-neutral or balanced electronegativity samples, are used to benchmark the generative design performance and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal language models can generate chemically valid materials compositions, of which up to 97.54\% are charge neutral and 91.40\% are electronegativity balanced, an enrichment more than six times higher than that of the baseline pseudo-random sampling algorithm. Our language models also demonstrate high generation novelty, and their potential for new materials discovery is demonstrated by their ability to recover held-out materials. We also find that the properties of the generated compositions can be tailored by training the models on selected training sets such as high-bandgap samples. Our experiments also show that each model has its own preferences in the properties of its generated samples and that running time complexity varies considerably across models. We have applied our materials transformers to discover a set of new materials, validated using DFT calculations. All our trained materials transformer models and code can be accessed freely at \url{http://www.github.com/usccolumbia/MTransformer}.
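The two validity criteria named above, charge neutrality and electronegativity balance, can be checked with a short sketch. This is not the authors' evaluation code. The oxidation-state table is a small illustrative subset, the balance rule used here (every cation less electronegative than every anion, on the Pauling scale) is one common formulation adopted as an assumption, and `expand_formula` reflects only one plausible reading of "expanded formulas" (each element symbol repeated by its count).

```python
import re
from itertools import product

# Small illustrative subset of common oxidation states (assumption: not exhaustive).
COMMON_OXIDATION_STATES = {
    "Na": (1,), "Sr": (2,), "Ti": (2, 3, 4), "Fe": (2, 3),
    "O": (-2,), "Cl": (-1,),
}
# Pauling electronegativities for the same elements.
PAULING = {"Na": 0.93, "Sr": 0.95, "Ti": 1.54, "Fe": 1.83, "O": 3.44, "Cl": 3.16}


def parse_formula(formula):
    """Parse a simple formula such as 'SrTiO3' into {'Sr': 1, 'Ti': 1, 'O': 3}."""
    counts = {}
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] = counts.get(elem, 0) + (int(num) if num else 1)
    return counts


def expand_formula(formula):
    """Assumed token form: 'SrTiO3' -> 'Sr Ti O O O'."""
    counts = parse_formula(formula)
    return " ".join(elem for elem, n in counts.items() for _ in range(n))


def neutral_assignments(counts):
    """Yield oxidation-state assignments (one state per element) that sum to zero."""
    elems = list(counts)
    for states in product(*(COMMON_OXIDATION_STATES[e] for e in elems)):
        if sum(s * counts[e] for e, s in zip(elems, states)) == 0:
            yield dict(zip(elems, states))


def is_electronegativity_balanced(assignment):
    """True if every cation is less electronegative than every anion."""
    cations = [PAULING[e] for e, s in assignment.items() if s > 0]
    anions = [PAULING[e] for e, s in assignment.items() if s < 0]
    return bool(cations) and bool(anions) and max(cations) < min(anions)


def check(formula):
    """Return (charge_neutral, electronegativity_balanced) for a formula."""
    assignments = list(neutral_assignments(parse_formula(formula)))
    neutral = bool(assignments)
    balanced = any(is_electronegativity_balanced(a) for a in assignments)
    return neutral, balanced


if __name__ == "__main__":
    for f in ["SrTiO3", "Fe2O3", "NaCl2"]:
        print(f, expand_formula(f), check(f))
```

Elements missing from the two lookup tables raise a `KeyError`; a fuller check would need complete oxidation-state and electronegativity data.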
One of the long-standing problems in materials science is how to
predict a material’s structure and then its properties given
only its composition. Experimental characterization of crystal structures
has been widely used for structure determination, but it is
too expensive for high-throughput screening. At the same time, directly
predicting crystal structures from compositions remains a challenging
unsolved problem. Herein we propose a deep learning algorithm for
predicting the XRD spectrum given only the composition of a material,
which can then be used to infer key structural features for downstream
structural analysis such as crystal system or space group classification,
crystal lattice parameter determination, or materials property prediction.
Benchmark studies on two data sets show that our DeepXRD algorithm
can achieve good performance for XRD prediction as evaluated over
our test sets. It can thus be used in high-throughput screening in
the huge materials composition space for materials discovery.
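As a small illustration of how an XRD pattern relates to the lattice parameters mentioned above, the sketch below applies Bragg's law to a single peak of a cubic crystal. It is standard diffraction arithmetic, not part of DeepXRD; the Cu K-alpha1 wavelength, the restriction to cubic cells, and the function names are assumptions made for the example.

```python
import math

CU_K_ALPHA = 1.5406  # angstrom; assumed radiation source (Cu K-alpha1)


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA):
    """Bragg's law (first order): d = lambda / (2 sin(theta))."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def cubic_lattice_parameter(two_theta_deg, hkl, wavelength=CU_K_ALPHA):
    """For a cubic cell, a = d * sqrt(h^2 + k^2 + l^2)."""
    h, k, l = hkl
    return d_spacing(two_theta_deg, wavelength) * math.sqrt(h * h + k * k + l * l)


def two_theta(a, hkl, wavelength=CU_K_ALPHA):
    """Inverse relation: peak position (degrees 2-theta) for a cubic cell."""
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))


if __name__ == "__main__":
    # Round trip for a cubic cell with a = 5.64 angstrom and the (200) reflection.
    peak = two_theta(5.64, (2, 0, 0))
    print(f"2-theta = {peak:.2f} deg")
    print(f"a = {cubic_lattice_parameter(peak, (2, 0, 0)):.3f} angstrom")
```

Non-cubic systems need the corresponding d-spacing formula with more than one lattice parameter, so a single peak is no longer enough to fix the cell.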