In recent years, machine learning techniques have been widely applied in materials science, owing to the increased availability of research data and sophisticated algorithms. At the core of this approach lies the ability to encode material structures into descriptors that a computer can process. Although significant advances have been made in this area, efficient structure-encoding strategies are still needed to maximize the predictive power of machine learning models. Here we review recent progress on four representative structural features capable of describing the structures of diverse materials: the structure graph, the Coulomb matrix, the topological descriptor, and the diffraction fingerprint. Particular attention is given to crystalline solids, which are more challenging to encode than molecules. By summarizing previous work and critically appraising these descriptors, this review offers guidelines for selecting structural features and aims to inspire the design of powerful descriptors suited to different tasks. This article is categorized under: Structure and Mechanism > Computational Materials Science; Data Science > Artificial Intelligence/Machine Learning
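To make one of the four descriptors concrete, here is a minimal NumPy sketch of the Coulomb matrix for a finite molecule, using the commonly cited definition M_ii = 0.5 * Z_i^2.4 and M_ij = Z_i * Z_j / |R_i - R_j| in atomic units. The function names and the approximate water geometry are illustrative assumptions, not taken from the review; crystalline solids need periodic variants of this matrix, which the sketch does not cover.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix of a finite molecule.

    Z: atomic numbers, shape (n,)
    R: Cartesian coordinates, shape (n, 3), conventionally in bohr
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Pairwise interatomic distances |R_i - R_j|
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    # Off-diagonal: nuclear Coulomb repulsion Z_i * Z_j / |R_i - R_j|
    # (the zero-distance diagonal is overwritten below)
    with np.errstate(divide="ignore", invalid="ignore"):
        M = np.outer(Z, Z) / dist
    # Diagonal: 0.5 * Z_i^2.4, a fit to free-atom energies
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def sorted_eigenspectrum(M):
    """Fixed-length, atom-order-invariant descriptor: eigenvalues sorted by magnitude."""
    eig = np.linalg.eigvalsh(M)  # M is symmetric
    return eig[np.argsort(-np.abs(eig))]


# Usage: an approximate water geometry (O-H about 1.81 bohr), chosen for illustration
water_Z = [8, 1, 1]
water_R = [[0.0, 0.0, 0.0], [0.0, 1.43, 1.11], [0.0, -1.43, 1.11]]
print(sorted_eigenspectrum(coulomb_matrix(water_Z, water_R)))
```

Sorting the eigenvalues yields a vector that does not depend on the order in which atoms are listed, at the cost of discarding some structural information.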
Driven by recent innovations in computer technology, the emerging field of materials informatics has become a catalyst for a shift in the research paradigm of materials science. Knowledge graphs, which support knowledge management, can collectively capture scientific knowledge from the vast body of research articles and automatically recognize the relationships between entities. In this work, we construct a materials knowledge graph, named MatKG, which establishes a unique correspondence between subjects and objects in materials science. Emphasis is placed on author disambiguation, which is addressed by a deduplication model based on machine learning and matching-dependency algorithms. As an application, MatKG is used to track research trends in the study of LiFePO4 and to automatically chronicle the milestones achieved so far. We believe MatKG can serve as a versatile research platform for integrating and refining the scientific knowledge of materials across a variety of subfields and interdisciplinary domains.
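Author disambiguation of this kind is often expressed with matching dependencies, rules of the form "if two records agree or are similar on certain attributes, they refer to the same entity". The sketch below is a hypothetical, rule-only illustration: the record fields, the name-normalization key, and the rule itself (same surname and first initial, plus a shared affiliation or coauthor) are assumptions, and the machine-learning component of MatKG's deduplication model is not reproduced. Matched pairs are merged with union-find, so matching is transitive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorRecord:
    # Hypothetical record layout for one author mention in one article
    record_id: int
    name: str
    affiliation: str
    coauthors: frozenset


def name_key(name: str) -> tuple:
    """Normalize 'Wei Li' and 'W. Li' to the same key: (surname, first initial)."""
    tokens = name.replace(".", " ").split()
    return tokens[-1].casefold(), tokens[0][0].casefold()


def matches(a: AuthorRecord, b: AuthorRecord) -> bool:
    """Assumed matching dependency: same name key AND (same affiliation OR shared coauthor)."""
    if name_key(a.name) != name_key(b.name):
        return False
    same_affiliation = a.affiliation.casefold() == b.affiliation.casefold()
    shared_coauthor = bool(a.coauthors & b.coauthors)
    return same_affiliation or shared_coauthor


def deduplicate(records):
    """Cluster records that the rule links, directly or transitively, via union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if matches(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec.record_id)
    return sorted(clusters.values())


records = [
    AuthorRecord(1, "Wei Li", "Peking University", frozenset({"J. Zhang"})),
    AuthorRecord(2, "W. Li", "Tsinghua University", frozenset({"J. Zhang"})),
    AuthorRecord(3, "Wei Li", "MIT", frozenset({"A. Smith"})),
]
print(deduplicate(records))
```

The pairwise loop is quadratic; a real pipeline would normally block records by name key first and compare only within each block.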
Overwhelming evidence has been accumulating that materials informatics can provide a novel route to materials discovery. Whereas the conventional approach to innovation relies mainly on experimentation, generative models from the field of machine learning can realize the long-held goal of inverse design, in which desired properties are mapped to chemical structures. In this review, we introduce the general aspects of inverse materials design and briefly overview two generative models, the variational autoencoder and the generative adversarial network, which can be used to generate and optimize inorganic solid materials according to their properties. Reversible representation schemes for generative models are compared between molecular and crystalline structures, and the challenges specific to the latter are discussed. Finally, we summarize recent applications of generative models in exploring chemical space with compositional and configurational degrees of freedom, and speculatively outline potential future directions.
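For the variational autoencoder, two ingredients recur in most implementations: the reparameterization trick, which makes sampling from the latent distribution differentiable, and a loss that combines reconstruction error with a KL divergence to a standard normal prior. The NumPy sketch below shows both under stated assumptions: a diagonal Gaussian encoder output (mu, log_var), a squared-error reconstruction term suited to continuous encodings, and an optional beta weight. It is not tied to any specific crystal representation discussed in the review.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so gradients flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Batch-mean negative ELBO, assuming a squared-error (Gaussian) reconstruction term."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, log_var)))


rng = np.random.default_rng(0)
mu = np.zeros((2, 4))       # hypothetical encoder outputs for a batch of 2
log_var = np.zeros((2, 4))
z = reparameterize(mu, log_var, rng)  # latent samples, shape (2, 4)
```

Inverse design then typically searches or optimizes in the latent space and decodes the result, which is why a reversible representation of the structure is essential.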
The recent marriage of materials science and artificial intelligence has created a need to extract and collate materials information from the enormous backlog of academic publications. However, this is notoriously hard in sophisticated application domains such as Li-ion battery (LIB) cathodes, where materials selection depends on multiple variables, making it difficult to automatically identify the critical terms in the text. Herein, we propose a semantic representation framework for literature mining of LIB cathodes, featuring a dual-attention module that refines word embeddings through multi-source information fusion. The resulting word embeddings are biased toward domain-specific knowledge and can reveal deep-seated associations among materials for targeted applications. Based on this framework, we build a semantic knowledge graph dedicated to LIB cathodes, which allows us to uncover latent relationships between materials in the scientific literature and even to identify candidate materials not previously exploited as cathodes. This work provides a long-sought path toward text-mining-based knowledge management for complex materials systems with little dependence on domain expertise.
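The abstract does not specify the internals of the dual-attention module, so the sketch below shows only a generic building block under that caveat: scaled dot-product attention that fuses embeddings of one term drawn from several hypothetical sources (for example, different corpora) into a single vector, followed by cosine-similarity ranking of candidate materials. The query vector, source embeddings, and material names are hypothetical placeholders.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()


def fuse_embeddings(source_vecs, query):
    """Attention-weighted fusion of one term's embeddings from several sources.

    source_vecs: (n_sources, d) embeddings of the same term from different sources
    query: (d,) hypothetical context vector that scores each source
    """
    scores = source_vecs @ query / np.sqrt(source_vecs.shape[1])
    weights = softmax(scores)
    return weights @ source_vecs, weights


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_candidates(target, candidates):
    """Rank candidate names by cosine similarity of their embeddings to a target vector."""
    return sorted(candidates, key=lambda name: cosine(target, candidates[name]), reverse=True)


# Toy 2-D embeddings; real word embeddings have hundreds of dimensions
sources = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])
fused, weights = fuse_embeddings(sources, query=np.array([1.0, 0.0]))
candidates = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0]), "C": np.array([1.0, 1.0])}
print(rank_candidates(np.array([1.0, 0.1]), candidates))
```

A second attention step (for example, over context words) would be needed for anything resembling a "dual" module; it is omitted here because the abstract gives no details.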
The continual emergence of SARS-CoV-2 variants of concern (VOCs) has posed a global challenge to pandemic control. To develop effective drugs and vaccines, one needs to efficiently simulate mutations in the SARS-CoV-2 spike receptor-binding domain (RBD) and identify high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and construct a high-throughput screening pipeline for predicting binding affinity and antibody escape. In this work, the first on SARS-CoV-2 RBD mutation simulation, we identify mutations in the RBD regions of five VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution to prepare for an inevitable future pandemic. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future work.
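The scaling figures can be cross-checked with simple arithmetic. Assuming the 493.9× speedup is measured against an 8-NPU baseline (an assumption not stated in the abstract), the ideal speedup at 4096 NPUs is 512×, and 493.9/512 ≈ 0.965, which matches the reported 96.5% scalability. Likewise, 366.8 PFLOPS at 34.9% of peak implies a theoretical peak of about 1051 PFLOPS, or roughly 256.6 TFLOPS per NPU in mixed precision.

```python
def scaling_check(n_devices, speedup, baseline_devices, achieved_pflops, peak_fraction):
    """Derive parallel efficiency and implied hardware peak from reported figures.

    baseline_devices is an ASSUMPTION: the device count the speedup is measured against.
    """
    ideal_speedup = n_devices / baseline_devices
    efficiency = speedup / ideal_speedup
    theoretical_peak_pflops = achieved_pflops / peak_fraction
    per_device_tflops = theoretical_peak_pflops * 1000 / n_devices
    return efficiency, theoretical_peak_pflops, per_device_tflops


eff, peak, per_npu = scaling_check(4096, 493.9, 8, 366.8, 0.349)
print(f"efficiency {eff:.1%}, peak {peak:.0f} PFLOPS, {per_npu:.1f} TFLOPS/NPU")
# prints "efficiency 96.5%, peak 1051 PFLOPS, 256.6 TFLOPS/NPU"
```

If the baseline were a single NPU instead, the same speedup would imply only about 12% efficiency, so the 8-NPU reading is the one consistent with the reported 96.5%.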