Electrocatalytic CO2 reduction has attracted enormous attention for both environmental protection and chemical production. The design of new electrocatalysts with high activity and selectivity can draw inspiration from the abundant scientific literature. An annotated and verified corpus built from this literature can support the development of natural language processing (NLP) models, which in turn can offer insight into the underlying mechanisms. To facilitate data mining in this direction, in this article we present a benchmark corpus of 6,086 records manually extracted from 835 electrocatalytic publications, along with an extended corpus of 145,179 records. The corpus covers nine types of knowledge, obtained by either annotation or extraction: material, regulation method, product, faradaic efficiency, cell setup, electrolyte, synthesis method, current density, and voltage. Machine learning algorithms can be applied to the corpus to help scientists find new and effective electrocatalysts. Furthermore, researchers familiar with NLP can use this corpus to design domain-specific named entity recognition (NER) models.
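As an illustration of how a single record with the nine knowledge types might be represented, the following is a minimal Python sketch. The class name `CorpusRecord`, the field names, the helper `annotated_fields`, and the example values are all assumptions made for illustration; they are not the corpus's actual schema or data.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CorpusRecord:
    """Hypothetical container for one record; one field per knowledge type."""
    material: Optional[str] = None
    regulation_method: Optional[str] = None
    product: Optional[str] = None
    faradaic_efficiency: Optional[str] = None
    cell_setup: Optional[str] = None
    electrolyte: Optional[str] = None
    synthesis_method: Optional[str] = None
    current_density: Optional[str] = None
    voltage: Optional[str] = None


def annotated_fields(record: CorpusRecord) -> List[str]:
    """Return the names of the knowledge types that are filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is not None]


# Invented example values, not taken from the corpus.
example = CorpusRecord(
    material="Cu nanoparticles",
    product="C2H4",
    faradaic_efficiency="60%",
)
print(annotated_fields(example))
```

Keeping every field optional reflects the assumption that a given publication may not report all nine types.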
Electrocatalysis plays a significant role in the production of sustainable fuels and chemicals. The combination of artificial intelligence and catalytic science shows great potential for the extraction, analysis, and prediction of electrocatalysts. However, current machine learning approaches usually require large amounts of data from density functional theory calculations to train and optimize models. In contrast, a knowledge graph has the potential to extract useful information from a large body of literature without relying on density functional theory. Herein, a knowledge graph of Cu-based electrocatalysts for electrocatalytic CO2 reduction is constructed based on a linguistically enriched SciBERT-based framework. This framework retrieves multiple types of entities, including material, regulation method, product, and Faradaic efficiency, from 757 scientific publications, generates representations with abundant domain-specific semantic information, and is capable of handling electrocatalysts for CO2 reduction. The resulting graph shows the development history of related catalysts, builds relationships between the factors associated with catalysis, and provides intuitive charts from which researchers can gain useful information. Furthermore, we propose a deep learning-based prediction model that integrates the semantic information from the scientific literature (word embedding) with the correlation of knowledge triples (graph embedding) to predict the Faradaic efficiency for a targeted case. This work paves the way for catalyst design that merges artificial intelligence with catalytic science.
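To make the notion of knowledge triples concrete, the following minimal Python sketch turns one extracted record into (head, relation, tail) triples centered on the material. The function name `record_to_triples`, the `has_...` relation naming, and the example values are assumptions for illustration only; the abstract does not specify how the authors' framework builds its triples.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def record_to_triples(record: Dict[str, str], head_key: str = "material") -> List[Triple]:
    """Build (head, relation, tail) triples linking the head entity to every other entity.

    The relation naming scheme ("has_<entity type>") is hypothetical.
    """
    head = record.get(head_key)
    if not head:
        return []  # without a head entity there is nothing to anchor the triples to
    return [
        (head, f"has_{key}", value)
        for key, value in record.items()
        if key != head_key and value
    ]


# Invented example values, not taken from the literature.
record = {"material": "Cu2O", "product": "C2H4", "faradaic_efficiency": "45%"}
for triple in record_to_triples(record):
    print(triple)
```

Triples of this form are the usual input to graph-embedding methods, which is where the correlation between entities mentioned above would be learned.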