2023
DOI: 10.1021/acs.jpcc.3c03106

ChemNLP: A Natural Language-Processing-Based Library for Materials Chemistry Text Data

Kamal Choudhary, Mathew L. Kelley

Abstract: In this work, we present the ChemNLP library that can be used for (1) curating open access datasets for materials and chemistry literature, developing and comparing traditional machine learning, transformers and graph neural network models for (2) classifying and clustering texts, (3) named entity recognition for large-scale text-mining, (4) abstractive summarization for generating titles of articles from abstracts, (5) text generation for suggesting abstracts from titles, (6) integration with density function…

Cited by 18 publications (7 citation statements)
References 55 publications
“…So far, the majority of materials data extraction approaches focus on fully automatic data extraction. 2–7 Automation is clearly desirable, particularly when extracting very large databases. However, more automation tends to require more complexity in the software, sophistication in training schemes, and knowledge about the extracted property.…”
Section: Introduction (mentioning)
confidence: 99%
“…Of late, applications of machine learning to materials science and engineering has emerged as a transformative tool for materials discovery [1,2,3]. These applications typically range from deep learning [4,5], for image-based tasks [6,7,8], to more recent natural language processing adaptations to aid in design of new materials [9,10,11]. Although machine learning approaches to materials discovery have found applications across multiple materials domains, it has yet to make an impact in the field of dental materials [12,13].…”
Section: Introduction (mentioning)
confidence: 99%
“…50,51 Such models have often been applied for bulk property predictions and their applicability for defects and interfaces remains an open question. Several machine learning tools available in JARVIS such as classical force-field inspired descriptors (CFID), 43 atomistic line graph neural network (ALIGNN), 52,53 computer vision for atomistic images (AtomVision) 54 and natural language processing for chemistry (ChemNLP) 55,56 can be used in this regards to accelerate the interface design tasks. In particular, ALIGNN has been used to develop several fast surrogate models for property predictions as well as a unified force-field for fast structure optimizations.…”
Section: Introduction (mentioning)
confidence: 99%