2021
DOI: 10.1186/s13321-021-00538-8

DECIMER 1.0: deep learning for chemical image recognition using transformers

Abstract: The amount of data available on chemical structures and their properties has increased steadily over the past decades. In particular, articles published before the mid-1990s are available only in printed or scanned form. The extraction and storage of data from those articles in a publicly accessible database are desirable, but doing this manually is a slow and error-prone process. In order to extract chemical structure depictions and convert them into a computer-readable format, Optical Chemical Structure Recog…

Cited by 47 publications (34 citation statements)
References 31 publications

“…The network shows a few article that use methods in two cited articles, e.g. Kohulan Rajan, 2021 and Henning Otto Brinkhaus, 2022 [ 23 , 27 ]. Of course, if we do not limit the network to a subset of articles (here, Journal of Cheminformatics articles with CiTO annotation), the network becomes more interesting, but also much more complex.…”
Section: Annotated Citation Network
Citation type: mentioning
confidence: 99%
“…Research article DECIMER 1.0: deep learning for chemical image recognition using transformers [23] cito:agreesWith, cito:citesAsAuthority, cito:citesAsDataSource, cito:extends, cito:usesMethodIn 66% (*)…”
Section: Article Intentions %Cito
Citation type: mentioning
confidence: 99%
“…A more complex alternative would be to extract the core and (in a generalised model) repeating unit structures, e.g., as SMILES strings. Current successes in similar applications are encouraging [ 80 ] but available training data would be a limiting factor, as the numbers of homologous structures detected in relevant datasets reported above and of published homologous series e.g., in specialised databases, appear too low for most machine learning tasks. However, defining core structures with chain attachment points and multiple repeating units structures may allow training data to be synthetically generated through recombination and enumeration to form diverse homologous series structures.…”
Section: Future Work
Citation type: mentioning
confidence: 99%
“…Another DL method proposed in 2017 that has recently gained popularity among the scientific community is transformers [ 34 ], which adopts the mechanism of self-attention to handle sequential data. They have been tested in a series of medical tasks, including cardiac abnormality diagnosis [ 35 ], food allergen identification [ 36 ], medical language understanding [ 37 ], and chemical image recognition [ 38 ].…”
Section: Introduction
Citation type: mentioning
confidence: 99%