2020
DOI: 10.1186/s13321-020-00469-w

DECIMER: towards deep learning for chemical image recognition

Abstract: The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image o…

Citations: cited by 66 publications (50 citation statements)
References: 20 publications
“…For a discussion of the problems of string tokenization for deep learning, we refer our readers to those two publications. Our results confirm the superiority of SELFIES for the task discussed here and in our work on Optical Chemical Entity Recognition [16]. Thus, for this work all SMILES strings were converted into SELFIES using a custom python script (Fig.…”
Section: Data (supporting)
confidence: 84%
“…Using the CDK, explicit hydrogens were removed from the molecules and their topological structures were converted to canonical SMILES strings. The obtained 111 million molecules were filtered according to the ruleset of our previous DECIMER work [ 16 ], i.e. molecules must have a molecular weight of fewer than 1500 Da, not possess any counter ions, contain only C, H, O, N, P, S, F, Cl, Br, I, Se and B, not contain any hydrogen isotopes (D, T), have between 3 and 40 bonds, not contain any charged group, contain implicit hydrogens only, except in functional groups, …”
Section: Methods (mentioning)
confidence: 99%
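The filtering rules quoted in the statement above can be read as a simple predicate over per-molecule descriptors. The following is a minimal sketch under stated assumptions, not the cited authors' implementation: `MoleculeRecord` and `passes_quoted_rules` are hypothetical names, the descriptors are assumed to be precomputed by a cheminformatics toolkit, and the bond-count range is assumed to be inclusive. The rule on implicit hydrogens is left out because it needs a real molecule object, and the quoted list is truncated, so any further rules are not represented.

```python
from dataclasses import dataclass

# Elements permitted by the quoted ruleset.
ALLOWED_ELEMENTS = frozenset(
    {"C", "H", "O", "N", "P", "S", "F", "Cl", "Br", "I", "Se", "B"}
)


@dataclass(frozen=True)
class MoleculeRecord:
    """Hypothetical container of precomputed descriptors (not from the source)."""

    molecular_weight: float  # in Da
    elements: frozenset  # element symbols present in the molecule
    bond_count: int
    has_counter_ion: bool = False
    has_charged_group: bool = False
    has_hydrogen_isotope: bool = False  # deuterium (D) or tritium (T)


def passes_quoted_rules(mol: MoleculeRecord) -> bool:
    """Return True if the record satisfies the rules visible in the quote."""
    return (
        mol.molecular_weight < 1500.0
        and not mol.has_counter_ion
        and mol.elements <= ALLOWED_ELEMENTS  # subset test
        and not mol.has_hydrogen_isotope
        and 3 <= mol.bond_count <= 40  # inclusive bounds are an assumption
        and not mol.has_charged_group
    )


# Illustrative records with made-up descriptor values.
small_organic = MoleculeRecord(194.19, frozenset({"C", "H", "N", "O"}), 15)
contains_silicon = MoleculeRecord(194.19, frozenset({"C", "H", "Si"}), 15)

print(passes_quoted_rules(small_organic))
print(passes_quoted_rules(contains_silicon))
```

Keeping the descriptors separate from the predicate means the same check can be applied whichever toolkit computes the values.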
“…The DECIMER (Deep lEarning for Chemical IMagE Recognition) project [ 18 ] is an end-to-end open-source system that can perform chemical structure segmentation on scanned scientific literature and use the segmented structure depictions to convert them into a computer-readable molecular file format.…”
Section: Introduction (mentioning)
confidence: 99%
“…With the DECIMER [ 9 ] project, we are currently working on the development of an open-source platform for the automated chemical structure extraction from printed literature. It aims at segmenting all chemical structure depictions from a given scanned document from the printed scientific literature and resolving their identity to yield a machine-readable presentation of the molecule.…”
Section: Introduction (mentioning)
confidence: 99%