2021
DOI: 10.26434/chemrxiv.14320907
Preprint

Img2Mol - Accurate SMILES Recognition from Molecular Graphical Depictions

Abstract: Automatic recognition of the molecular content of a molecule’s graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining a deep convolutional neural network learning from molecule depictions a…

Cited by 8 publications (17 citation statements)
References 15 publications
“…Numerical ranges are split into three subranges which are then treated like categories. For example, if the bond width could be described by an integer with the possible values [1,2,3,4,5,6], this would be allocated to three positions in the fingerprint. These positions would be linked to the subsets [1,2], [3,4] and [5,6].…”
Section: Methods (mentioning)
confidence: 99%
“…For example, if the bond width could be described by an integer with the possible values [1,2,3,4,5,6], this would be allocated to three positions in the fingerprint. These positions would be linked to the subsets [1,2], [3,4] and [5,6]. This means that the fingerprint does not always define an exact value for certain parameters but only specifies a range.…”
Section: Methods (mentioning)
confidence: 99%
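
The subrange scheme described in the citation statements above can be illustrated with a minimal Python sketch. It is not the citing authors' implementation: the function names `split_into_subranges` and `encode_value`, the one-hot bit layout, and the handling of value lists that do not divide evenly by three are all assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_subranges(values: Sequence[int], n_bins: int = 3) -> List[List[int]]:
    """Split an ordered list of possible values into n_bins contiguous subsets.

    Assumption: if the values do not divide evenly, the earlier subsets
    receive one extra value each. The quoted text does not cover this case.
    """
    if n_bins <= 0 or len(values) < n_bins:
        raise ValueError("need at least one value per subrange")
    size, extra = divmod(len(values), n_bins)
    subsets: List[List[int]] = []
    start = 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        subsets.append(list(values[start:end]))
        start = end
    return subsets


def encode_value(value: int, values: Sequence[int], n_bins: int = 3) -> List[int]:
    """Return one fingerprint position per subrange (hypothetical one-hot layout).

    The position linked to the subrange containing `value` is set to 1.
    """
    if value not in values:
        raise ValueError(f"{value} is not among the possible values")
    return [1 if value in subset else 0 for subset in split_into_subranges(values, n_bins)]


bond_widths = [1, 2, 3, 4, 5, 6]
print(split_into_subranges(bond_widths))  # [[1, 2], [3, 4], [5, 6]]
print(encode_value(3, bond_widths))       # [0, 1, 0]
print(encode_value(4, bond_widths))       # [0, 1, 0]
```

Bond widths 3 and 4 produce the same three positions, which is the point made in the quoted passage: the fingerprint specifies a range for the parameter, not an exact value.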
“…These are 512-dimensional descriptors learned in an unsupervised way using a recurrent autoencoder to translate between non-canonical SMILES and their canonical form. They have been shown to be very effective in QSAR modeling, as well as PCM modelling [21,12], inverse molecular problems such as optical chemical structure recognition [7] or reverse-engineering of molecular structures [14].…”
Section: Data (mentioning)
confidence: 99%