2021
DOI: 10.1186/s13321-021-00535-x

Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier

Abstract: We present a sequence-to-sequence machine learning model for predicting the IUPAC name of a chemical from its standard International Chemical Identifier (InChI). The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation. Unlike neural machine translation, which usually tokenizes input and output into words or sub-words, our model processes the InChI and predicts the IUPAC name character by character. The mod…
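
The character-by-character processing described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the special tokens (`<pad>`, `<s>`, `</s>`), the helper names `build_vocab`, `encode`, and `decode`, and the choice to build each vocabulary from a single example are all assumptions made for illustration. It only shows how an InChI and an IUPAC name might be turned into integer sequences before being fed to an encoder-decoder model.

```python
# Hypothetical character-level tokenizer (an illustration, not the paper's code).
# The special tokens and helper names below are assumptions.
PAD, BOS, EOS = "<pad>", "<s>", "</s>"


def build_vocab(strings):
    """Map special tokens and every distinct character to an integer id."""
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate([PAD, BOS, EOS] + chars)}


def encode(text, vocab):
    """Turn a string into ids, one per character, wrapped in BOS/EOS."""
    return [vocab[BOS]] + [vocab[ch] for ch in text] + [vocab[EOS]]


def decode(ids, vocab):
    """Invert encode(), dropping the special tokens."""
    inv = {i: tok for tok, i in vocab.items()}
    return "".join(inv[i] for i in ids if inv[i] not in (PAD, BOS, EOS))


inchi = "InChI=1S/CH4/h1H4"  # standard InChI for methane
name = "methane"             # its IUPAC name

src_vocab = build_vocab([inchi])  # encoder-side (InChI) vocabulary
tgt_vocab = build_vocab([name])   # decoder-side (IUPAC name) vocabulary

src_ids = encode(inchi, src_vocab)
tgt_ids = encode(name, tgt_vocab)
```

Each character becomes one token, so the sequences are longer than with word or sub-word tokenization, but the vocabulary stays small and no string is ever out of vocabulary as long as its characters were seen when the vocabulary was built.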

Cited by 17 publications (10 citation statements)
References: 29 publications

“…We have already pointed out that, leaving out the additional problem of extracting the chemical information from the images, a RNN only achieves a BLEU 4-gram score of 0.86 when translating from the SMILES to the IUPAC name (58). Nomenclature translation has been addressed with architectures based on the novel transformer networks (62), obtaining a practically perfect accuracy (63,64). Also, automatic recognition of molecular graphical depictions is able to correctly translate them to their SMILES representation with a 88% or 96% accuracy by using either a standard encoder-decoder (46) or a transformer (45) network.…”
Section: Discussion (mentioning)
confidence: 99%
“…From the perspective of life science, the properties of molecules and the effects of drugs are mostly determined by their 3D structures [14, 15]. In most current MRL methods, one starts with representing molecules as 1D sequential strings, such as SMILES [16,17,18] and InChI [19,20,21], or 2D graphs [22,11,23,12]. This may limit their ability to incorporate 3D information for downstream tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…From the perspective of life science, the properties of molecules and the effects of drugs are mostly determined by their 3D structures [14, 15]. In most current MRL methods, one starts with representing molecules as 1D sequential strings, such as SMILES [16,17,18] and InChI [19,20,21], or 2D graphs [22,11,23,12,24]. This may limit their ability to incorporate 3D information for downstream tasks.…”
Section: Introduction (mentioning)
confidence: 99%