2021
DOI: 10.1109/access.2021.3083838

When SMILES Smiles, Practicality Judgment and Yield Prediction of Chemical Reaction via Deep Chemical Language Processing

Abstract: Simplified Molecular Input Line Entry System (SMILES) provides a text-based encoding method to describe the structure of chemical species and formalize general chemical reactions. Considering that chemical reactions have been represented in a language form, we present a symbol-only model to generally predict the yield of organic synthesis reactions without considering complex quantum physical modeling or chemistry knowledge. Our model is the first deep neural network application that treats chemical reaction te…
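A "symbol only" model needs the reaction SMILES turned into a sequence of discrete tokens before any network sees it. The sketch below shows one common way to do that, assuming a regex tokenizer of the kind popularized by the Molecular Transformer line of work; the excerpt does not say how this paper tokenizes, so `tokenize_smiles`, `build_vocab`, and `encode` are hypothetical helpers, not the authors' code.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical tokenizer: this regex follows a widely used SMILES tokenization
# scheme (bracket atoms, two-letter halogens, ring digits, bonds, '>' separators).
# It is an assumption, not the method of the paper described above.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a (reaction) SMILES string into chemically meaningful symbols."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Every character must be covered; otherwise a symbol was silently dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(corpus: Iterable[str]) -> Dict[str, int]:
    """Assign an integer ID to every token seen in the corpus (hypothetical helper)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize_smiles(smiles):
            # len(vocab) is evaluated first, so new tokens get the next free ID.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Map a SMILES string to token IDs, using <unk> for unseen symbols."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize_smiles(smiles)]


if __name__ == "__main__":
    reaction = "CC(=O)O.OCC>>CC(=O)OCC"  # esterification, used only as an example
    vocab = build_vocab([reaction])
    print(tokenize_smiles(reaction))
    print(encode(reaction, vocab))
```

The round-trip check in `tokenize_smiles` is a deliberate design choice: `re.findall` skips characters the pattern does not match, so comparing the joined tokens with the input is what catches unsupported symbols instead of silently corrupting the sequence.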

Cited by 25 publications (25 citation statements); references 26 publications.
“…Openly available data sets are derived from either patents [127,128] or chemical journals [129], and more rarely experimental procedures directly [130]. These data sets are distributed using SMILES as a representation for the reaction itself and usually include extra information in various formats.…”
Section: Reactions (mentioning; confidence: 99%)
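The statement above notes that open reaction data sets ship each reaction as a SMILES string plus extra information. Reaction SMILES separate roles with `>` (reactants>agents>products) and molecules within a role with `.`. The sketch below splits a record along those separators, assuming any extra annotation (for example a CXSMILES `|...|` block) follows the core string after a space; `parse_reaction_smiles` and `ParsedReaction` are hypothetical names, and real data sets may use other conventions.

```python
from typing import List, NamedTuple


class ParsedReaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]
    extension: str  # text after the first space, e.g. a CXSMILES "|...|" block (assumed)


def parse_reaction_smiles(rxn: str) -> ParsedReaction:
    """Split 'reactants>agents>products' into per-role lists of molecule SMILES."""
    # Assumption: any extra annotation follows the core reaction SMILES after a space.
    core, _, extension = rxn.strip().partition(" ")
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators in {core!r}")
    # Molecules within a role are separated by '.'; an empty role becomes [].
    # Caveat: this also splits multi-fragment species such as salts.
    reactants, agents, products = ([m for m in part.split(".") if m] for part in parts)
    return ParsedReaction(reactants, agents, products, extension.strip())


if __name__ == "__main__":
    print(parse_reaction_smiles("CC(=O)O.OCC>[H+]>CC(=O)OCC.O"))
    print(parse_reaction_smiles("CCO.[Na+].[Cl-]>>CC=O |f:1.2|"))
```

Splitting on `.` treats `[Na+]` and `[Cl-]` as separate molecules; the fragment-grouping annotation kept in `extension` is what a fuller parser would use to rejoin them, which this sketch does not attempt.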
“…With the advances of computing power, data availability, and algorithms, there has been significant interest in developing machine learning (ML) models to assist a variety of organic reaction-related tasks, 1−4 including reaction product prediction, 5−18 retrosynthesis, 9,14,17,19−37 reaction condition optimization, 38−41 reaction yield prediction, 42−54 and reaction type classification. 38,51,55−57 These ML-based data-driven approaches for organic synthesis can be classified into descriptor-based models, 5−10,19−26,38−41,51,55,57 graph-based models, 11−13,27,28,52 and sequence-based models, 14−18,29−37,53,54,56 depending on how molecules are represented as input for machine learning. Descriptor-based models use hand-crafted features as molecular representations and often need feature engineering or template extraction for different reaction prediction tasks, which set limitations to generalizability.…”
Section: Introduction (mentioning; confidence: 99%)
“…In fact, interesting applications of NLP-based methods to chemical reactions are now becoming available. 29–31 The use of NLP-based models for accurate prediction of various properties of molecules is well-known. 32,33 On the other hand, predicting the reaction outcome that is known to depend on the molecular attributes of catalysts, reactants, solvents and several other factors is challenging and has seldom been reported using language models.…”
Section: Introduction (mentioning; confidence: 99%)