2020
DOI: 10.26434/chemrxiv.13286741.v1
Preprint

Data augmentation strategies to improve reaction yield predictions and estimate uncertainty

Abstract: Chemical reactions describe how precursor molecules react together and transform into products. The reaction yield describes the percentage of the precursors successfully transformed into products relative to the theoretical maximum. The prediction of reaction yields can help chemists navigate reaction space and accelerate the design of more effective routes. Here, we investigate the best-studied high-throughput experiment data set and show how data augmentation on chemical reactions can improve yield predicti…

Cited by 17 publications (19 citation statements)
References 25 publications
“…4a–c), while the augmented Yield-BERT model generally predicts yields that are too low for low-yield reactions and too high for high-yield reactions. 27 A similar tendency can be seen for the out-of-sample splits (Fig. 4a–c).…”
Section: Results; citation type: supporting
confidence: 77%
“…Under a low data regime, the xgboost model trained on DRFP tends to overestimate low-yield reactions and underestimate high-yield reactions (Figure 4a-c), while the augmented Yield-BERT model generally predicts yields that are too low for low-yield reactions and too high for high-yield reactions. 29 A similar tendency can be seen for the out-of-sample splits (Figure 4a-c).…”
Section: Reaction Yield Prediction; citation type: supporting
confidence: 75%
“…Molecular Transformers have been applied to CASP, which can be cast as a sequence-to-sequence translation task, in which the string representations of the reactants are mapped to those of the corresponding product, or vice versa. Since their initial applications [151], Transformers have been employed to predict multistep syntheses [152], regio- and stereoselective reactions [153], enzymatic reaction outcomes [154], and reaction yields and classes [61,155,156]. Recently, Transformers have been applied to molecular property prediction [157,158] and optimization [159].…”
Section: Chemical Language Models; citation type: mentioning
confidence: 99%
“…Luckily, many strategies are designed for the poor performance in small dataset of deep learning methods. [12][13][14][15][16] One of the effective methods is transfer learning, which transfer prior-knowledge learned from abundant data to another domain task with less data available in similar scenario. 10,11,[17][18][19] Reymond et al had performed transfer learning on carbohydrate reactions and showed better performance than a model trained on carbohydrate reactions only.…”
Section: Introduction; citation type: mentioning
confidence: 99%