2022
DOI: 10.1021/acs.jcim.1c01467
Unified Deep Learning Model for Multitask Reaction Predictions with Explanation

Abstract: There is significant interest and importance in developing robust machine learning models to assist organic chemistry synthesis. Typically, task-specific machine learning models have been developed for distinct reaction prediction tasks. In this work, we develop a unified deep learning model, T5Chem, for a variety of chemical reaction prediction tasks by adapting the “Text-to-Text Transfer Transformer” (T5) framework from natural language processing (NLP). On the basis of self-supervised pretraining with PubChem m…
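The abstract describes casting several distinct reaction tasks into a single text-to-text model, following the T5 recipe of routing tasks through input prefixes. A minimal sketch of what such input formatting might look like — the prefix strings and the helper function are illustrative assumptions, not T5Chem's actual API:

```python
# Illustrative task-prefix formatting for a T5-style multitask reaction model.
# The exact prefix strings T5Chem uses may differ; these are assumptions.

def make_t5_input(task: str, reaction_smiles: str) -> str:
    """Prepend a task prefix so one seq2seq model can serve multiple tasks."""
    prefixes = {
        "product": "Product:",                # forward reaction prediction
        "reactants": "Reactants:",            # retrosynthesis
        "classification": "Classification:",  # reaction-type classification
        "yield": "Yield:",                    # yield prediction as generation
    }
    if task not in prefixes:
        raise ValueError(f"unknown task: {task}")
    return f"{prefixes[task]}{reaction_smiles}"

# Forward-prediction input for a simple esterification (reaction SMILES)
src = make_t5_input("product", "CCO.CC(=O)O>>")
print(src)  # Product:CCO.CC(=O)O>>
```

The model then generates the target text (e.g., the product SMILES) conditioned on the prefixed input, so one set of weights covers all tasks.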

Cited by 49 publications (75 citation statements)
References 76 publications
“…Our work builds upon the tiered design and implementation of TDC and introduces extra datasets for virtual screening [21] and chemical reactions. It features a series of latest USPTO datasets [10,11,18,20,45] to support multi-class prediction of reaction type, template, catalyst and yield, in contrast to the binary classification tasks in MoleculeNet and TDC. Moreover, ImDrug carries out a systematic and focused study of deep imbalanced learning in AIDD with comprehensive settings and newly proposed metrics.…”
Section: Related Work
confidence: 99%
“…Despite the success of deep learning in AIDD, by examining a myriad of medicinal chemistry databases and benchmarks [6,10,11,18–21], we observe that these curated data repositories ubiquitously exhibit imbalanced distributions regardless of the specific tasks and domains². This observation is reminiscent of the power-law scaling in networks [22] and the Pareto principle [23], which poses significant challenges for developing unbiased and generalizable AI algorithms [24].…”
Section: Introduction
confidence: 99%
“…For this task, we use the data from (Lu & Zhang, 2022), which is already processed into a Seq2Seq format compatible with T5. There are 40K training examples.…”
Section: B Fine-tuning Details
confidence: 99%
“…[1–6] Similarly, developments in machine learning (ML) have enabled the distillation of large and complex data sets into predictive models capable of generalizing patterns in the data [4,7–13]. Despite these advances, efforts to merge HTE with ML remain largely limited to a few reported datasets with limited structural diversity [14–20] and corresponding trained models that do not extrapolate well to substrates beyond the training set.…”
Section: Introduction
confidence: 99%
“…These categories have distinct strengths and weaknesses and quite divergent outcomes. Pd-catalyzed C–N coupling data have been extracted from historical reaction sets such as patent databases and Reaxys [8,9,23–26] as well as Electronic Laboratory Notebooks (ELNs) [27]. Yield prediction for historical datasets (Strategy I) results in models with relatively poor performance (as evidenced by a low coefficient of determination, R² ≈ 0.2), in part due to significant heterogeneity in data quality, which can be time-consuming and often impossible to curate systematically.…”
Section: Introduction
confidence: 99%