2019
DOI: 10.26434/chemrxiv.9897692.v1
Preprint

Datasets and Their Influence on the Development of Computer Assisted Synthesis Planning Tools in the Pharmaceutical Domain

Abstract: Computer Assisted Synthesis Planning (CASP) has recently gained considerable interest. Herein we investigate a template-based retrosynthetic planning tool trained on a variety of datasets of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for the prediction of full synthetic routes to compounds of interest in medicina…

Cited by 3 publications (3 citation statements). References: 38 publications.
“…Of notable mention is that, comparing various data sources, including patents (USPTO and Pistachio), literature and patents (Reaxys), and industrial data (AstraZeneca ELN), despite similarities in their size of template sets, they differ in the coverage of reaction space. Reaxys stands out for its extensive and uniquely diverse collection of reaction templates, providing a broader reaction space [60].…”
Section: Data Sources (mentioning)
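The observation quoted above, that template sets of similar size can still cover different regions of reaction space, can be made concrete with simple set arithmetic. The following is a minimal Python sketch under stated assumptions: templates are treated as opaque strings (in practice they would be canonical reaction SMARTS), the three sources and their contents are hypothetical toy data, and the helper names `coverage_report` and `pairwise_jaccard` are illustrative only, not part of any tool discussed here.

```python
# Hypothetical sketch: template sets of equal size can still differ in coverage.
# Template strings are placeholders (assumption), not real extracted templates.
from itertools import combinations

# Toy data (assumption): three sources with four templates each.
SOURCES: dict[str, set[str]] = {
    "source_A": {"T1", "T2", "T3", "T4"},
    "source_B": {"T1", "T2", "T3", "T5"},
    "source_C": {"T1", "T6", "T7", "T8"},
}


def coverage_report(template_sets: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Size of each template set and how many of its templates no other source contains."""
    report = {}
    for name, templates in template_sets.items():
        # Union of every other source's templates.
        others = set().union(*(t for n, t in template_sets.items() if n != name))
        unique = templates - others
        report[name] = {
            "size": len(templates),
            "unique": len(unique),
            "unique_fraction": len(unique) / len(templates) if templates else 0.0,
        }
    return report


def pairwise_jaccard(template_sets: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity |A & B| / |A | B| for every pair of sources."""
    scores = {}
    for a, b in combinations(sorted(template_sets), 2):
        union = template_sets[a] | template_sets[b]
        scores[(a, b)] = len(template_sets[a] & template_sets[b]) / len(union) if union else 1.0
    return scores


if __name__ == "__main__":
    for name, stats in coverage_report(SOURCES).items():
        print(name, stats)
    for pair, score in pairwise_jaccard(SOURCES).items():
        print(pair, round(score, 2))
```

With this toy data all three sets contain four templates, yet source_C holds three templates found in no other source and shares only one template with each of the others (Jaccard 1/7), whereas source_A and source_B overlap heavily (Jaccard 0.6). Set size alone therefore says little about how much of reaction space a source covers.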
“…Several methods have been proposed to predict synthetic pathways using machine learning. 49,50 We used AIZynthFinder to determine whether the optimized molecule has a practical synthetic pathway.…”
Section: Experimental Section (mentioning)
“…1,[6][7][8][9][10] Despite efforts in building models that effectively learn chemistry from data, the quality of the data sets remains the primary limitation on performance improvements. The impact of data sets sizes and variability on the performance of computer-assisted synthesis planning tools has been recently investigated by Thakkar et al 11 . Nevertheless, the influence of having chemically wrong examples in training data sets remains a topic of little research regardless of its relevance and impact in all data-driven chemical applications.…”
Section: Introduction (mentioning)