2020
DOI: 10.26434/chemrxiv.12395120.v1
Preprint

Unassisted Noise-Reduction of Chemical Reactions Data Sets

Abstract: Existing deep learning models applied to reaction prediction in organic chemistry are able to reach extremely high levels of accuracy (>90% for NLP-based ones [1]). With no chemical knowledge embedded other than the information learnt from reaction data, the quality of the data sets plays a crucial role in the performance of the prediction models. While human curation is prohibitively expensive, the need for unaided approaches to remove chemically incorrect entries from ex…

Citations: cited by 7 publications (6 citation statements)
References: 10 publications
“…The current machine learning revolution in automated synthesis can significantly accelerate novel materials and molecules' development. In the last years, natural language processing methods emerged as robust and effective approaches in the field of organic chemistry, showing promising results in reaction prediction (1; 2; 3; 4), retrosynthesis planning (5; 6; 7; 8), data curation (9) and synthesis action generation (10; 11). In those studies the encoder-decoder transformer models introduced by Vaswani et al (12) excel among all other neural network architectures.…”
Section: Introduction (mentioning)
confidence: 99%
“…Exploring machine learning methods for improving or even building new innovative ways of data curation is an opportunity for future research in chemistry and materials science. Such methods for automatic data curation have recently received attention in many disciplines, including the chemical sciences. For instance, by coupling uncertainty estimation methods exploiting the statistical nature of machine learning methods, one can identify mistakes and anomalies in big data. For example, in cases where the machine learning model is confident in its predictions but large discrepancies are observed with the reported data, the user can be warned to double-check the entry to avoid mistakes in databases.…”
Section: Challenges and Opportunities (mentioning)
confidence: 99%
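
The check described in the statement above (a confident model prediction that disagrees strongly with the reported value) can be sketched in a few lines. This is only an illustrative sketch, not the method of the cited work: the function name `flag_suspect_entries`, the thresholds `min_confidence` and `max_discrepancy`, and the sample values are all assumptions made for the example.

```python
from typing import List, Sequence


def flag_suspect_entries(
    reported: Sequence[float],
    predicted: Sequence[float],
    confidence: Sequence[float],
    min_confidence: float = 0.9,   # hypothetical threshold, not from the source
    max_discrepancy: float = 0.2,  # hypothetical threshold, not from the source
) -> List[int]:
    """Return indices of entries where a confident prediction
    disagrees with the reported value by more than max_discrepancy."""
    if not (len(reported) == len(predicted) == len(confidence)):
        raise ValueError("all inputs must have the same length")

    flagged = []
    for i, (r, p, c) in enumerate(zip(reported, predicted, confidence)):
        # Flag only when the model is confident AND the gap is large;
        # low-confidence disagreements are left alone.
        if c >= min_confidence and abs(r - p) > max_discrepancy:
            flagged.append(i)
    return flagged


# Made-up values for illustration (e.g. yields on a 0-1 scale).
reported = [0.80, 0.10, 0.55, 0.90]
predicted = [0.78, 0.85, 0.20, 0.88]
confidence = [0.95, 0.97, 0.40, 0.99]
suspects = flag_suspect_entries(reported, predicted, confidence)
```

Here only the second entry would be returned for a manual double-check: the third entry also disagrees, but the model's confidence there is too low for the disagreement to count as evidence of an error.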
“…On the smoothed data sets, the performance of our models more than triples in the gram scale and doubles on the sub-gram scale, achieving R² scores of 0.277 and 0.388, respectively. The removal of noisy reactions [32] or reaction data augmentation techniques [33] could potentially lead to further improvements.…”
Section: Gram Versus Sub-gram Scale (mentioning)
confidence: 99%