Reaction Data Curation I: Chemical Structures and Transformations Standardization

Gimadiev, Timur; Lin, Arkadii; Afonina, Valentina A.; Batyrshin, Dinar; Nugmanov, Ramil; Akhmetshin, Tagir; Sidorov, Pavel; Dyubankova, Natalia; Verhoeven, Jonas; Wegner, Jörg K.; Ceulemans, Hugo; Gedich, Andrey; Madzhidov, Timur; Varnek, Alexandre

doi:10.1002/minf.202100119

Cited by 27 publications

(38 citation statements)

References 63 publications

“…CGRs can be obtained for both balanced and imbalanced reactions, and imbalanced reactions can be balanced via decomposition of the CGR. 44 However, correct labels for missing atoms and bonds can only be recovered for some but not all reactions using CGR decomposition, namely, if no rearrangements occurs within the missing fragments. An automatic balancing via the CGR therefore potentially introduces noise to a data set, if some of the missing fragments are wrongly autocompleted.…”

Section: Methods (mentioning)

confidence: 99%

Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction

Heid

Green

2021

J. Chem. Inf. Model.

130

The estimation of chemical reaction properties such as activation energies, rates, or yields is a central topic of computational chemistry. In contrast to molecular properties, where machine learning approaches such as graph convolutional neural networks (GCNNs) have excelled for a wide variety of tasks, no general and transferable adaptations of GCNNs for reactions have been developed yet. We therefore combined a popular cheminformatics reaction representation, the so-called condensed graph of reaction (CGR), with a recent GCNN architecture to arrive at a versatile, robust, and compact deep learning model. The CGR is a superposition of the reactant and product graphs of a chemical reaction and thus an ideal input for graph-based machine learning approaches. The model learns to create a data-driven, task-dependent reaction embedding that does not rely on expert knowledge, similar to current molecular GCNNs. Our approach outperforms current state-of-the-art models in accuracy, is applicable even to imbalanced reactions, and possesses excellent predictive capabilities for diverse target properties, such as activation energies, reaction enthalpies, rate constants, yields, or reaction classes. We furthermore curated a large set of atom-mapped reactions along with their target properties, which can serve as benchmark data sets for future work. All data sets and the developed reaction GCNN model are available online, free of charge, and open source.
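The superposition described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use CGRtools or the published GCNN code: the bond-table layout, the helper names (`bond`, `condensed_graph`, `dynamic_bonds`) and the toy bromination example are all assumptions made for illustration. Each edge of the resulting table carries the bond order before and after the reaction, and `None` marks a bond that is absent on one side.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Pair = FrozenSet[int]                      # two atom-map numbers
Bonds = Dict[Pair, int]                    # bond order per mapped atom pair
CGREdges = Dict[Pair, Tuple[Optional[int], Optional[int]]]


def bond(i: int, j: int) -> Pair:
    """Hypothetical helper: an unordered pair of atom-map numbers."""
    return frozenset((i, j))


def condensed_graph(reactant_bonds: Bonds, product_bonds: Bonds) -> CGREdges:
    """Superpose two atom-mapped bond tables into one CGR-like edge table.

    Every edge is labelled (order in reactants, order in products).
    """
    all_pairs = reactant_bonds.keys() | product_bonds.keys()
    return {p: (reactant_bonds.get(p), product_bonds.get(p)) for p in all_pairs}


def dynamic_bonds(cgr: CGREdges) -> CGREdges:
    """Keep only the edges whose bond order changes during the reaction."""
    return {p: orders for p, orders in cgr.items() if orders[0] != orders[1]}


# Toy example (assumed, hydrogens omitted): propene + Br2 -> 1,2-dibromopropane
# Atom maps: C1=C2-C3 and Br4-Br5
reactants = {bond(1, 2): 2, bond(2, 3): 1, bond(4, 5): 1}
products = {bond(1, 2): 1, bond(2, 3): 1, bond(1, 4): 1, bond(2, 5): 1}

cgr = condensed_graph(reactants, products)
changed = dynamic_bonds(cgr)
```

In this toy case the C2-C3 bond is unchanged, so it appears in `cgr` but not in `changed`. The sketch relies on a correct atom-to-atom mapping; it says nothing about how missing fragments of an imbalanced reaction would be completed.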

“…The model is trained on the combined open-source reaction dataset USPTO 29 and commercial reaction dataset Pistachio 19 . The data normalization followed the process described in 30 and duplicated entries were removed.…”

Section: Data Sets (mentioning)

confidence: 99%

Bidirectional Graphormer for Reactivity Understanding: neural network trained to reaction atom-to-atom mapping task

Nugmanov

Dyubankova

Gedich

et al. 2022

Preprint

Self Cite

This work introduces GraphormerMapper, a new algorithm for reaction atom-to-atom mapping (AAM) based on a distance-aware BERT neural network. In benchmarking studies with IBM RxnMapper, the best AAM algorithm according to our previous study, we demonstrate that our AAM algorithm is superior on our “Golden” benchmarking dataset. The mapper is implemented in the Chython [https://github.com/chython/chython] and Chytorch [https://github.com/chython/chytorch, https://github.com/chython/chytorch-rxnmap] Python packages, which are freely available for out-of-the-box use. Chython is a cheminformatics library with a simple interface for processing reaction and molecular data. The key features of Chython are: standardization of chemical functional groups, checking for atom valence errors, substructure search, and advanced reaction manipulation, for example, generating products from reactants and reaction atom-to-atom mapping. Chytorch provides a PyTorch-like interface for graph-based neural networks developed specifically for chemical tasks.

“…The initial dataset of one-step hydrogenation reactions containing 591,563 reactions (391,880 chemical transformations) was extracted from the Reaxys ® database in May 2019. We follow the same terminology as in our earlier publication [ 13 ]: by “transformation” we mean a set of reactants and products, “reaction” is a transformation carried out in the given conditions. Hydrogenation reactions were revealed by the presence of “H2” or “hydrogen” keyword in the reagent list for at least one condition corresponding to a reaction.…”

Section: Computational Procedures (mentioning)

confidence: 99%

“…Hydrogenation reactions were revealed by the presence of “H2” or “hydrogen” keyword in the reagent list for at least one condition corresponding to a reaction. Chemical structures were standardized according to the protocol described by Gimadiev et al [ 13 ]. CGRtools [ 14 ] was used for functional group normalization, aromatization, removing explicit hydrogens and duplicate cleaning.…”

Section: Computational Procedures (mentioning)

confidence: 99%
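The selection rule and the transformation/reaction terminology quoted above can be sketched in a few lines. This is a hedged illustration, not the cited authors' pipeline: the record layout, the function names and the toy SMILES strings are assumptions, and the keyword test is assumed to be an exact, case-insensitive match on each reagent name (the quoted text does not say whether substring matching was used). Structures are compared as plain strings here, whereas the cited work compares standardized structures.

```python
from collections import defaultdict

# Keywords named in the quoted statement; exact matching is an assumption.
KEYWORDS = {"h2", "hydrogen"}


def group_by_transformation(records):
    """Group reaction records by transformation (reactant set, product set).

    Each record is one "reaction": a transformation under one set of conditions.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (frozenset(rec["reactants"]), frozenset(rec["products"]))
        groups[key].append(rec["reagents"])
    return groups


def has_hydrogen_keyword(reagent_lists):
    """True if at least one condition lists "H2" or "hydrogen" as a reagent."""
    return any(
        reagent.strip().lower() in KEYWORDS
        for reagents in reagent_lists
        for reagent in reagents
    )


def select_hydrogenations(records):
    """Keep transformations with at least one hydrogen-containing condition."""
    groups = group_by_transformation(records)
    return {k: v for k, v in groups.items() if has_hydrogen_keyword(v)}


# Toy records (hypothetical data, unstandardized SMILES strings)
records = [
    {"reactants": ["C=C"], "products": ["CC"], "reagents": ["H2", "Pd/C"]},
    {"reactants": ["C=C"], "products": ["CC"], "reagents": ["hydrogen", "PtO2"]},
    {"reactants": ["CC=O"], "products": ["CCO"], "reagents": ["NaBH4"]},
]

selected = select_hydrogenations(records)
n_transformations = len(selected)
n_reactions = sum(len(conditions) for conditions in selected.values())
```

With these toy records, two reactions collapse into one transformation, which mirrors the distinction between the 591,563 reactions and 391,880 transformations reported in the quoted statement.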

Prediction of Optimal Conditions of Hydrogenation Reaction Using the Likelihood Ranking Approach

Afonina

Mazitov

Nurmukhametova

et al. 2021

IJMS

Self Cite

The selection of experimental conditions leading to a reasonable yield is an important and essential element for the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model representing an artificial neural network that outputs a list of different conditions ranked according to their suitability to a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors, and a recurrent artificial neural network performance prediction of condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking model trained on a hydrogenation reactions dataset, (~42,000 reactions) from Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.
