Toward the design of chemical reactions: Machine learning barriers of competing mechanisms in reactant space

Heinen, Stefan; Rudorff, Guido Falk von; Lilienfeld, O. Anatole von

doi:10.1063/5.0059742

Cited by 53 publications

(60 citation statements)

References 68 publications


“…[22–27] Given the high demand for quantitative reaction outcome prediction, and the steadily increasing amount of published reaction data (Figure 1B), the development of quantitative models building on this data would be highly desirable for both academic and industrial applications. [2,28–30]…”

Section: Introduction (mentioning)

confidence: 99%

Machine Learning for Chemical Reactivity: The Importance of Failed Experiments

et al. 2022


Assessing the outcomes of chemical reactions in a quantitative fashion has been a cornerstone across all synthetic disciplines. While this has classically been approached through empirical optimization, data-driven modelling bears enormous potential to streamline the process. However, such predictive models require significant quantities of high-quality data, the availability of which is limited: the main reasons include experimental errors and, importantly, human biases regarding experiment selection and result reporting. In a series of case studies, we investigate the impact of these biases on drawing general conclusions from chemical reaction data, revealing the utmost importance of "negative" examples. Finally, case studies on data expansion approaches showcase directions to circumvent these limitations and demonstrate perspectives towards a long-term data quality enhancement in chemistry.


“…Compared to the structural representations, the remarkable accuracy of the composition-based representation, 1-hot, may be ascribed to the significant influence of substituent type and site on the overall BODIPY excitation energies rather than to three-dimensional structural information. Similar observations have been made in past works [117–120]. In the following, we use the best performing SLATM-KRR-QML model.…”

Section: B. Quantum Machine Learning Models (mentioning)

confidence: 81%
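The excerpt above settles on a kernel ridge regression (KRR) model. As a generic illustration of the technique only, the pure-Python sketch below fits KRR by solving (K + lam·I)·alpha = y with a Cholesky factorization and predicts f(x) = sum_i alpha_i·k(x_i, x). The Gaussian kernel, the width `sigma`, the regularizer `lam`, and the toy data are assumptions chosen here; this is not the QML package's API and does not use the SLATM representation of the cited work.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) (assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def cholesky_solve(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    # Forward substitution: L @ z = b.
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T @ x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def krr_fit(xs, ys, sigma=1.0, lam=1e-8):
    """Return regression weights alpha = (K + lam * I)^-1 y."""
    n = len(xs)
    k = [[gaussian_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return cholesky_solve(k, list(ys))


def krr_predict(xs_train, alpha, x, sigma=1.0):
    """Predict f(x) as a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xt, x, sigma) for a, xt in zip(alpha, xs_train))


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]  # toy one-dimensional "representations"
    ys = [0.0, 1.0, 4.0, 9.0]          # toy target values
    alpha = krr_fit(xs, ys)
    print(krr_predict(xs, alpha, [1.5]))
```

Pure Python keeps the sketch dependency-free; for training sets of tens of thousands of molecules, a NumPy/SciPy linear solver would be the practical choice.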

“…SLATM and FCHL descriptors were generated using the QML package [116], while BoB and 1-hot vectors were generated using an in-house code. The 1-hot representation was shown to perform well when the dataset is combinatorially diverse [117–120]. The 1-hot representation is a 322-bit (7 × 46) vector, where the presence/absence of one of the 46 substituents at each of the 7 sites is denoted by 1/0.…”

Section: Machine Learning (mentioning)

confidence: 99%
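To make the 322-bit layout concrete, here is a minimal sketch of such an encoder. The bit ordering (site-major, bit = site × 46 + substituent), the 0-based indices, and the choice to encode an unsubstituted site as an all-zero block are assumptions for illustration; the excerpt only fixes the 7 × 46 dimensions.

```python
N_SITES = 7          # substitution sites on the BODIPY core (from the excerpt)
N_SUBSTITUENTS = 46  # substituent library size (from the excerpt)


def one_hot(pattern):
    """Encode a {site_index: substituent_index} dict as a 322-element 0/1 list.

    Hypothetical conventions: sites are 0..6, substituents 0..45, bits are
    ordered site-major (bit = site * 46 + substituent), and a site missing
    from `pattern` (e.g. left unsubstituted) contributes an all-zero block.
    """
    bits = [0] * (N_SITES * N_SUBSTITUENTS)
    for site, sub in pattern.items():
        if not (0 <= site < N_SITES and 0 <= sub < N_SUBSTITUENTS):
            raise ValueError(f"invalid site/substituent pair: {site}, {sub}")
        bits[site * N_SUBSTITUENTS + sub] = 1
    return bits


if __name__ == "__main__":
    vec = one_hot({0: 5, 3: 12})  # substituent 5 at site 0, substituent 12 at site 3
    print(len(vec), sum(vec))
```

Representing a molecule as a {site: substituent} dict guarantees at most one substituent per site, so each 46-bit block holds at most a single 1.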

Data-Driven Modeling of S0 → S1 Excitation Energy in the BODIPY Chemical Space: High-Throughput Computation, Quantum Machine Learning, and Inverse Design

Gupta, Chakraborty, Ghosh et al. 2021

Preprint


Derivatives of BODIPY are popular fluorophores due to their synthetic feasibility, structural rigidity, high quantum yield, and tunable spectroscopic properties. While the characteristic absorption maximum of BODIPY is at 2.5 eV, combinations of functional groups and substitution sites can shift the peak position by ±1 eV. Time-dependent long-range corrected hybrid density functional methods can model the lowest excitation energies, offering a semi-quantitative precision of ±0.3 eV. Alas, the chemical space of BODIPYs stemming from the combinatorial introduction of even a few dozen substituents is too large for brute-force high-throughput modeling. To navigate this vast space, we select 77,412 molecules and train a kernel-based quantum machine learning model providing < 2% hold-out error. Further reuse of the results presented here to navigate the entire BODIPY universe, comprising over 253 giga (253 × 10^9) molecules, is demonstrated by inverse-designing candidates with desired target excitation energies.
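The quoted size of the BODIPY universe can be reproduced by a short counting argument. The following are assumptions, not statements from the abstract: each of the 7 sites carries either H or one of 46 substituents (47 options), and patterns related by the core's mirror symmetry (sites 1↔7, 2↔6, 3↔5, with the meso site 8 mapped onto itself) are counted once. Burnside's lemma then gives (47^7 + 47^4)/2 distinct patterns; the sketch computes this and checks the formula against brute-force enumeration for a small option set.

```python
from itertools import product

N_OPTIONS = 47  # assumption: H plus 46 substituents per site


def count_bodipy_patterns(n_options=N_OPTIONS):
    """Count 7-site substitution patterns up to the assumed mirror symmetry.

    Burnside's lemma: average the number of patterns fixed by each symmetry
    operation (identity and the single reflection).
    """
    fixed_by_identity = n_options ** 7
    fixed_by_reflection = n_options ** 4  # 3 mirrored pairs + the meso site
    return (fixed_by_identity + fixed_by_reflection) // 2


def brute_force_count(n_options):
    """Enumerate all patterns and keep one canonical member per mirror pair.

    Positions 0..6 stand for sites 1, 2, 3, 5, 6, 7, 8 (hypothetical ordering).
    """
    seen = set()
    for p in product(range(n_options), repeat=7):
        mirrored = (p[5], p[4], p[3], p[2], p[1], p[0], p[6])
        seen.add(min(p, mirrored))
    return len(seen)


if __name__ == "__main__":
    print(count_bodipy_patterns())  # 253314000072
```

Under these assumptions the count is 253,314,000,072, consistent with "over 253 giga"; it remains one plausible reconstruction rather than the authors' stated procedure.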


“…However, NaviCatGA imposes no constraint on the form of the fitness function and any alternative defined by the user is possible. In general, any ML‐based models tailored for the prediction of catalytic properties constitute a powerful alternative [32,33] …”

Section: Methods (mentioning)

confidence: 99%

“…In general, any ML‐based models tailored for the prediction of catalytic properties constitute a powerful alternative. [32,33] To help users define fitness functions and assemblers conveniently, a number of predefined wrapper functions are provided, built around RDKit [34] and pySCF. [35,36] Frequently used descriptors, such as frontier molecular orbital energies or molecular volumes, are provided through wrappers from multiple molecular formats, including SMILES.…”

Section: Choosing a Fitness Function (mentioning)

confidence: 99%

Genetic Optimization of Homogeneous Catalysts

2022


We present the NaviCatGA package, a versatile genetic algorithm capable of optimizing molecular catalyst structures using well‐suited fitness functions to achieve a set of targeted properties. The flexibility and generality of this tool are validated and demonstrated with two examples: i) ligand optimization and exploration for Ni‐catalyzed aryl‐ether cleavage, manipulating SMILES and using a fitness function derived from molecular volcano plots; ii) multi‐objective (i.e., activity/selectivity) optimization of bipyridine N,N‐dioxide Lewis basic organocatalysts for the asymmetric propargylation of benzaldehyde from 3D molecular fragments. We show that evolutionary optimization, enabled by NaviCatGA, is an efficient way of accelerating catalyst discovery by bypassing combinatorial scaling issues and incorporating compelling chemical constraints.
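The abstract describes evolutionary optimization driven by user-defined fitness functions. Below is a generic genetic-algorithm sketch (tournament selection, one-point crossover, random mutation, elitism) on a hypothetical toy problem; it does not use or mimic NaviCatGA's actual API. The fragment scores, the target value, and the volcano-shaped fitness are invented for illustration and stand in for a descriptor-based fitness function.

```python
import random

# Hypothetical toy problem: pick one fragment per position so that an additive
# "descriptor" lands at the peak of a volcano-shaped fitness.
FRAGMENT_SCORES = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]  # made-up per-fragment contributions
N_POSITIONS = 4
TARGET = 1.0  # hypothetical descriptor value at the volcano peak


def descriptor(genome):
    return sum(FRAGMENT_SCORES[g] for g in genome)


def fitness(genome):
    # Volcano-like: maximal (0) at TARGET, decreasing linearly on both sides.
    return -abs(descriptor(genome) - TARGET)


def tournament(pop, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    # One-point crossover between two parent genomes.
    cut = rng.randrange(1, N_POSITIONS)
    return a[:cut] + b[cut:]


def mutate(genome, rng, rate=0.2):
    # Replace each gene by a random fragment with probability `rate`.
    return [rng.randrange(len(FRAGMENT_SCORES)) if rng.random() < rate else g
            for g in genome]


def run_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(FRAGMENT_SCORES)) for _ in range(N_POSITIONS)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: carry the current best forward unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    genome, score = run_ga()
    print(genome, descriptor(genome), score)
```

Elitism keeps the best genome of each generation, so the best fitness never decreases; the toy fitness can be replaced by any user-supplied function of the genome.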
