2020
DOI: 10.7717/peerj-pchem.11

Chemical space exploration: how genetic algorithms find the needle in the haystack

Abstract: We explain why search algorithms can find molecules with particular properties in an enormous chemical space (ca. 10^60 molecules) by considering only a tiny subset (typically 10^3 to 10^6 molecules). Using a very simple example, we show that the number of potential paths that the search algorithms can follow to the target is equally vast. Thus, the probability of randomly finding a molecule that is on one of these paths is quite high and from here a search algorithm can follow the path to the target molecule. A path i…
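
The argument in the abstract can be illustrated with a toy search over strings rather than molecules. The sketch below is not the paper's method or code: it is a mutation-only, elitist evolutionary search (a simplified genetic algorithm without crossover), and the target string, alphabet, population size, and function names are all assumptions chosen for illustration. It shows how a search that only ever evaluates a tiny fraction of a large space can still reach a target, because many different chains of single-step improvements lead to it.

```python
import random
import string

ALPHABET = string.ascii_lowercase
TARGET = "molecule"  # hypothetical stand-in for a target molecule


def fitness(candidate, target=TARGET):
    """Number of positions at which the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(candidate, rng):
    """Replace one randomly chosen character with a random letter."""
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(ALPHABET) + candidate[pos + 1:]


def search(pop_size=50, max_generations=2000, seed=0):
    """Mutation-only elitist search; returns (best string, strings evaluated)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    evaluated = pop_size
    for _ in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if population[0] == TARGET:
            break
        # Keep the best fifth as parents (elitism) and mutate them.
        parents = population[: pop_size // 5]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size)]
        evaluated += len(children)
        population = parents + children
    return max(population, key=fitness), evaluated


if __name__ == "__main__":
    best, evaluated = search()
    space_size = len(ALPHABET) ** len(TARGET)  # 26^8 possible strings
    print(f"best={best!r}, evaluated={evaluated}, space size={space_size}")
```

In this toy setting the space holds 26^8 (about 2 x 10^11) strings, while the search is capped at roughly 10^5 evaluations.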

Cited by 35 publications (31 citation statements)
References 21 publications
“…With such descriptors as starting points for emerging genetic algorithms (Henault et al., 2020; Jensen, 2019) or reinforcement learning methods (Gómez-Bombarelli et al., 2018), the discovery of novel redox-active organic compounds with desired performance characteristics can be automated. Examples of the use of DFT-driven screening and discovery based on machine learning include the identification of organic chromophores for photovoltaic applications (Hachmann et al., 2011).…”
Section: Driving Materials Selection and Discovery – Role Of Quantum Chemistry
Citation type: mentioning
confidence: 99%
“…To study the difference in efficiency of GB-EPI and GB-GA, we make a statistical analysis of a representative rediscovery task (troglitazone). In line with earlier work [14,50] on the efficiency of GB-GA, we calculate the average number of fitness function evaluations and CPU time needed for rediscovery, and the rediscovery success rate of both algorithms. As we learned from the median molecule task, starting from a randomised set of molecules elucidates the exploratory power of the algorithms more.…”
Section: Comparing Efficiency of GB-EPI and GB-GA
Citation type: mentioning
confidence: 99%
“…Therefore, we start this rediscovery task with the 100 top-scoring molecules from 10 000 molecules randomly chosen from a 1.6 million ChEMBL subset, as constructed by Henault et al. [50]. In this subset all molecules with a bit-vector Tanimoto similarity to the target above 0.323 are removed [13]. Table 2 shows the results for 100 runs of GB-EPI and GB-GA (with settings taken from Henault et al. [50]), both with a maximum of 1000 generations per run.…”
Section: Comparing Efficiency of GB-EPI and GB-GA
Citation type: mentioning
confidence: 99%
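
The setup described in the statement above (remove molecules too similar to the target, sample at random, keep the top scorers) can be sketched as follows. This is a minimal illustration and not code from the cited work: fingerprints are represented as plain Python sets of on-bit indices instead of being computed with a cheminformatics toolkit, and `filter_and_seed`, its parameters, and the caller-supplied `score` function are hypothetical names. Only the Tanimoto formula, the 0.323 threshold, the 10 000-molecule sample, and the top-100 cut come from the quoted text.

```python
import random


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def filter_and_seed(pool, target, score, threshold=0.323,
                    sample_size=10_000, top_n=100, seed=0):
    """Drop fingerprints too similar to the target, sample, keep top scorers."""
    rng = random.Random(seed)
    # Remove everything with similarity to the target above the threshold.
    dissimilar = [fp for fp in pool if tanimoto(fp, target) <= threshold]
    # Randomly choose up to sample_size of the remaining fingerprints.
    sample = rng.sample(dissimilar, min(sample_size, len(dissimilar)))
    # Keep the top_n highest-scoring ones as the starting population.
    return sorted(sample, key=score, reverse=True)[:top_n]


if __name__ == "__main__":
    target = frozenset({1, 2, 3, 4})
    pool = [frozenset({1, 2, 3, 4}), frozenset({1, 9, 10}), frozenset({20, 21})]
    # Toy score: number of on bits (a real score would be task-specific).
    print(filter_and_seed(pool, target, score=len, top_n=1))
```

Removing near-neighbours of the target first means a rediscovery run cannot succeed simply by starting next to the answer.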
“…Instead an automated search for interesting candidates is desired. Examples of such search methods include evolutionary algorithms [1,2], basin-hopping [3] and particle swarm optimization [4].…”
Section: Introduction
Citation type: mentioning
confidence: 99%