2023
DOI: 10.26434/chemrxiv-2023-67hfc
Preprint

Best Practices for Using Genetic Algorithms in Molecular Discovery

Abstract: Genetic algorithms (GAs) are a powerful tool to search large chemical spaces for inverse molecular design. However, GAs have multiple hyperparameters that have not been thoroughly investigated for chemical space searches. In this work, we examine the general effects of a number of hyperparameters, such as population size, elitism rate, selection method, mutation rate, and convergence criteria, on key GA performance metrics. We show that using a self-termination method with a minimum Spearman's rank correlation…

Citations: cited by 4 publications (5 citation statements)
References: 38 publications
“…It is possible that there are better candidates within the search space; however, they have not been found due to the stochastic nature of the GA. Recent work by our group has shown that tuning the GA hyperparameters, such as mutation rate, selection method, and convergence criteria, can affect the efficiency of searching the chemical space …”
Section: Results (mentioning)
confidence: 99%
“…Recent work by our group has shown that tuning the GA hyperparameters, such as mutation rate, selection method, and convergence criteria, can affect the efficiency of searching the chemical space. 57 The massive speedups of using GAs to efficiently screen chemical space compared to brute-force high-throughput screening approaches can be seen in Table 1. The largest search space is ca.…”
Section: Introduction (mentioning)
confidence: 99%
“…We have used the same hyperparameters for the GA from a previous study done in our group as they have shown to be effective for similar molecular systems. 15,16 The GA terminates the Selection-Crossover-Mutation cycle after 40 generations, which we have found to be sufficient in finding the minimal calculated HOMO-LUMO gap (Fig. 2).…”
Section: 31 (mentioning)
confidence: 98%
“…There are ways to mitigate this behavior by tuning the GA's hyperparameters, such as the population size, mutation rate, and elitism rate. 16 However, even with well-tuned hyperparameters, there is still a chance that the GA misses the global optima. While there are other, deterministic algorithms that can find the global optima, they come with a greater computational cost.…”
Section: Some Remarks (mentioning)
confidence: 99%
“…Each GA was run for 20 generations with a population size of 100 and crossover probability, gene-wise mutation probability, and elitism rate of 0.9, 0.4, and 0.5, respectively. 135 The individuals in the initial population were built randomly. For each generation, the fitness values for the population were computed as follows: CIFs for each individual were generated using PORMAKE 0.2.0; 114 features were generated using molSimplify 1.7.3 123,125 and Zeo++ 0.3; 129 the solvent removal stability, 73 2-class water stability, and acid stability models were used to make predictions on the features; and the product of stable class probabilities from the three models was used as the fitness metric, as mentioned previously (eq 1 and SI Figure S18).…”
Section: Computational Details (mentioning)
confidence: 99%
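
The hyperparameters quoted in the last statement (20 generations, population size 100, crossover probability 0.9, gene-wise mutation probability 0.4, elitism rate 0.5, fitness as a product of three class probabilities) can be put into a minimal Python sketch. This is not the cited authors' code. The integer genome, the `N_GENES` and `N_ALLELES` values, the toy `fitness` function, the single-point crossover, and the choice to draw parents from the elite pool are all assumptions made here for illustration; the real workflow builds structures and queries trained stability models instead.

```python
import random
from math import prod

# Hyperparameters taken from the quoted citation statement.
GENERATIONS = 20
POP_SIZE = 100
P_CROSSOVER = 0.9
P_MUTATION = 0.4      # applied independently to each gene
ELITISM_RATE = 0.5

# Hypothetical genome layout, chosen only to keep the sketch small.
N_GENES = 9
N_ALLELES = 10


def fitness(individual):
    """Toy stand-in for the product of three stable-class probabilities.

    Each pseudo-probability is the mean allele value of one third of the
    genome, scaled to [0, 1]. Real models are assumed to replace this.
    """
    parts = [individual[i::3] for i in range(3)]
    probs = [sum(p) / (len(p) * (N_ALLELES - 1)) for p in parts]
    return prod(probs)


def crossover(a, b, rng):
    """Single-point crossover, applied with probability P_CROSSOVER."""
    if rng.random() < P_CROSSOVER:
        cut = rng.randrange(1, N_GENES)
        return a[:cut] + b[cut:]
    return list(a)


def mutate(individual, rng):
    """Resample each gene independently with probability P_MUTATION."""
    return [
        rng.randrange(N_ALLELES) if rng.random() < P_MUTATION else gene
        for gene in individual
    ]


def run_ga(seed=0):
    rng = random.Random(seed)
    # Random initial population, as in the quoted description.
    population = [
        [rng.randrange(N_ALLELES) for _ in range(N_GENES)]
        for _ in range(POP_SIZE)
    ]
    n_elite = int(ELITISM_RATE * POP_SIZE)
    best_per_generation = []
    for _ in range(GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        elites = ranked[:n_elite]  # carried over unchanged
        children = []
        while len(children) < POP_SIZE - n_elite:
            # Assumption: parents are drawn uniformly from the elite pool.
            p1, p2 = rng.sample(elites, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = elites + children
    return max(population, key=fitness), best_per_generation


if __name__ == "__main__":
    best, history = run_ga(seed=1)
    print("best individual:", best)
    print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the elites are copied forward unchanged, the best fitness recorded per generation cannot decrease. A fixed generation count, as used here and in the 40-generation statement above, is the simplest stopping rule; the abstract's self-termination criterion is not reproduced because its details are truncated in this listing.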