Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES

Nigam, AkshatKumar; Pollice, Robert; Krenn, Mario; Gomes, Gabriel; Aspuru‐Guzik, Alán

doi:10.1039/d1sc00231g

Cited by 94 publications (88 citation statements)

References 54 publications


“…Another line of inquiry could address high computational costs of DL and E3FP models. To this end, we suggest exploring alternative molecular representations and CPU-friendly generative models based on genetic algorithms, such as STONED on SELFIES [ 128 ]. Finally, we hope that in the future biomedical DL research will go beyond representation learning and will be used to derive novel biological knowledge by e.g., inferring synthetic and retrosynthetic chemical reactions, identifying novel disease-associated druggable proteins and clinically actionable biomarkers [ 129–131 ].…”

Section: Discussion (mentioning)

confidence: 99%

Comparative analysis of molecular fingerprints in prediction of drug combination effects

Zagidullin, Wang, Guan, et al. (2021)

Briefings in Bioinformatics


Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
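The abstract above quantifies the similarity of molecular representations with an adapted Centered Kernel Alignment (CKA) metric. The adaptation itself is not described here, so the following is only a minimal sketch of the standard linear CKA in plain Python. It assumes each representation is a matrix with one row per molecule (same molecules, same order). The toy fingerprint and descriptor values are invented for illustration.

```python
from math import sqrt

Matrix = list[list[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each column mean so every feature has zero mean over the molecules."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Return ||a^T b||_F^2 for a (n x p) and b (n x q) sharing the same n rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations (rows = the same molecules)."""
    if len(x) != len(y):
        raise ValueError("x and y must have one row per molecule, in the same order")
    xc, yc = center_columns(x), center_columns(y)
    # Denominator: ||X^T X||_F * ||Y^T Y||_F on the centered matrices.
    den = sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    if den == 0.0:
        raise ValueError("a representation has zero variance across molecules")
    return cross_frobenius_sq(xc, yc) / den


# Toy example: 4 molecules, a 3-bit fingerprint vs. a 2-dimensional descriptor.
fingerprint = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
descriptor = [[0.2, 1.5], [0.9, 0.1], [1.1, 0.7], [0.3, 1.2]]
print(linear_cka(fingerprint, fingerprint))  # 1.0 up to floating-point rounding
print(linear_cka(fingerprint, descriptor))   # a value between 0 and 1
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of either representation, which is what allows fingerprints of different lengths to be compared; a practical version would use NumPy instead of nested loops.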


“…Recently, Nigam et al developed algorithms to explore the topological space of molecules with a docking-based scoring function for structure-based de novo drug design. They developed STONED [ 27 ], which performs molecule optimization by manipulating SELFIES, a sequential representation of molecular structures that is similar to SMILES but is guaranteed to be 100% valid. An advantage of STONED is that it does not require deep learning or expert knowledge to explore the chemical space.…”

Section: Introduction (mentioning)

confidence: 99%

Structure-based de novo drug design using 3D deep generative models

2021
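The citation statement from the structure-based design paper describes STONED as optimizing molecules by manipulating SELFIES strings. A minimal sketch of one such manipulation follows, assuming it is a single random point mutation (replace, insert or delete one SELFIES symbol). The alphabet is a hypothetical toy, and the tokenizer is a simple regular expression rather than the `selfies` package's own splitter. Decoding each mutant back to SMILES (for example with `selfies.decoder`) is the step that yields a valid molecule; it is omitted so the sketch runs without third-party packages.

```python
import random
import re

# SELFIES strings are sequences of bracketed symbols such as "[C]", "[=O]", "[Ring1]".
TOKEN_RE = re.compile(r"\[[^\]]*\]")


def split_tokens(selfies_str: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbols."""
    tokens = TOKEN_RE.findall(selfies_str)
    if "".join(tokens) != selfies_str:
        raise ValueError(f"not a sequence of bracketed symbols: {selfies_str!r}")
    return tokens


def mutate(selfies_str: str, alphabet: list[str], rng: random.Random) -> str:
    """Apply one random point mutation: replace, insert or delete a single symbol."""
    tokens = split_tokens(selfies_str)
    kind = rng.choice(["replace", "insert", "delete"])
    if kind == "delete" and len(tokens) > 1:
        del tokens[rng.randrange(len(tokens))]
    elif kind == "replace" and tokens:
        tokens[rng.randrange(len(tokens))] = rng.choice(alphabet)
    else:
        # Insertion; also the fallback so a one-symbol string never becomes empty.
        tokens.insert(rng.randint(0, len(tokens)), rng.choice(alphabet))
    return "".join(tokens)


# Hypothetical toy alphabet; "[C][C][O]" is the SELFIES encoding of ethanol (CCO).
alphabet = ["[C]", "[N]", "[O]", "[=C]", "[F]", "[Branch1]", "[Ring1]"]
rng = random.Random(0)
neighbours = {mutate("[C][C][O]", alphabet, rng) for _ in range(20)}
for s in sorted(neighbours):
    print(s)
```

Seeding a dedicated `random.Random` instance keeps the exploration reproducible, and because every mutation stays at the string level, no training set or learned model is involved.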


“…While their method is fast and inherently parallel, it requires an initial population of molecules and can generate invalid SMILES. Nigam et al. [ 21 ] generate molecules by Gibbs sampling of SELFIES [ 28 ]. Their approach generates only valid molecules and does not require a training set.…”

Section: Introduction (mentioning)

confidence: 99%

Molecular generation by Fast Assembly of (Deep)SMILES fragments

Berenger, Tsuda (2021)

J Cheminform


Background: In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile. Results: In this article, a simple method is described to generate only valid molecules at high frequency (>300,000 molecules/s using a single CPU core), given a molecular training set. The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity for training set distribution matching. When working with DeepSMILES, the method reaches peak performance (>340,000 molecules/s) because it relies almost exclusively on string operations. The “Fast Assembly of SMILES Fragments” software is released as open-source at https://github.com/UnixJunkie/FASMIFRA. Experiments regarding speed, training set distribution matching, molecular diversity and benchmarks against several other methods are also shown.


Abstract (STONED, Nigam et al.): Interpolation and exploration within the chemical space for inverse design.
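The one-line abstract above frames STONED around interpolation in chemical space. As an illustrative reconstruction, not the authors' code, a path between two SELFIES strings can be sketched by padding the shorter string with the `[nop]` symbol and then copying the differing symbols from the target one position at a time, in random order. The example strings are arbitrary, and each intermediate string would still need to be decoded (for instance with `selfies.decoder`) to obtain a molecule.

```python
import random
import re

TOKEN_RE = re.compile(r"\[[^\]]*\]")
PAD = "[nop]"  # SELFIES "no operation" symbol, skipped when a string is decoded


def chemical_path(start: str, target: str, rng: random.Random) -> list[str]:
    """Morph start into target by copying one differing symbol per step."""
    a, b = TOKEN_RE.findall(start), TOKEN_RE.findall(target)
    n = max(len(a), len(b))
    a += [PAD] * (n - len(a))  # pad the shorter string so positions line up
    b += [PAD] * (n - len(b))
    diff = [i for i in range(n) if a[i] != b[i]]
    rng.shuffle(diff)  # a random order gives a different path on each call
    path = ["".join(a)]
    for i in diff:
        a[i] = b[i]
        path.append("".join(a))
    return path


rng = random.Random(0)
for step in chemical_path("[C][C][O]", "[C][C][N][C]", rng):
    print(step)
```

Consecutive strings in the path differ in exactly one symbol, so the number of steps equals the number of differing positions after padding.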
