Evolution of Novartis’ Small Molecule Screening Deck Design

Schuffenhauer, Ansgar; Schneider, Nadine; Hintermann, Samuel; Auld, Douglas S.; Blank, Jutta; Cotesta, Simona; Engeloch, Caroline; Fechner, Nikolas; Gaul, Christoph; Giovannoni, Jerome; Jansen, Johanna M.; Joslin, John M.; Krastel, Philipp; Lounkine, Eugen; Manchester, John I.; Monovich, Lauren G.; Pelliccioli, Anna Paola; Schwarze, Manuel; Shultz, Michael D.; Stiefl, Nikolaus; Baeschlin, Daniel K.

doi:10.1021/acs.jmedchem.0c01332

Cited by 55 publications

(56 citation statements)

References 99 publications


“…Moreover, this approach is also relevant to experimental high-throughput screening, an expensive and important tool for challenging drug discovery problems. 51 Future work will seek to extend the open source MolPAL software package and leverage it in a prospective manner to greatly accelerate a structure-based virtual screen of the Enamine REAL database. We also hope to expand MolPAL beyond the initial software detailed in this report with the addition of new surrogate model architectures, the inclusion of improved uncertainty estimation techniques, and the expansion to other forms of virtual discovery, i.e., other objective functions.…”

Section: Discussion (mentioning)

confidence: 99%

Accelerating high-throughput virtual screening through molecular pool-based active learning

2021


“…However, the multi-class classification approach of scaffold network is not suitable for fold splitting, thus it was necessary to post-process the output. For practical purposes in medicinal chemistry, scaffolds with three rings often provide a useful level of granularity [26]. Therefore, from the scaffolds generated by the RDKit scaffold network implementation all scaffold with three rings were selected.…”

Section: Scaffold-based Binning (mentioning)

confidence: 99%

Splitting chemical structure data sets for federated privacy-preserving machine learning

Simm, Humbeck, Zalewski, et al. 2021

Preprint

Self Cite


With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal is a realistic split of chemical structures (compounds) between training, validation and test sets, such that performance on the test set is a meaningful indicator of performance in a prospective application. This challenge is interesting and relevant on its own, but it becomes even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions and chemical structures must not be shared between the participating parties. In this work we discuss three methods which provide a splitting of the data set and are applicable in a federated privacy-preserving setting, namely: (a) locality-sensitive hashing (LSH), (b) sphere exclusion clustering, and (c) scaffold-based binning (scaffold network). To evaluate these splitting methods we consider the following quality criteria: bias in prediction performance, label and data imbalance, and distance of the test set compounds to the training set, and we compare them to a random splitting. The main findings of the paper are that (a) both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, and (b) in terms of compute costs, sphere exclusion clustering is very expensive in a federated privacy-preserving setting.
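As a rough illustration of the scaffold-based binning idea, the sketch below assigns compounds to folds so that all compounds sharing a scaffold land in the same fold. It is not the cited authors' implementation. It assumes the scaffold strings have already been computed elsewhere (for example, three-ring scaffolds from a scaffold network, as in the quoted statement), and the hashing step, the `salt` parameter, and the function names `fold_for_scaffold` and `assign_folds` are hypothetical choices made for this example.

```python
import hashlib
from collections import defaultdict


def fold_for_scaffold(scaffold_smiles, n_folds=5, salt="shared-salt"):
    """Map a scaffold string to a fold index deterministically.

    Assumption: every partner uses the same salt and n_folds, so the same
    scaffold gets the same fold everywhere without structures being exchanged.
    """
    digest = hashlib.sha256((salt + scaffold_smiles).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def assign_folds(compounds, n_folds=5, salt="shared-salt"):
    """Group compound IDs by fold.

    compounds: iterable of (compound_id, scaffold_smiles) pairs, where the
    scaffold is assumed to be precomputed by a separate cheminformatics step.
    """
    folds = defaultdict(list)
    for compound_id, scaffold in compounds:
        folds[fold_for_scaffold(scaffold, n_folds, salt)].append(compound_id)
    return dict(folds)


# Illustrative input: two compounds share one scaffold, a third has another.
compounds = [
    ("cpd-1", "c1ccc2c(c1)[nH]c1ccccc12"),
    ("cpd-2", "c1ccc2c(c1)[nH]c1ccccc12"),
    ("cpd-3", "c1ccc2cc3ccccc3cc2c1"),
]
folds = assign_folds(compounds)
```

Because the fold index depends only on the scaffold string and the shared settings, each party could compute it locally. Whether this matches the fold assignment used in the cited work is not stated in the abstract.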


“…However, the multi-class classification approach of scaffold network is not suitable for fold splitting, thus it was necessary to post-process the output. For practical purposes in medicinal chemistry, scaffolds with three rings often provide a useful level of granularity [30]. Therefore, from the scaffolds generated by the RDKit scaffold network implementation all scaffold with three rings were selected.…”

Section: Scaffold-based Binning (mentioning)

confidence: 99%

Splitting chemical structure data sets for federated privacy-preserving machine learning

Simm, Humbeck, Zalewski, et al. 2021

Preprint

Self Cite


