2020
DOI: 10.26434/chemrxiv.12185559
Preprint

Synthetically Accessible Virtual Inventory (SAVI)

Abstract: We have made available a database of over 1 billion compounds predicted to be easily synthesizable. They have been created by a set of transforms based on an adaptation and extension of the CHMTRN/PATRAN programming languages describing chemical synthesis expert knowledge, which originally stem from the LHASA project. The chemoinformatics toolkit CACTVS was used to apply a total of 53 transforms to about 150,000 readily available building blocks (enamine.net). Only single-step, two-reactant syntheses were calc…

citations
Cited by 4 publications
(4 citation statements)
references
References 63 publications
“…Efforts to predict crossbinding possibilities of S-SLSF with ligands rather than with mutations have been explored here by combining computational strategies previously developed by others 20. These included searches among similars, core replacement, fragment extensions [21][22][23][24], convolutional neural networks (CNN) 25, de novo manual generation of compounds with drug-like properties [26][27][28][29], filtering for synthetic feasibility, ligand presence in catalogs 22,30,31, purchasable building-blocks and available chemical synthesis paths.…”
Section: Introduction
confidence: 99%
“…Each SAVI product has been annotated with over 60 properties, including data about the BBs and proposed reaction (catalog numbers, reactants, general conditions, protection, predicted yield etc.), identifiers/representations of both the BBs and the product, as well as "drug design" properties such as "Rule of Five" (RO5) 62 and "Rule of Three" 62,63 violations, PAINS (pan assay interference compounds) 64 filter matches, FSP3 (fraction of sp3 hybridized carbons), and log P. The complete list is available on the SAVI Download web page 54 as well as in sections 1 and 2 of Supplementary Information 1. Section 3 of Supplementary Information 1 shows the fields written in SD file format of a SAVI product file.…”
Section: Background and Summary
confidence: 99%
“…A total of 3.59 billion reactant pairs were created (Online-only Table 1) and then subjected to the reaction logic of the 53 productive transforms. This yielded 1,748,464,003 reactions saved (Table 3) 54. Thus, the loss rate caused by encountering KILL statements was about 51%.…”
Section: Data Records
confidence: 99%
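
The "about 51%" loss rate in the statement above follows directly from the two counts it quotes. A minimal sketch of that arithmetic, assuming the rounded "3.59 billion" figure is used as the denominator (the exact pair count is not given here; the function and variable names are illustrative only):

```python
# Figures quoted in the citation statement above.
# The reactant-pair count is the rounded value from the text, so the
# result is approximate.
reactant_pairs = 3.59e9            # "3.59 billion reactant pairs"
reactions_saved = 1_748_464_003    # "1,748,464,003 reactions saved"


def loss_rate(pairs: float, saved: float) -> float:
    """Fraction of reactant pairs that did not yield a saved reaction."""
    return 1.0 - saved / pairs


rate = loss_rate(reactant_pairs, reactions_saved)
print(f"Loss rate: {rate:.1%}")
```

Rounded to a whole percent, this gives the 51% reported in the quoted text.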
“…Diverse REAL drug-like subset of ENA | 15 547 091
EDB | DrugBank plus Enamine Hit Locator Library 2018 [22] | 310 782
EMO | eMolecules [23] | 25 946 988
ENA | Enamine REAL Database [24,25] | 211 723 723
FFI | CureFFI FDA-approved drugs and CNS drugs [11] | 1497
G13 | GDB-13 small organic molecules up to 13 atoms [26,27] | 977 468 301
G17 | GDB-17-Set up to 17 atom extension of GDB-13 [28,29] | 50 000 000
HOP † | Harvard Organic Photovoltaic Dataset [16,17] | 350
LIT | COVID-relevant small mols extracted from literature [13] | 803
MOS | Molecular Sets (MOSES) [30,31] | 1 936 962
MCU | MCULE compound database | 45 472 755
PCH | PubChem [32,33] | 97 545 266
QM9 | QM9 subset of GDB-17 [14,15] | 133 885
REP | Repurposing-related drug/tool compounds [34,35] | 10 141
SAV | Synthetically Accessible Virtual Inventory (SAVI) [36,37] | 265 047 097
SUR | SureChEMBL dataset of molecules from patents [38,39] | 17 915 384
ZIN | ZINC15 [40,41] | 225 804 829
Total | 4 934 042…”
Section: Key Name
confidence: 99%