2023
DOI: 10.26434/chemrxiv-2023-33j02
Preprint

One chiral fingerprint to find them all

Markus Orsi,
Jean-Louis Reymond

Abstract: Background: Molecular fingerprints are indispensable tools in cheminformatics. However, stereochemistry is generally not considered, which is problematic for large molecules, almost all of which are chiral. Results: Herein we report MAP4C, a chiral version of our previously reported fingerprint MAP4, which lists MinHashes computed from character strings containing the SMILES of all pairs of circular substructures up to a diameter of four bonds and the shortest topological distance between their central atoms. MAP4…
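The MinHash step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the shingle strings below are hypothetical placeholders written as `substructure|distance|substructure`, the helper names `minhash` and `estimated_similarity` are made up for this example, and the salted BLAKE2b hash is simply a convenient standard-library choice. A real pipeline would derive the shingles from circular substructure SMILES with a cheminformatics toolkit.

```python
import hashlib


def minhash(shingles, num_perm=64):
    """Return a MinHash signature: for each salted hash function,
    keep the smallest hash value observed over all shingle strings."""
    shingles = set(shingles)
    if not shingles:
        raise ValueError("need at least one shingle")
    signature = []
    for seed in range(num_perm):
        # A different salt per seed gives a family of hash functions.
        salt = seed.to_bytes(4, "little")
        lowest = min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingles
        )
        signature.append(lowest)
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of the
    Jaccard similarity between the two underlying shingle sets."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Hypothetical shingles; the stereo markers (@, @@) stand in for the
# chirality information that a chiral fingerprint would carry.
mol_a = {"C[C@H](N)C|2|C(=O)O", "C[C@H](N)C|1|CN", "CN|3|C(=O)O"}
mol_b = {"C[C@@H](N)C|2|C(=O)O", "C[C@@H](N)C|1|CN", "CN|3|C(=O)O"}

print(estimated_similarity(minhash(mol_a), minhash(mol_a)))  # 1.0 for identical sets
print(estimated_similarity(minhash(mol_a), minhash(mol_b)))  # lower: only one shingle is shared
```

Because the stereo markers are part of the hashed strings, the two hypothetical enantiomer-like shingle sets produce different signatures, which is the property a chiral fingerprint needs.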

Citations: cited by 4 publications (7 citation statements)
References: 55 publications
“…Starting from the DBAASP dataset of 9,548 peptide sequences annotated with antibacterial activity and 2,262 peptide sequences annotated with hemolysis effect, we had previously evaluated NB, RF, SVM and RNN models, and found the latter to perform best for predicting both activity and hemolysis from sequence data [13,14]. For additional reference, we trained an SVM on the fraction of helical residues and the hydrophobic moment, two properties commonly known to correlate with antimicrobial activity, as well as another SVM on MAP4C, a molecular fingerprint that can reliably encode large molecules such as natural products and peptides including their chirality [34], a parameter which we considered important since our data listed sequences containing both L- and D-amino acids.…”
Section: Model Screening (mentioning)
confidence: 99%
“…We modified our previously reported PDGA [44] by computing fitness functions either as the Jaccard distance (dJ) to the target molecule computed using the molecular fingerprint MAP4C [49], saving all generated molecules at each generation as trajectory molecules, or as the City Block Distance (dCBD)…”
Section: Genetic Algorithm (mentioning)
confidence: 99%
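The two distances named in this statement have standard definitions, sketched below in plain Python. This is only an illustration of the formulas, not the PDGA code: the function names are made up for this example, the inputs are toy values, and the excerpt does not say which representation the City Block Distance is computed on, so a generic numeric vector is assumed.

```python
def jaccard_distance(a, b):
    """dJ = 1 - |A ∩ B| / |A ∪ B| for two collections of features."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def city_block_distance(u, v):
    """dCBD = sum of absolute coordinate differences between two vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(u, v))


# Toy inputs (assumed for illustration only).
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))      # 0.5
print(city_block_distance([1, 0, 2], [0, 0, 5]))   # 4
```

In a fitness function of this kind, a lower distance to the target means a better-scoring candidate, and a distance of zero means the representations are identical.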
“…In each case, we performed three PDGA runs of maximum 10,000 generations starting from 50 random sequences using the chiral fingerprint MAP4C, which encodes pairs of circular substructures with high precision including chirality [48,49]. PDGA identified the target molecule in less than 10,000 generations in at least one of the three runs for each of these six peptides, including the two 30-mer peptides 5 and 6, which required exploration of the full 1E+60 chemical space (Table 2). Since each generation only amounted to 35 new molecules, which were evaluated against the 15 best scoring molecules of the previous generation used as parents, the cumulative number of molecules generated in each trajectory only amounted to a few thousand, which is remarkably low considering the size of the explored chemical space.…”
Section: Ligand-based Virtual Screening By Genetic Algorithm Guided N... (mentioning)
confidence: 99%
“…Starting from the DBAASP dataset of 9548 peptide sequences annotated with antibacterial activity and 2262 peptide sequences annotated with hemolysis effect, we had previously evaluated NB, RF, SVM and RNN models, and found the latter to perform best for predicting both activity and hemolysis from sequence data [13,14]. For additional reference, we trained an SVM on the fraction of helical residues and the hydrophobic moment, two properties commonly known to correlate with antimicrobial activity, as well as another SVM on MAP4C, a molecular fingerprint that can reliably encode large molecules such as natural products and peptides including their chirality [34], a parameter which we considered important since our data listed sequences containing both L- and D-amino acids.…”
Section: Model Screening (mentioning)
confidence: 99%