2020
DOI: 10.1101/2020.12.15.422887
Preprint

Predicting compound activity from phenotypic profiles and chemical structures

Abstract: Recent advances in deep learning enable using chemical structures and phenotypic profiles to accurately predict assay results for compounds virtually, reducing the time and cost of screens in the drug discovery process. The relative strength of high-throughput data sources - chemical structures, images (Cell Painting), and gene expression profiles (L1000) - has been unknown. Here we compare their ability to predict the activity of compounds structurally different from those used in training, using a sparse dat…

Cited by 25 publications (40 citation statements)
References 44 publications
“…To tailor a perturbation atlas to perturbation modeling, it should take into account perturbations and conditions that capture possible cellular states relevant to our objectives: for example, varying response across cell types due to covariates, different targets and mechanisms of perturbations, perturbation interactions (Yofe et al, 2020), and biologically active chemical or sequence features (Becker et al, 2020). Of particular relevance for machine learning and DL, an atlas should capture enough experimental variation so that an ML model can generalize to many unseen situations, in addition to providing a collection of controlled experiments with many shared covariates.…”
Section: Toward a Perturbation Atlas (mentioning)
confidence: 99%
“…Scientists have used individual profiling modalities to advance a variety of drug discovery applications, including improving screening library diversity, predicting cytotoxicity, prioritizing compounds for follow-up study, and inferring the mechanism of action of chemicals [21][22][23][24][25][26][27][28]. Integrating gene expression and morphology profiles with chemical structures revealed that each data type provides complementary information for predicting a drug's mechanism of action [29,30], for predicting the effects of perturbations [31], and for identifying nuisance compounds that can lead to false hits [32]. As well, to some degree, gene expression and morphology datasets contain sufficient information to predict changes in each other [29,30].…”
Section: Introduction (mentioning)
confidence: 99%
“…Data modality fusion and integration techniques are an active area of research in machine learning [4] and could potentially yield a superior representation of samples for many different biological profiling tasks on datasets where multiple profiling modalities are available. For example, predicting assay activity might be more successful using information about the impact of that compound on cells' mRNA levels and morphology, rather than either data source alone [1]. Likewise, predicting the function of a gene based on similarities to other genes' profiles might be more successful using both data types.…”
Section: Modality-specific Complementary Subspaces (mentioning)
confidence: 99%
“…Such a dataset would enable multi-modal (also known as multi-omic) analyses and applications. Examples include integrating the two data sources to better predict a compound's activity in an assay [1], predicting the mechanism of action of a drug based on its profile similarity to well-understood drugs [2], or predicting a gene's function based on its profile similarity to well-understood genes [3].…”
Section: Introduction (mentioning)
confidence: 99%