2023
DOI: 10.1038/s41467-023-37570-1

Predicting compound activity from phenotypic profiles and chemical structures

Abstract: Predicting assay results for compounds virtually using chemical structures and phenotypic profiles has the potential to reduce the time and resources of screens for drug discovery. Here, we evaluate the relative strength of three high-throughput data sources, namely chemical structures, imaging (Cell Painting), and gene-expression profiles (L1000), to predict compound bioactivity using a historical collection of 16,170 compounds tested in 270 assays for a total of 585,439 readouts. All three data modalities can predict…
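The abstract's numbers describe a sparse compound-by-assay matrix: 16,170 × 270 = 4,365,900 possible pairs, of which 585,439 (about 13.4%) were measured, assuming each readout is a distinct compound-assay pair. The sketch below shows one way to score predictions separately for each assay while using only measured pairs. The metric (AUROC, computed through the Mann-Whitney statistic), the binary hit labels, the toy data, and the helper names `auroc` and `per_assay_auroc` are assumptions for illustration; the truncated abstract does not state how performance was measured.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic; tied scores count as half a win.

    Returns None when only one class is present, since AUROC is undefined then.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_assay_auroc(readouts, predictions):
    """Score predictions separately for each assay (hypothetical helper).

    readouts:    {(compound, assay): 0 or 1}, containing measured pairs only.
    predictions: {(compound, assay): predicted activity score}.
    Pairs that were never measured are absent from `readouts`, so they are
    never treated as inactives.
    """
    by_assay = defaultdict(lambda: ([], []))
    for (compound, assay), label in readouts.items():
        labels, scores = by_assay[assay]
        labels.append(label)
        scores.append(predictions[(compound, assay)])
    return {assay: auroc(labels, scores) for assay, (labels, scores) in by_assay.items()}


# Toy example with made-up compounds and assays; ("c3", "B") was never measured.
readouts = {("c1", "A"): 1, ("c2", "A"): 0, ("c3", "A"): 0,
            ("c1", "B"): 0, ("c2", "B"): 1}
predictions = {("c1", "A"): 0.9, ("c2", "A"): 0.2, ("c3", "A"): 0.95,
               ("c1", "B"): 0.1, ("c2", "B"): 0.7, ("c3", "B"): 0.5}
print(per_assay_auroc(readouts, predictions))  # {'A': 0.5, 'B': 1.0}
```

Scoring each assay on its own keeps assays with very different hit rates from being pooled into a single number; how the study itself summarized per-assay performance is not stated in the truncated abstract.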

Cited by 39 publications (32 citation statements); references 47 publications.

“…For many applications, a query sample's image-based profile can be matched to similar (or opposite) profiles in the reference database, yielding hypotheses about the sample of interest; this guilt-by-similarity strategy is the basis of matching query compounds to annotated compounds to discern a mechanism of action, or to identify potential regulators of a given gene's pathway by matching the gene's profile to candidate compounds 10 . A large structured reference database can also be used to train machine learning models to predict compounds' assay activity 6 to identify promising compounds to test physically 4,5 . Image data can also be used in representation learning, to teach deep learning models biologically useful embeddings 11,12 .…”
Section: Introduction; citation type: mentioning; confidence: 99%
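The guilt-by-similarity strategy described in this passage amounts to ranking annotated reference profiles by their similarity to a query profile, with the most anti-correlated profiles serving as the "opposite" matches. A minimal sketch, assuming profiles are already-normalized numeric feature vectors and using cosine similarity as the similarity measure (an assumption; the passage does not name one). The reference labels, values, and the helper names `cosine` and `match_profile` are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length profile vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero profile has no direction to compare
    return dot / (norm_a * norm_b)


def match_profile(query, reference, k=1):
    """Return the k most similar and the k most opposite reference labels.

    reference: {annotation label: profile vector}, e.g. known mechanisms of
    action (hypothetical structure).
    """
    ranked = sorted(reference, key=lambda label: cosine(query, reference[label]),
                    reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Invented 4-feature profiles; real image-based profiles are much longer.
reference = {
    "tubulin_inhibitor": [2.0, -1.0, 0.5, 0.0],
    "HDAC_inhibitor": [-0.5, 1.5, 1.0, -1.0],
    "hypothetical_activator": [-2.0, 1.0, -0.5, 0.0],
}
query = [1.8, -0.9, 0.6, 0.1]
similar, opposite = match_profile(query, reference)
print(similar, opposite)  # ['tubulin_inhibitor'] ['hypothetical_activator']
```

The same ranking works in the other direction described in the passage: a gene's profile can be used as the query and candidate compounds as the reference set.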
“…The pharmaceutical industry needs disruptive technologies to rapidly reduce the cost and failure rates for getting life-changing medicines to patients. The image-based microscopy profiling assay, Cell Painting 1 , has shown promise in several steps in the drug discovery pipeline 2 , including disease phenotype identification 3 , hit identification 4,5 assay activity prediction 6 , toxicity detection 7,8 and mechanism of action determination 9 , as well as in basic biological research such as functional genomics 10 . In this assay, eight cellular components are stained with six inexpensive dyes and imaged in five channels on a fluorescence microscope.…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…In some studies, late fusion models seemed to slightly outperform early stage fusion models. 41 However, in our case, late stage fusion models did not show any benefits and were not further investigated. Alternative ways of combining chemical and morphological profiles have also been explored in other studies 42,43 and could be tested in follow-up analysis.…”
Section: Results and Discussion; citation type: mentioning; confidence: 99%
“…The biological feature sets by themselves did not significantly outperform the chemical descriptors, but their combination, either merging the input features of the three different sets (early stage fusion) or averaging the predictions obtained by three individual models (late stage fusion), seemed beneficial, especially for increasing the predictivity on the external test set. In some studies, late fusion models seemed to slightly outperform early stage fusion models. However, in our case, late stage fusion models did not show any benefits and were not further investigated.…”
Section: Results; citation type: mentioning; confidence: 99%
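The passage contrasts early stage fusion (merge the three input feature sets and train one model) with late stage fusion (train one model per feature set and average their predictions). The sketch below shows both on invented toy data. The nearest-centroid scorer `fit_centroid` is a deliberately simple stand-in, chosen so the example runs on the standard library alone; it is not the model used in the cited study, and any classifier that outputs a score in [0, 1] could take its place. All function names and data here are hypothetical.

```python
import math
from itertools import chain


def fit_centroid(X, y):
    """Toy stand-in classifier: sigmoid of (distance to inactive centroid minus
    distance to active centroid), so profiles closer to the actives score higher."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    c_pos = [sum(col) / len(pos) for col in zip(*pos)]
    c_neg = [sum(col) / len(neg) for col in zip(*neg)]

    def predict(x):
        margin = math.dist(x, c_neg) - math.dist(x, c_pos)
        return 1.0 / (1.0 + math.exp(-margin))

    return predict


def early_fusion(train_sets, y, test_sets):
    """Concatenate each compound's feature sets, then fit a single model."""
    X_train = [list(chain(*parts)) for parts in zip(*train_sets)]
    X_test = [list(chain(*parts)) for parts in zip(*test_sets)]
    model = fit_centroid(X_train, y)
    return [model(x) for x in X_test]


def late_fusion(train_sets, y, test_sets):
    """Fit one model per feature set, then average the predicted scores."""
    models = [fit_centroid(X, y) for X in train_sets]
    return [sum(m(x) for m, x in zip(models, parts)) / len(models)
            for parts in zip(*test_sets)]


# Invented data: 4 training compounds (first 2 active), 2 test compounds,
# one block each for chemical, imaging and gene-expression features.
y = [1, 1, 0, 0]
train = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],  # chemical
    [[2.0], [1.8], [0.0], [0.2]],  # imaging
    [[0.5, 0.5, 1.0], [0.4, 0.6, 0.9], [0.5, 0.5, 0.0], [0.6, 0.4, 0.1]],  # expression
]
test = [
    [[0.95, 0.05], [0.05, 0.95]],
    [[1.9], [0.1]],
    [[0.45, 0.55, 0.95], [0.55, 0.45, 0.05]],
]
print("early:", [round(p, 3) for p in early_fusion(train, y, test)])
print("late: ", [round(p, 3) for p in late_fusion(train, y, test)])
```

On this toy data both schemes score the first (active-like) test compound above 0.5 and the second below it. The example illustrates the mechanics only; whether late fusion helps is an empirical question, and the quoted authors note that the answer differed between their study and others.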
“…To leverage the advantages of both these approaches and overcome their respective limitations, we propose a computational pipeline in the spirit of active learning. Our approach builds upon previous work showing the positive impact of employing multimodal compound descriptors for bioactivity prediction. In a first step, initial models are built based on structural information and images, respectively. Then predictions from the image-informed and chemistry-informed ligand-based models guide the selection of molecules to be tested in an actual physical assay.…”
Section: Introduction; citation type: mentioning; confidence: 99%
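The pipeline in this passage uses predictions from a chemistry-informed model and an image-informed model to choose which molecules to test physically next. The passage does not say how the two sets of predictions are combined, so the selection rule below is hypothetical: most slots go to compounds with the highest mean predicted activity, and an optional number of exploration slots go to compounds on which the two models disagree most. `select_batch`, `n_explore`, and the scores are invented for illustration.

```python
def select_batch(chem_scores, image_scores, k, n_explore=0):
    """Choose k untested compounds for the next round of physical assays.

    chem_scores, image_scores: {compound id: predicted activity in [0, 1]}
    from the chemistry-informed and image-informed models, respectively.
    Hypothetical rule: n_explore slots go to the largest model disagreements,
    the remaining k - n_explore slots to the highest mean predicted activity.
    """
    if not 0 <= n_explore <= k:
        raise ValueError("n_explore must be between 0 and k")
    ids = sorted(chem_scores.keys() & image_scores.keys())

    def disagreement(c):
        return abs(chem_scores[c] - image_scores[c])

    def mean_score(c):
        return (chem_scores[c] + image_scores[c]) / 2

    explore = sorted(ids, key=disagreement, reverse=True)[:n_explore]
    remaining = [c for c in ids if c not in explore]
    exploit = sorted(remaining, key=mean_score, reverse=True)[:k - n_explore]
    return exploit + explore


chem = {"c1": 0.9, "c2": 0.2, "c3": 0.8, "c4": 0.1, "c5": 0.9}
image = {"c1": 0.8, "c2": 0.3, "c3": 0.7, "c4": 0.9, "c5": 0.4}
print(select_batch(chem, image, k=3, n_explore=1))  # ['c1', 'c3', 'c4']
```

After the selected compounds are assayed, their measured results would be added to the training data and both models refit before the next round; that feedback loop is what makes the procedure resemble active learning.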