2017
DOI: 10.1101/227041
Preprint

The GCTx format and cmap{Py, R, M} packages: resources for the optimized storage and integrated traversal of dense matrices of data and annotations

Abstract: Motivation: Computational analysis of datasets generated by treating cells with pharmacological and genetic perturbagens has proven useful for the discovery of functional relationships. Facilitated by technological improvements, perturbational datasets have grown in recent years to include millions of experiments. While initial studies, such as our work on Connectivity Map, used gene expression readouts, recent studies from the NIH LINCS consortium have expanded to a more diverse set of molecular readouts, inc…

Cited by 13 publications (8 citation statements)
References 21 publications

“…L1000 expression data (processed at "Level 5") for compounds, shRNA, and CRISPR were downloaded from clue.io in the form of GCTx files [7]. The L1000 assay directly measures the expression of 978 landmark genes, and allows for computational estimation of about 12,000 more genes [23].…”
Section: Methods | Citation type: mentioning
confidence: 99%
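
As a rough illustration of the workflow this statement describes, the sketch below reads only selected rows from a GCTx file with cmapPy's `parse` function. The helper name `load_landmark_matrix`, the file path, and the list of landmark row ids are hypothetical; how the cited authors actually loaded or subset their files is not stated in the excerpt.

```python
def load_landmark_matrix(gctx_path, landmark_ids):
    """Read only the requested rows of a GCTx file into a pandas DataFrame.

    gctx_path and landmark_ids are supplied by the caller; both are
    placeholders here, not values taken from the cited study.
    """
    # Imported lazily so this module can be loaded without cmapPy installed.
    from cmapPy.pandasGEXpress.parse import parse

    # rid restricts parsing to the given row ids, so the full matrix is not
    # loaded into memory when only a subset of genes is needed.
    gctoo = parse(gctx_path, rid=list(landmark_ids))

    # data_df holds the numeric matrix: rows are genes, columns are signatures.
    return gctoo.data_df
```

Row and column annotations, if needed, are available on the same parsed object as `row_metadata_df` and `col_metadata_df`.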
“…Replicate-collapsed differential expression signatures (Level5 dataset) of the measured (landmark) genes were used in our analysis pipeline. For accessing L1000 signatures, we used cmapPy Python library (Enache et al, 2018). Phase I and Phase II data were merged, and signatures corresponding to the same conditions (treatment, cell line, time and concentration in case of compounds, or treatment, cell line and time in case of shRNA) were averaged using the MODZ method.…”
Section: Databases and Data Preprocessing | Citation type: mentioning
confidence: 99%
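
The MODZ averaging mentioned in this statement can be sketched in plain Python as follows. This assumes the commonly described form of the moderated z-score: each replicate is weighted by its mean Spearman correlation with the other replicates, weights are clipped at a small minimum, and then normalized to sum to 1. The function names and the `min_weight` default of 0.01 are assumptions for illustration, not the cited authors' code.

```python
from itertools import combinations
from math import sqrt


def _ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return _pearson(_ranks(x), _ranks(y))


def modz(replicates, min_weight=0.01):
    """Collapse replicate profiles (equal-length lists) into one profile."""
    n = len(replicates)
    if n == 1:
        return list(replicates[0])
    # Mean Spearman correlation of each replicate with all other replicates.
    raw = [0.0] * n
    for i, j in combinations(range(n), 2):
        rho = spearman(replicates[i], replicates[j])
        raw[i] += rho / (n - 1)
        raw[j] += rho / (n - 1)
    # Clip low or negative weights, then normalize so the weights sum to 1.
    clipped = [max(w, min_weight) for w in raw]
    total = sum(clipped)
    weights = [w / total for w in clipped]
    n_genes = len(replicates[0])
    return [sum(weights[r] * replicates[r][g] for r in range(n))
            for g in range(n_genes)]
```

With this weighting, a replicate that disagrees with the others contributes less to the collapsed signature than it would under a simple mean.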
“…Using the cmappy [63] and statsmodels [64] (version 0.9.0) Python packages, we computed the Pearson correlation (ρ) among the transcripts per million (TPM) values of RNA sequence expression of TLR4, MD2 and the HSP70 family genes occurring in the heart (left ventricle) and in the aorta of human donors. The correlation matrix visualization was generated using Python Matplotlib (version 3.0.0) [65].…”
Section: Materials and Methods | Citation type: mentioning
confidence: 99%
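
To make the correlation step in this statement concrete, here is a small plain-Python sketch of a pairwise Pearson correlation matrix over per-gene expression vectors. It does not reproduce the cited authors' cmapPy or statsmodels calls, which the excerpt does not show. The TPM values are invented for illustration, and `HSPA1A` is used only as one example of an HSP70 family gene.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(expr):
    """Map each gene pair to its Pearson correlation across samples."""
    genes = list(expr)
    return {a: {b: pearson(expr[a], expr[b]) for b in genes} for a in genes}


# Hypothetical TPM values for four donors (not real data).
tpm = {
    "TLR4": [1.0, 2.0, 3.0, 4.0],
    "MD2": [2.0, 4.0, 6.0, 8.0],
    "HSPA1A": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(tpm)
```

The nested dictionary can be passed to a plotting library as a heatmap once converted to a 2-D array in a fixed gene order.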