2020
DOI: 10.1101/2020.11.16.385427
Preprint

Simplified and unified access to cancer proteogenomic data

Abstract: Comprehensive cancer datasets recently generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) offer great potential for advancing our understanding of how to combat cancer. These datasets include DNA, RNA, protein, and clinical characterization for tumor and normal samples from large cohorts in many different cancer types. The raw data are publicly available at various Cancer Research Data Commons. However, widespread re-use of these datasets is also facilitated by easy access to the processed q…

Cited by 7 publications (8 citation statements)
References 23 publications (33 reference statements)
“…CPTAC RNA-seq and mass spectrometry datasets for breast (Krug et al, 2020), ovarian (Hu et al, 2020b;Zhang et al, 2016), colorectal (Vasaikar et al, 2019;Zhang et al, 2014), lung adenocarcinoma (Gillette et al, 2020), and endometrial (Dou et al, 2020) cancer discovery studies were retrieved in accordance with the CPTAC data use and embargo policies using the cptac v.0.9.1 package in Python 3.9. Statistical learning was performed using scikit-learn 0.24.2 (Lindgren et al, 2021). Transcriptomics data were standardized, after which data were split 80/20 into train and test sets.…”
Section: Retrieval and Analysis Of Public Expression Data Sets
Citation type: mentioning
confidence: 99%
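
The preprocessing order described in the statement above (standardize the transcriptomics data, then split 80/20 into train and test sets) can be illustrated with a minimal, dependency-free sketch. This is not the citing authors' code: they report using scikit-learn 0.24.2, whereas the helpers `standardize_columns` and `split_train_test`, the synthetic matrix, and the seeds below are all hypothetical stand-ins chosen for illustration.

```python
import random
import statistics


def standardize_columns(rows):
    """Scale every column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, which have zero standard deviation.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them as a test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Synthetic stand-in for a samples x genes expression matrix
# (10 samples, 3 genes); real data would come from the cptac package.
rng = random.Random(42)
expression = [[rng.gauss(5.0, 2.0) for _ in range(3)] for _ in range(10)]

# Same order as the quoted statement: standardize first, then split.
scaled = standardize_columns(expression)
train, test = split_train_test(scaled, test_fraction=0.2, seed=0)
```

Standardizing before splitting follows the quoted wording; a common alternative is to compute the scaling statistics on the training rows only, so that no information from the test set leaks into preprocessing.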
“…We performed the analysis using expression data from TCGA/CPTAC RNA sequencing experiments through the cptac Python API, which retrieves the final data tables from the flagship CPTAC papers of each individual cancer type [25]. Although each TCGA/CPTAC cancer subtype project follows an overall consistent experimental design and data acquisition strategy, minute differences exist in the processing pipelines used to analyze the RNA sequencing data (e.g., STAR vs. Bowtie2) and gene expression measure (e.g., RPKM vs. FPKM) which could bias gene expression values across cancer types.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The cumulative inclusions of each cancer type in the order above are sequentially referred to as CPTAC_2 to CPTAC_8 in the manuscript, such that CPTAC_2 refers to the union of ovarian and breast cancer (OV + BR); CPTAC_3 refers to the union of ovarian, breast, and endometrial cancer (OV + BR + EN); and so on. The mRNA and protein level expression data from the CPTAC cancer types was retrieved using the cptac package v.0.9.7 [25] in Python 3.9. Each column of the quantitative measurement of the transcriptomics data acted as an independent variable or feature variable whereas the normalized quantitative measurement of a particular protein of interest acted as the single dependent or target variable in the protein model.…”
Section: Methods
Citation type: mentioning
confidence: 99%
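
The feature and target layout described in the statement above (every transcriptomics column is a feature, and one protein's normalized abundance is the single target) might be assembled roughly as follows. This is a sketch under stated assumptions, not the citing authors' pipeline: the function `build_protein_model_inputs`, the nested-dict table layout, and the toy sample and gene names are hypothetical, and every sample is assumed to report the same set of genes.

```python
def build_protein_model_inputs(transcriptomics, proteomics, protein):
    """Return (sample_ids, X, y) for samples measured in both tables.

    transcriptomics: {sample_id: {gene: mRNA abundance}}
    proteomics:      {sample_id: {protein: normalized abundance}}
    Assumes every sample reports the same genes (hypothetical layout).
    """
    # Fix a column order once so every feature row lines up.
    genes = sorted(next(iter(transcriptomics.values())))
    sample_ids, X, y = [], [], []
    for sample_id in sorted(transcriptomics):
        target = proteomics.get(sample_id, {}).get(protein)
        if target is None:
            # Skip samples with no measurement for the protein of interest.
            continue
        sample_ids.append(sample_id)
        X.append([transcriptomics[sample_id][gene] for gene in genes])
        y.append(target)
    return sample_ids, X, y


# Toy tables with made-up values; S3 has no proteomics measurement.
toy_transcriptomics = {
    "S1": {"TP53": 1.2, "EGFR": 0.4},
    "S2": {"TP53": -0.5, "EGFR": 2.1},
    "S3": {"TP53": 0.3, "EGFR": 0.0},
}
toy_proteomics = {
    "S1": {"EGFR": 0.9},
    "S2": {"EGFR": -0.3},
}

sample_ids, X, y = build_protein_model_inputs(
    toy_transcriptomics, toy_proteomics, protein="EGFR"
)
```

Each row of `X` holds one sample's transcript measurements in a fixed gene order, and `y` holds the matching protein values, which is the shape most regression libraries expect.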
“…For studies (Clark et al, 2019; Dou et al, 2020a; Gillette et al, 2020; Huang et al, 2021; Krug et al, 2020; Wang et al, 2021) both the transcriptomic and proteomic profiles were obtained from the CPTAC API (Lindgren et al, 2021). For colorectal (Zhang et al, 2014) and breast cancer (Mertins et al, 2016) studies, the transcriptomic data were downloaded from cBioPortal while proteomic data was obtained from the supplemental materials.…”
Section: Methods
Citation type: mentioning
confidence: 99%