2019
DOI: 10.1101/788919
Preprint

Uniform Genomic Data Analysis in the NCI Genomic Data Commons

Abstract: The goal of the National Cancer Institute (NCI) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By Ju…

Cited by 18 publications (23 citation statements)
References 39 publications
“…The regulatory networks were reverse engineered by ARACNe from RNASeq profiles of human cancer tissue from The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET), and a few other high quality publicly available mRNA datasets (Table S4). The RNASeq level 3 data were downloaded from NCI Genomics Data Commons [48]; raw counts were normalized and the variance was stabilized by fitting the dispersion to a negative-binomial distribution as implemented in the DESeq2 R-package [49]. ARACNe was run with 100 bootstrap iterations using an input set of candidate regulators including: 1,813 transcription factors (genes annotated in Gene Ontology Molecular Function database (GO)55 as GO:0003700-'DNA-binding transcription factor activity', or as GO:0003677-'DNA binding' and GO:0030528-'Transcription regulator activity', or as GO:0003677 and GO:0045449-'Regulation of transcription, DNA-templated'); 969 transcriptional co-factors (a manually curated list, not overlapping with the transcription factor list, built upon genes annotated as GO:0003712-'transcription coregulator activity' or GO:0030528 or GO:0045449); and 3,370 signaling pathway related genes (annotated in GO Biological Process database as GO:0007165-'signal transduction' and in GO Cellular Component database as GO:0005622-'intracellular' or GO:0005886-'plasma membrane').…”
Section: Generation Of Gene Regulatory Network: Aracne (Algorithm For the Reconstruction Of
Type: mentioning
confidence: 99%
“…Fragments Per Kilobase of transcript per Million mapped reads (FPKM) quantification of gene expression and whole-exome sequencing (WES) Binary Alignment Map (BAM) files were downloaded from GDC [33]. Demographic and clinical information were retrieved from the TCGA Pan-Cancer Clinical Data Resource [34]. BMI and smoking history were retrieved from legacy clinical files.…”
Section: Methods
Type: mentioning
confidence: 99%
“…The standardized, upper-quartile normalized, batch-corrected, and platform-corrected RNAseq expression of 20,531 genes in RSEM (RNA-Seq by Expectation Maximization)-quantified read count estimates were downloaded from PanCancer Atlas consortium studies (https://gdc.cancer.gov/about-data/publications/pancanatlas) and log2-transformed for further analysis. FPKM (Fragments Per Kilobase of transcript per Million mapped reads) quantification of gene expression as well as whole-exome sequencing (WES) BAM files were downloaded from GDC [28]. Demographic and clinical information were retrieved from the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR) [29].…”
Section: Methods
Type: mentioning
confidence: 99%