2019
DOI: 10.1016/j.cels.2019.06.006

Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data

Abstract: Highlights
• A systematic analysis on how the reference genome affects various TCGA data types
• The GRCh37 (hg19) and GRCh38 (hg38) TCGA data versions are highly concordant
• Generate the gene lists showing significant differences between the two versions
• Provide detailed information about TCGA software, pipelines, and annotations

Cited by 130 publications (114 citation statements)
References 49 publications
“…The TCGA (The Cancer Genome Atlas Program) represents a joint venture of National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI) which began in 2006 as a pilot project with three cancer types (lung, ovarian and glioblastoma) which got expanded to present 33 tumor types encompassing a comprehensive dataset describing the molecular changes that occur in cancer [66]. Most samples in TCGA were originally aligned against the Genome Reference Consortium build GRCh37 (hg19) or the "legacy" dataset [67]. However, with advances in technology and drop in sequencing costs GDC (Genomic Data Commons, conceived by NCI) undertook harmonization effort to align the data to GRCh38 (hg38) build ("harmonized" dataset).…”
Section: Discussion (mentioning)
confidence: 99%
“…The workflow for generating RNA-Seq data in both legacy and harmonized dataset differs substantially [67] leading to introduction of bias between the hg19 and hg38 abundance estimates. However, Gao et al, [67] demonstrated that there exists excellent concordance between the two workflows in relation to BRCA PAM50 subtypes. Further, they reported that relative change between conditions is preserved across all subtypes of BRCA PAM50.…”
Section: Discussion (mentioning)
confidence: 99%
“…Nevertheless, there is still a considerable number of recently published studies, which make use of older sequencing data from a wide variety of sources [24][25][26] and we believe this will continue to be the case. Common incentives for reanalyzing genomic cohorts include re-mapping reads to a new reference genome version [27], periodic reanalysis of disease cohorts to diagnose more patients [28] or large meta-GWAS [29], aiming to achieve statistically significant results by increasing sample sizes.…”
Section: Discussion (mentioning)
confidence: 99%
“…The essential nature of DMG and DEG-DMG hub loci was investigated at the Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/), which contains numerous cancer datasets [42]. Screening the TCGA database revealed that all hubs identified in our analysis could undergo mutations classified as high-impact, affecting patient survival (Fig.…”
Section: Methylation Signal On DEG-DMGs Across Individuals Is Network (mentioning)
confidence: 99%