2013
DOI: 10.4137/cin.s12862

Monitoring of Technical Variation in Quantitative High-Throughput Datasets

Abstract: High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study’s conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from the Cancer Genome Atlas (TCGA) project, in respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest …

Cited by 58 publications (48 citation statements)
References 27 publications
“…38) in the 1,398-sample cohort, including different clinicopathologic factors (gender, stage, clinical smoking status, EGFR and KRAS mutations), unsupervised gene expression clusters (see below), and molecular adenocarcinoma subtypes (26). Supporting the moderate supervised classification results, and the identification of a smaller set of smoking-related CNAs, we found that smoking status was not a dominant contributor to the total copy-number variation in the cohort (Supplementary Fig.…”
Section: Supervised Classification Of Smoking-related CNAs; citation type: supporting
confidence: 49%
“…PCA analysis (38) performed in the TCGA RNAseq and Chitale and colleagues (32) gene expression microarray cohorts confirmed that clinical smoking status together with other clinicopathologic factors such as stage, gender, EGFR, and KRAS mutation status were not strong contributors to the total variation in gene expression compared with for instance reported adenocarcinoma subtypes (26; Supplementary Fig. S5).…”
Section: Smokers and Never-smokers Aggregating Together In Consensus; citation type: mentioning
confidence: 99%
“…When preparing libraries for NGS sequencing, it is also critical to give consideration to the mitigation of batch effects (4345). It is also important to acknowledge the impact of systematic bias resulting from the molecular manipulations required to generate NGS data; for example, the bias introduced by sequence-dependent differences in adaptor ligation efficiencies in miRNA-seq library preparations.…”
Section: Considerations in NGS Library Preparation: Complexity Bias An; citation type: mentioning
confidence: 99%
“…13; Supplementary Materials and Methods). Principal component analysis (14), including clinicopathologic and technical factors, and comparison of bisulfite conversion plate and beadchip id against unsupervised bootstrap clusters were performed to assess that no technical artifacts influenced methylation data, or bootstrap groups, for the 124-sample discovery cohort ( Supplementary Fig. S1B-S1D).…”
Section: Global Methylation Analysis; citation type: mentioning
confidence: 99%
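
Several of the statements above describe the same kind of check: run a principal component analysis and ask whether technical annotations (batch, plate, beadchip) or clinicopathologic factors account for the leading components. The following is a minimal sketch of that idea in plain Python, not the method used in the cited paper or in any of the citing studies. The data are synthetic, the function names (`first_pc_scores`, `eta_squared`) are hypothetical, the first component is approximated by power iteration, and the association measure (between-group sum of squares over total sum of squares) is an assumption chosen for simplicity.

```python
import random


def center_columns(X):
    """Subtract each feature's mean so PCA describes variation around the centroid."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def first_pc_scores(X, iters=100, seed=0):
    """Approximate sample scores on the first principal component by power iteration."""
    Xc = center_columns(X)
    n, p = len(Xc), len(Xc[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        # Multiply v by Xc^T Xc (proportional to the covariance matrix).
        s = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(p)) for row in Xc]


def eta_squared(scores, labels):
    """Fraction of score variance explained by a categorical annotation."""
    n = len(scores)
    grand = sum(scores) / n
    total = sum((s - grand) ** 2 for s in scores)
    groups = {}
    for s, lab in zip(scores, labels):
        groups.setdefault(lab, []).append(s)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return between / total


# Synthetic example (assumed data): 40 samples x 20 features.
# The second batch carries a constant technical shift; the "biological"
# label is balanced across batches and has no effect on the measurements.
rng = random.Random(1)
batch = ["A" if i < 20 else "B" for i in range(40)]
biology = ["case" if i % 2 == 0 else "control" for i in range(40)]
X = [[rng.gauss(0, 1) + (3.0 if b == "B" else 0.0) for _ in range(20)] for b in batch]

pc1 = first_pc_scores(X)
batch_r2 = eta_squared(pc1, batch)
bio_r2 = eta_squared(pc1, biology)
print(f"PC1 variance explained by batch:   {batch_r2:.2f}")
print(f"PC1 variance explained by biology: {bio_r2:.2f}")
```

In this toy setting the batch annotation should explain most of the first component while the balanced biological label explains almost none, which is the pattern that would flag a dataset for closer inspection. Real analyses would examine several components and use an established PCA implementation.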