2010
DOI: 10.1038/nrg2825

Tackling the widespread and critical impact of batch effects in high-throughput data

Abstract: High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue tha…

Cited by 1,759 publications (1,616 citation statements)
References: 30 publications
“…Sequences generated from samples sent to the sequencing center on this date were therefore removed from further analysis. Leek et al (2010) recently showed the importance of screening high-throughput data sets for batch effects and screening for batch effects indeed proved useful in removing the technical artifacts from our data set. The characteristics of the 71 samples, selected after sample filtering, are shown in Table 1.…”
Section: Data Filtering
Citation type: mentioning
confidence: 99%
“…Furthermore, in the context of cancer biobanking, if matched tumour and normal specimens are not subjected to the same collection protocols, differences in processing may create additional apparent class differences beyond those caused by disease status (Lim et al 2011). Differences in biospecimen quality may therefore contribute to or further complicate the interpretation of batch effects within high-throughput data (Leek et al 2010).…”
Section: The Biospecimen Lifecycle - What Can and Cannot Be Controlled
Citation type: mentioning
confidence: 99%
“…In turn, researchers are expert in their fields of investigation, and in the analytes that they are assessing, and have access to further expertise through collaborative networks (Lim et al 2011). The routine supply of detailed biospecimen data and subsequent integration of these variables into research analyses could lead to the identification of unexpected associations between pre-analytical variables and research measures, just as collaborations between laboratory scientists and data analysts can facilitate the identification of batch effects (Leek et al 2010). This would extend biospecimen science from a discrete field to a routine activity carried out by all biospecimen researchers.…”
Section: Summary and Concluding Remarks
Citation type: mentioning
confidence: 99%
“…The profiles of the same technical replicates generated by two laboratories may be different due to batch effect. If necessary, these datasets need to be jointly normalized before pursuing any further analysis (Leek et al 2010). Besides microarray data analysis, the use of appropriate background genomic control is important in ChIP-seq analysis.…”
Section: Public Data Repositories and Bioinformatic Tools
Citation type: mentioning
confidence: 99%