2008
DOI: 10.1186/1755-8794-1-42

The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

Abstract: Background: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses.

citations
Cited by 134 publications
(170 citation statements)
references
References 53 publications
“…Then, only probes measured on both GPL96 and GPL570 were retained (n=22,277). At this stage, we performed a second scaling normalization to set the average expression on each chip to 1000 to avoid batch effects [14].…”
Section: Methods (mentioning)
confidence: 99%
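
The two steps described in this statement (keeping only probes present on both platforms, then rescaling each chip to a mean of 1000) can be illustrated with a minimal sketch. The function names and the toy probe values are hypothetical and are not taken from the citing paper; only the target mean of 1000 comes from the quoted text.

```python
from statistics import fmean

TARGET_MEAN = 1000.0  # target average stated in the quoted Methods text


def common_probes(platform_a, platform_b):
    """Return the probe IDs measured on both platforms (hypothetical helper)."""
    return sorted(set(platform_a) & set(platform_b))


def scale_chip(values, target=TARGET_MEAN):
    """Multiply every value on one chip by a single factor so the chip mean equals `target`."""
    chip_mean = fmean(values)
    if chip_mean == 0:
        raise ValueError("cannot scale a chip whose mean expression is zero")
    factor = target / chip_mean
    return [v * factor for v in values]


# Toy data, invented for illustration only.
chip_a = {"probe_1": 200.0, "probe_2": 400.0, "probe_3": 600.0, "probe_9": 50.0}
chip_b = {"probe_1": 90.0, "probe_2": 110.0, "probe_3": 100.0}

shared = common_probes(chip_a, chip_b)
scaled_a = scale_chip([chip_a[p] for p in shared])
scaled_b = scale_chip([chip_b[p] for p in shared])
```

Because each chip is multiplied by one constant, the ratios between probes on the same chip are unchanged; only the overall intensity level is aligned across chips.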
“…At this stage, we performed a second scaling normalization to set the average expression on each chip to 1000. Although this technique cannot remove all, but it can significantly reduce batch effects (Sims et al 2008). We integrated the gene expression and clinical data using PostgreSQL, an open-source object-relational database system (www.postgresql.…”
Section: Setup Of Server For Online Survival Calculation (mentioning)
confidence: 99%
“…org/packages/release/bioc/html/annotate.html), and the probe was discarded if it did not match any genes. The two expression datasets were merged and synthetically analyzed using Batch Mean-centering, a merged data method (19), following adaptation according to Support Vector Machines, through the inSilicoMerging package (20).…”
Section: Data Collection and Preprocessing (mentioning)
confidence: 99%
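
As a rough illustration of the batch mean-centering idea named in this statement, the sketch below subtracts each gene's within-batch mean before the batches are concatenated. It is plain Python, not the inSilicoMerging package, and the function names, gene labels, and values are hypothetical; it assumes the values are already on a log scale, where subtracting a mean corresponds to dividing out a multiplicative factor on the raw scale.

```python
from statistics import fmean


def batch_mean_center(batch):
    """Subtract each gene's mean across the samples of one batch.

    `batch` maps a gene label to its list of (assumed log-scale) values.
    """
    centered = {}
    for gene, values in batch.items():
        gene_mean = fmean(values)
        centered[gene] = [v - gene_mean for v in values]
    return centered


def merge_batches(batches):
    """Center every batch separately, then concatenate samples for shared genes."""
    centered = [batch_mean_center(b) for b in batches]
    shared = set(centered[0]).intersection(*centered[1:])
    return {g: [v for b in centered for v in b[g]] for g in sorted(shared)}


# Toy data, invented for illustration only.
batch_a = {"GENE_X": [8.0, 10.0], "GENE_Y": [5.0, 7.0]}
batch_b = {"GENE_X": [11.0, 13.0, 15.0], "GENE_Y": [6.0, 6.0, 9.0]}

merged = merge_batches([batch_a, batch_b])
```

After centering, every gene has a mean of zero within each batch, so a constant per-gene offset between datasets no longer separates them in the merged data.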