2010
DOI: 10.1093/biomet/asp075

The distribution-based p-value for the outlier sum in differential gene expression analysis

Abstract: Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes, for which only a small subset of disease samples shows unusually high gene expression, but those authors did not develop the statistics' distributional properties or formal statistical inference. In this study, a new outlier sum for the detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of the lar…
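For orientation, the original outlier sum of Tibshirani & Hastie (2007) can be sketched as follows. This is not the new statistic proposed in this paper, whose definition is not reproduced here. The sketch assumes a genes × samples expression matrix, per-gene standardization by the median and the (unscaled) median absolute deviation over all samples, and an outlier threshold of q75 + IQR computed on the standardized values; the function name `outlier_sum` is hypothetical.

```python
import numpy as np


def outlier_sum(expr: np.ndarray, is_disease: np.ndarray) -> np.ndarray:
    """Tibshirani & Hastie-style outlier sum for each gene (row) of `expr`.

    expr: genes x samples matrix of expression values.
    is_disease: boolean mask over the samples (columns).
    """
    # Standardize each gene by its median and median absolute deviation,
    # both computed over all samples (both groups). Assumes MAD > 0.
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    z = (expr - med) / mad

    # Per-gene outlier threshold q75 + IQR on the standardized scale.
    q25, q75 = np.percentile(z, [25, 75], axis=1, keepdims=True)
    threshold = q75 + (q75 - q25)

    # Sum the standardized disease-group values that exceed the threshold.
    disease = z[:, is_disease]
    return np.where(disease > threshold, disease, 0.0).sum(axis=1)
```

Scaling the MAD by a constant (such as 1.4826) does not change which samples count as outliers; it only rescales the statistic.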

Cited by 12 publications (5 citation statements); references 12 publications.
“…While for each gene g = 1, …, 500 the log2 expression values for the control group were drawn independently from a normal distribution, X_gi ∼ N(0, σ_g²), i = 1, …, n_x, those for the tumor group were drawn independently from a mixture of two normal distributions: for i = 1, …, n_y. Note that, similar to the simulations in [13], the shift for upregulated samples is constant in standard deviation units and thus controlled for by the parameter δ. Genes g = 501, …, 1000 were assumed to be unrelated to tumor and thus for these genes the log2 expression values for both groups were drawn independently from a normal distribution N(0, σ_g²).…”
Section: Simulations (mentioning; confidence: 81%)
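The simulation design quoted above can be sketched as below. The quoted excerpt omits the tumor-group mixture formula, so its form here is an assumption: each tumor sample of an affected gene is drawn from (1 − p)·N(0, σ_g²) + p·N(δσ_g, σ_g²), which keeps the upward shift constant in standard-deviation units as the excerpt describes. The mixing proportion `p_outlier`, the function name `simulate_expression`, and the choice to pass σ_g in directly are all hypothetical.

```python
import numpy as np


def simulate_expression(n_x, n_y, delta, p_outlier, sigma, rng,
                        n_genes=1000, n_affected=500):
    """Hypothetical sketch of the quoted simulation (mixture form ASSUMED).

    Rows 0..n_affected-1 play the role of genes g = 1..500 (tumor related);
    the remaining rows are null genes. `sigma` is a scalar or per-gene array.
    """
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (n_genes,))[:, None]

    # Controls (all genes) and null tumor values: N(0, sigma_g^2).
    controls = rng.normal(0.0, 1.0, size=(n_genes, n_x)) * sigma
    tumors = rng.normal(0.0, 1.0, size=(n_genes, n_y)) * sigma

    # ASSUMED mixture for affected genes: with probability p_outlier a tumor
    # sample is shifted upward by delta standard deviations (delta * sigma_g).
    shifted = rng.random((n_affected, n_y)) < p_outlier
    tumors[:n_affected] += shifted * delta * sigma[:n_affected]
    return controls, tumors
```

For example, `simulate_expression(20, 30, delta=5.0, p_outlier=0.2, sigma=1.0, rng=np.random.default_rng(0))` returns a 1000 × 20 control matrix and a 1000 × 30 tumor matrix in which roughly a fifth of the tumor values for the first 500 genes are shifted up by 5σ_g.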
“…Previous simulation studies evaluating outlier-based differential expression methods have used Gaussian mixtures [6,10,12], or t-distributions [10] to produce synthetic data with outlier-type patterns of differential expression. To understand the operating characteristics of outlier-based differential expression analysis in more detail, we used a simulation approach in which the strength of differential expression in the tails and the strength of differential expression in the center of the distribution can be independently varied.…”
Section: Methods (mentioning; confidence: 99%)
“…To this end, a number of methods have been developed to detect so-called “cancer outlier genes” or genes expressed in only a subset of cancer samples. Methods for cancer outlier profile analysis include the COPA approach of Tomlins et al [5], the outlier sum (OS) test [6], the outlier robust t-test [7], the MOST method [8], the LSOSS method [9], distribution based outlier sum statistics [10] and others. Compared to the traditional t-statistic, outlier-associated methods have the potential to detect a greater number of differentially-expressed genes in heterogeneous data sets, at a lower false discovery rate.…”
Section: Introduction (mentioning; confidence: 99%)
“…Hence, a simple central limit theorem would suggest that the outlier-sum statistic would approximate a normal distribution in large samples, providing the quartiles used to define the threshold for outliers were known and the MADs used to standardize the measurements were nonzero. Chen et al [6] rigorously consider the use of estimated quartiles in order to derive a limiting distribution for the outlier-sum statistic for a known distribution of gene expression in a healthy population. However, when the healthy population's distribution is unknown or varies across genes their results do not apply.…”
Section: Calibration (mentioning; confidence: 99%)
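The "simple central limit theorem" argument in the last excerpt can be made concrete in the idealized case it describes, where the quartiles are known. The sketch below assumes the healthy distribution is already standardized to N(0, 1), so the outlier threshold is the known value t = q75 + IQR (about 2.02), and treats the outlier sum over n disease samples as a sum of n independent copies of X·1{X > t}. For X ∼ N(0, 1), E[X·1{X > t}] = φ(t) and E[X²·1{X > t}] = tφ(t) + 1 − Φ(t). This is only the naive normal approximation, not the limiting distribution Chen et al. derive with estimated quartiles; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): the ASSUMED known healthy distribution


def null_moments(t):
    """Mean and variance of X * 1{X > t} for X ~ N(0, 1)."""
    phi = STD_NORMAL.pdf(t)
    upper = 1.0 - STD_NORMAL.cdf(t)
    mean = phi                    # E[X 1{X > t}] = phi(t)
    second = t * phi + upper      # E[X^2 1{X > t}] = t*phi(t) + 1 - Phi(t)
    return mean, second - mean ** 2


def normal_approx_pvalue(outlier_sum, n, t=None):
    """Upper-tail CLT p-value for an outlier sum over n disease samples."""
    if t is None:
        # Known quartiles of N(0, 1): threshold q75 + IQR.
        q25, q75 = STD_NORMAL.inv_cdf(0.25), STD_NORMAL.inv_cdf(0.75)
        t = q75 + (q75 - q25)
    mean, var = null_moments(t)
    z = (outlier_sum - n * mean) / sqrt(n * var)
    return 1.0 - STD_NORMAL.cdf(z)
```

When the quartiles must be estimated, or the healthy distribution is unknown or varies across genes, this approximation no longer applies directly, which is the gap the excerpt points to.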