Time‐series clustering via quasi U‐statistics

Valk, Márcio; Pinheiro, Aluísio

doi:10.1111/j.1467-9892.2012.00793.x

Cited by 7 publications

(16 citation statements)

References 19 publications


“…Furthermore, we note that, for larger group sizes, the test achieves adequate power, and we thus recommend its use for homogeneity testing with around 20 samples or more. For smaller group sizes, the overall type I error of the uncorrected multiple U test approach of Valk & Pinheiro (2012) is not largely affected by multiple testing, and should be preferred due to its larger power.…”

Section: Discussion (mentioning)

confidence: 99%

“…Pinheiro et al (2009) show that B_n is in the class of degenerate U-statistics (called quasi U-statistics) where the asymptotic distribution is normal with convergence rates L and/or n, even if the assumption of stochastic independence between samples does not hold. Adapting the results in Pinheiro et al (2009) to the context of time series, Valk & Pinheiro (2012) develop methods for classification and clustering analysis for stationary time series.…”

Section: U-statistics Based Tests (mentioning)

confidence: 99%

“…The procedure for assessing group homogeneity proposed by Valk & Pinheiro (2012) involves applying the U test for all possible group configurations. For large group sizes, when applying this strategy, we must take into account multiple testing issues.…”

Section: Assessing Group Homogeneity (mentioning)

confidence: 99%
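
The exhaustive strategy described in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Valk & Pinheiro (2012) or of the citing paper: the function names, the toy data, and the exact form of the statistic (a weighted contrast of between-group and within-group mean dissimilarities) are assumptions introduced here, since the quoted text does not give the definition of B_n. The sketch enumerates every two-group configuration, scores each one, and shows how quickly the number of tests grows; Bonferroni is used only as one standard multiplicity adjustment.

```python
from itertools import combinations


def two_group_splits(n, min_size=2):
    """Yield every unordered split of items 0..n-1 into two groups.

    min_size must be at least 2 so that each group has a within-group pair.
    """
    items = range(n)
    for k in range(min_size, n - min_size + 1):
        for g1 in combinations(items, k):
            # Keep item 0 in g1 so each unordered split is produced once.
            if 0 not in g1:
                continue
            g2 = tuple(i for i in items if i not in g1)
            yield g1, g2


def mean_within(dist, group):
    """Average dissimilarity over all pairs inside one group."""
    pairs = list(combinations(group, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def bn_statistic(dist, g1, g2):
    """Between-versus-within contrast (ASSUMED form of B_n, see lead-in)."""
    n1, n2 = len(g1), len(g2)
    n = n1 + n2
    between = sum(dist[i][j] for i in g1 for j in g2) / (n1 * n2)
    within = mean_within(dist, g1) + mean_within(dist, g2)
    return n1 * n2 / (n * (n - 1)) * (2 * between - within)


def bonferroni_level(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests


# Toy data (made up for illustration): two well-separated clusters on a line.
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in x] for a in x]

splits = list(two_group_splits(len(x)))
best = max(splits, key=lambda s: bn_statistic(dist, *s))
level = bonferroni_level(0.05, len(splits))
```

Even with six samples there are already 25 admissible configurations, so a per-test level of 0.05 would have to shrink considerably under a Bonferroni adjustment. Turning each score into a p-value requires the asymptotic variance of the statistic, which this sketch deliberately leaves out.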

“…We are interested in whether a new sample X* would be classified in group G_1 or G_2. Valk & Pinheiro (2012) suggest a comparative approach based on statistics B_1 and B_2, where B_1 is the statistics B_n of (2.7) when the new sample is classified in group G_1, and B_2 is defined likewise. Note that if X* is not well classified in G_2, we might expect the statistic B_2 to be smaller than B_n computed without including the new sample, since this increases the distances within group G_2.…”

Section: Classification Test (mentioning)

confidence: 99%

“…We consider the homogeneity test which uses the clustering algorithm given in Appendix A.1 to find the configuration with maximum normalized U test statistic and then correct for multiple testing through the max test. We compare these results with the approach of Valk & Pinheiro (2012) of multiple U tests, and with this same approach corrected for multiple testing through the Bonferroni correction. Table 1 presents the size of the homogeneity test, measured as the fraction of simulations under the null hypothesis for which H_0 was rejected, considering the theoretical α = 0.05.…”

Section: Size and Power Of The Homogeneity Test (mentioning)

confidence: 99%


Clustering and classification problems in genetics through U-statistics

Cybis; Valk; Lopes

Journal of Statistical Computation and Simulation, 2017

Self Cite


Genetic data are frequently categorical and have complex dependence structures that are not always well understood. For this reason, clustering and classification based on genetic data, while highly relevant, are challenging statistical problems. Here we consider a highly versatile U-statistics based approach built on dissimilarities between pairs of data points for nonparametric clustering. In this work we propose statistical tests to assess group homogeneity taking into account the multiple testing issues, and a clustering algorithm based on dissimilarities within and between groups that highly speeds up the homogeneity test. We also propose a test to verify classification significance of a sample in one of two groups. A Monte Carlo simulation study is presented to evaluate power of the classification test, considering different group sizes and degree of separation. Size and power of the homogeneity test are also analyzed through simulations that compare it to competing methods. Finally, the methodology is applied to three different genetic datasets: global human genetic diversity, breast tumor gene expression and Dengue virus serotypes. These applications showcase this statistical framework's ability to answer diverse biological questions while adapting to the specificities of the different datatypes.

