2016
DOI: 10.1080/00401706.2015.1054435

Fast Computing for Distance Covariance

Abstract: Distance covariance and distance correlation have been widely adopted for measuring the dependence of a pair of random variables or random vectors. If the computation of distance covariance and distance correlation is implemented directly according to its definition, its computational complexity is O(n²), which is a disadvantage compared with other, faster methods. In this paper we show that the computation of distance covariance and distance correlation of real-valued random variables can be implemented by an …
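As a point of reference for the O(n²) baseline the abstract mentions, below is a minimal sketch of the definition-based sample distance covariance for real-valued samples. The function name and interface are illustrative only, not the paper's implementation; it uses the biased (V-statistic) estimator with double-centred distance matrices.

```python
import numpy as np

def distance_covariance_sq(x, y):
    """Direct O(n^2) squared sample distance covariance (illustrative sketch)."""
    # Treat the real-valued samples as column vectors of equal length n.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise distance matrices a_ij = |x_i - x_j|, b_ij = |y_i - y_j|.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-centring: A_ij = a_ij - row mean - column mean + grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    # Biased (V-statistic) estimator: average of the elementwise product.
    return float((A * B).mean())
```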

Cited by 112 publications (97 citation statements)
References 13 publications
“…The dCor also has quadratic complexity, though this is hard to see from our figure. (Though the R implementation of dCor, which we adopt here, has quadratic complexity in sample size, we note that a recent work (Huo and Székely, 2016) proposes a new algorithm for computing dCor with computational complexity O(n log n) in sample size.) To see how the computation under FES scales with the marginal maximum resolutions k1 and k2, we repeat the analysis for k1 = k2 = 4, 5, ….”
Section: Computational Scalability
confidence: 99%
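To make the quadratic-complexity baseline in this excerpt concrete, here is a hedged sketch of definition-based distance correlation, reusing the illustrative distance_covariance_sq helper sketched after the abstract (both names are assumptions, not any package's API):

```python
import numpy as np

def distance_correlation(x, y):
    """O(n^2) sample distance correlation from the definition (sketch only)."""
    # dCor^2 = dCov^2(X, Y) / sqrt(dCov^2(X, X) * dCov^2(Y, Y)).
    vxy = distance_covariance_sq(x, y)
    denom = np.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    # Guard against constant samples, where the distance variances vanish.
    return 0.0 if denom == 0 else float(np.sqrt(vxy / denom))
```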
“…However, sparse data is the relevant context for the PDC periodogram, since in large datasets periodicities are usually adequately detectable by the GLS or other conventional techniques. Nevertheless, one approach to improving the complexity of the computation is suggested in Huo & Székely (2016). Huo & Székely use an unbiased version of distance correlation first suggested by Székely & Rizzo (2014) and apply to it an AVL-tree computational approach (Adelson-Velskii & Landis 1962) to obtain an O(N log N) algorithm for calculating the distance correlation.…”
Section: Discussion
confidence: 99%
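The excerpt refers to the AVL-tree device behind the O(N log N) algorithm; the full procedure is more involved than can be shown here. As a hedged illustration of the kind of partial-sum trick such fast methods rely on, the row sums of the pairwise distance matrix can be obtained in O(n log n) by sorting and prefix sums. The helper below is illustrative only and is not the paper's AVL-tree procedure.

```python
import numpy as np

def abs_distance_row_sums(x):
    """Row sums d_i = sum_j |x_i - x_j| in O(n log n) via sorting (sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x)
    s = x[order]                                   # sorted values
    csum = np.concatenate(([0.0], np.cumsum(s)))   # csum[i] = s[0] + ... + s[i-1]
    idx = np.arange(n)
    # For the i-th smallest value: contributions from smaller and larger elements.
    left = idx * s - csum[:-1]                     # sum over j < i of (s[i] - s[j])
    right = (csum[-1] - csum[1:]) - (n - 1 - idx) * s  # sum over j > i of (s[j] - s[i])
    d = np.empty(n)
    d[order] = left + right                        # map back to the original ordering
    return d
```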
“…Here, $X'$ (respectively $Y'$) denotes an independent copy of $X$ (respectively $Y$). An unbiased estimator of squared distance covariance proposed by Székely and Rizzo (2014) is given by
$$\widehat{\mathcal{V}}_U(X,Y) = \frac{1}{n(n-3)} \sum_{i \neq j} \widetilde{A}_{ij}\,\widetilde{B}_{ij}$$
for $n > 3$, where the so-called $\mathcal{U}$-centred matrices $\widetilde{A}_{ij}$ have the additional property that $E[\widetilde{A}_{ij}] = 0$ for all $i, j$ and are defined by
$$\widetilde{A}_{ij} = \begin{cases} a_{ij} - \dfrac{1}{n-2}\displaystyle\sum_{l=1}^{n} a_{il} - \dfrac{1}{n-2}\displaystyle\sum_{k=1}^{n} a_{kj} + \dfrac{1}{(n-1)(n-2)}\displaystyle\sum_{k,l=1}^{n} a_{kl}, & i \neq j;\\[4pt] 0, & i = j. \end{cases}$$
For $p = q = 1$, Huo and Székely (2016) have shown that $\widehat{\mathcal{V}}_U(X,Y)$ is a U-statistic, which is degenerate in the case where $X$ and $Y$ are independent. For further details concerning the properties of $\widehat{\mathcal{V}}_U(\mathbf{X},\mathbf{Y})$, we refer to the paper by Huang and Huo ().…”
Section: Estimation, Testing and Further Properties
confidence: 99%
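A direct O(n²) transcription of the U-centring and unbiased estimator quoted above, for real-valued samples with n > 3. The helper names are illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def u_centered(a):
    """U-centred matrix of a pairwise distance matrix, per the formula above (sketch)."""
    n = a.shape[0]
    row = a.sum(axis=1, keepdims=True) / (n - 2)     # (1/(n-2)) * sum_l a_il
    col = a.sum(axis=0, keepdims=True) / (n - 2)     # (1/(n-2)) * sum_k a_kj
    grand = a.sum() / ((n - 1) * (n - 2))            # grand-sum term
    A = a - row - col + grand
    np.fill_diagonal(A, 0.0)                         # A_ii = 0 by definition
    return A

def dcov_sq_unbiased(x, y):
    """Unbiased estimator of squared distance covariance, O(n^2) (sketch)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = x.shape[0]
    A = u_centered(np.abs(x - x.T))
    B = u_centered(np.abs(y - y.T))
    # (1/(n(n-3))) * sum over i != j of A_ij * B_ij (diagonals are zero).
    return float((A * B).sum() / (n * (n - 3)))
```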
“…For further details concerning the properties of $\widehat{\mathcal{V}}_U(\mathbf{X},\mathbf{Y})$, we refer to the paper by Huang and Huo (). By using the U-statistic representation of $\widehat{\mathcal{V}}_U(X,Y)$, Huo and Székely (2016) show that it can be computed by an $O(n \log n)$ algorithm. This algorithm considerably speeds up the calculation of the distance correlation coefficient for large sample sizes.…”
Section: Estimation, Testing and Further Properties
confidence: 99%
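A brief usage sketch combining the illustrative helpers above on simulated data. The O(n log n) algorithm of Huo and Székely (2016) computes the same unbiased statistic; the sketches here remain O(n²) and are meant only to make the quantities concrete.

```python
import numpy as np

# Simulated nonlinearly dependent data (assumes the helpers above are in scope).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.5 * rng.normal(size=500)

print(distance_correlation(x, y))   # definition-based (biased) dCor, O(n^2)
print(dcov_sq_unbiased(x, y))       # unbiased squared dCov, O(n^2) here
```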