2018
DOI: 10.1214/17-aos1587
Distributed testing and estimation under sparse high dimensional models

Abstract: This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm…
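As a minimal sketch of the split-and-aggregate idea described in the abstract, the snippet below averages ordinary least squares estimates over k disjoint subsamples. This is a hypothetical illustration of the simplest low dimensional case, not the paper's actual procedure, which aggregates debiased likelihood-based statistics in the sparse high dimensional setting.

```python
import numpy as np

def divide_and_conquer_ols(X, y, k):
    """Average OLS estimates computed on k disjoint subsamples.

    Illustrative only: the paper aggregates debiased likelihood-based
    statistics; plain averaging is the simplest low dimensional case.
    """
    n = X.shape[0]
    subsamples = np.array_split(np.arange(n), k)   # k blocks of size ~ n/k
    betas = [np.linalg.lstsq(X[s], y[s], rcond=None)[0] for s in subsamples]
    return np.mean(betas, axis=0)                  # aggregated estimator

# Usage: 10,000 observations, 5 features, k = 20 subsamples.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + rng.normal(size=10_000)
print(divide_and_conquer_ols(X, y, k=20))  # close to beta for modest k
```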

Cited by 188 publications (131 citation statements)
References 26 publications

Citation Types: 7 supporting, 124 mentioning, 0 contrasting
Year Published: 2018–2024

“…In addition, when T(G) fails to detect violation of the null hypothesis on the whole data set, T_k(G) may still identify the violation with high power. This was observed in our application to the SEER data, and also echoes the findings in Battey et al. (2018). This also suggests that, even when the data set is not huge, it might be desirable to partition the data and examine the partitions for possibly masked violations of the null hypothesis.…”
supporting
confidence: 86%
“…This was observed in our application to the SEER data, and also echoes the findings in Battey et al. (2018). This also suggests that, even when the data set is not huge, it might be desirable to partition the data and examine the partitions for possibly masked violations of the null hypothesis.…”
Section: Discussion
mentioning
confidence: 97%
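The point these excerpts make, that partition-level tests can expose a violation the pooled test dilutes, can be illustrated with a small simulation. This is a hypothetical sketch using a one-sample t-test, not the likelihood-based statistics of the cited papers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, m = 20, 500                          # k partitions of size m
data = rng.normal(0.0, 1.0, size=(k, m))
data[3] += 0.3                          # violation confined to partition 3

# Pooled test of H0: mean = 0 on the full sample: the local shift is
# diluted by the 19 null partitions, so this often fails to reject.
pooled = stats.ttest_1samp(data.ravel(), 0.0)

# Per-partition tests with a Bonferroni correction for the k looks.
per_part = stats.ttest_1samp(data, 0.0, axis=1)
flagged = np.where(per_part.pvalue < 0.05 / k)[0]

print(f"pooled p-value: {pooled.pvalue:.3f}")  # typically > 0.05 here
print(f"partitions flagged: {flagged}")        # partition 3 is detected
```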
“…In this way, we are able to break a large-scale computation problem into many small pieces, then solve them with divide-and-conquer procedures and communicate only certain summary statistics. In recent years, distributed statistical inference has received considerable attention, covering a wide range of topics including M-estimation (Chen and Xie, 2014; Rosenblatt and Nadler, 2016; Lee et al., 2017; Battey et al., 2018; Shi, Lu, and Song, 2018; Jordan et al., 2018; Banerjee, Durot, and Sen, 2019; Fan, Guo, and Wang, 2019), hypothesis testing (Lalitha, Sarwate, and Javidi, 2014; Battey et al., 2018), confidence intervals (Jordan, Lee, and Yang, 2018; Chen, Liu, and Zhang, 2018; Dobriban and Sheng, 2018; Wang et al., 2019), principal component analysis (Garber, Shamir, and Srebro, 2017), nonparametric regression (Zhang, Duchi, and Wainwright, 2015; Chang, Lin, and Zhou, 2017; Shang and Cheng, 2017; Han et al., 2018; Szabó and Van Zanten, 2019), Bayesian methods (Xu et al., 2014; Jordan et al., 2018), quantile regression (Volgushev, Chao, and Cheng, 2019; Chen, Liu, and Zhang, 2019), bootstrap inference (Kleiner et al., 2014; Han and Liu, 2016), and so on.…”
Section: Introduction
mentioning
confidence: 99%
“…After the usage is over, the EMA releases the memory and puts the partial information back to the hard disk. For an extremely big data set that cannot be loaded into the memory of a single processor, a common solution in computer science is to partition the data set into a number of subsets by parallel or cluster computation (Battey et al., 2018; Lin and Xi, 2011; Meeker and Hong, 2014).…”
Section: Introduction
mentioning
confidence: 99%
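The partition-and-summarize idea in this excerpt can be sketched with a chunked computation: stream a file too large for memory and retain only sufficient statistics. The file path, column names, and function name below are hypothetical; this is a minimal illustration under those assumptions, not the cited papers' procedure.

```python
import numpy as np
import pandas as pd

def ols_out_of_core(csv_path, feature_cols, target_col, chunksize=100_000):
    """Exact OLS over a file too large for memory, one chunk at a time.

    Only the sufficient statistics X'X and X'y are held in memory, so
    the footprint is O(p^2) no matter how many rows the file has.
    """
    p = len(feature_cols)
    xtx = np.zeros((p, p))
    xty = np.zeros(p)
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        X = chunk[feature_cols].to_numpy()
        y = chunk[target_col].to_numpy()
        xtx += X.T @ X      # accumulate the p x p cross-product
        xty += X.T @ y
    return np.linalg.solve(xtx, xty)
```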