2018
DOI: 10.48550/arxiv.1805.11214
Preprint

Distributed Statistical Inference for Massive Data

Abstract: This paper considers distributed statistical inference for a general type of statistics that encompasses the U-statistics and the M-estimators in the context of massive data where the data can be stored at multiple platforms at different locations. In order to facilitate effective computation and to avoid expensive data communication among different platforms, we formulate distributed statistics which can be computed over smaller data blocks. The statistical properties of the distributed statistics are investi…

Cited by 2 publications (4 citation statements). References 27 publications.

“…In recent years, two methods have commonly been used to tackle the challenges arising due to massive data. One is the divide-and-conquer (DAC) algorithm (e.g., Zhang, Duchi & Wainwright, 2013; Chen & Xie, 2014; Battey et al., 2018; Chen & Peng, 2018); the other is the resampling-based method (e.g., Kleiner et al., 2014; Sengupta, Volgushev & Shao, 2016; Wang, Yang & Stufken, 2018). By the DAC method, a massive dataset is partitioned into small subsamples, and estimators obtained from each subsample are then aggregated to form the final estimator.…”
Section: Introduction (mentioning)
confidence: 99%
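
The partition-then-aggregate idea in the statement above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of any of the cited papers: the helper names (`split_into_blocks`, `exponential_rate_mle`, `dac_estimate`), the choice of an exponential-rate MLE as the per-block estimator, the simulated data, and aggregation by a plain average are all hypothetical choices made for the example.

```python
import random
import statistics


def split_into_blocks(data, num_blocks):
    """Partition data into num_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for k in range(num_blocks):
        end = start + size + (1 if k < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def exponential_rate_mle(sample):
    """MLE of an exponential rate parameter: 1 / sample mean."""
    return 1.0 / statistics.fmean(sample)


def dac_estimate(data, num_blocks, estimator):
    """Apply the estimator to each block, then aggregate by simple averaging."""
    blocks = split_into_blocks(data, num_blocks)
    local_estimates = [estimator(block) for block in blocks]
    return statistics.fmean(local_estimates)


# Simulated data (assumption): exponential draws with true rate 2.0.
rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(100_000)]

full_estimate = exponential_rate_mle(data)  # uses all data at once
dac = dac_estimate(data, num_blocks=20, estimator=exponential_rate_mle)
print(f"full-sample estimate: {full_estimate:.4f}, DAC estimate: {dac:.4f}")
```

Only one number per block has to be communicated to the aggregation step, which is the point of the approach when the blocks live on different machines.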
“…For example, Zhang, Duchi & Wainwright (2013) and Huang & Huo (2019) used DAC for M-estimators; Chen & Xie (2014) and Lee et al. (2017) applied DAC to the linear and generalized linear models. Battey et al. (2018) used DAC to study hypothesis testing and parameter estimation in a general likelihood-based framework in both low-dimensional and sparse high-dimensional settings; Chen & Peng (2018) applied DAC to U-statistics and M-estimators. Chen & Peng (2018) pointed out that the resampling-based method has some limitations, such as high computational cost when the massive data are stored at different locations.…”
Section: Introduction (mentioning)
confidence: 99%
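
As a rough illustration of what applying DAC to a U-statistic might look like, the sketch below computes a degree-2 U-statistic within each block and combines the block values with weights proportional to block size. The kernel h(x, y) = (x - y)^2 / 2 (whose U-statistic is the unbiased sample variance), the function names, the block sizes, and the simulated data are assumptions chosen for the example; the exact formulation and weighting in Chen & Peng (2018) may differ.

```python
import itertools
import random
import statistics


def u_statistic(sample, kernel):
    """Degree-2 U-statistic: average of the kernel over all unordered pairs."""
    total, count = 0.0, 0
    for x, y in itertools.combinations(sample, 2):
        total += kernel(x, y)
        count += 1
    return total / count


def variance_kernel(x, y):
    """Kernel whose U-statistic equals the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def distributed_u_statistic(blocks, kernel):
    """Size-weighted average of the U-statistics computed within each block."""
    n = sum(len(block) for block in blocks)
    return sum(len(block) * u_statistic(block, kernel) for block in blocks) / n


# Simulated data (assumption): normal draws with variance 9, in 10 blocks of 100.
rng = random.Random(1)
data = [rng.gauss(0.0, 3.0) for _ in range(1000)]
blocks = [data[i:i + 100] for i in range(0, len(data), 100)]

full = u_statistic(data, variance_kernel)  # all pairs across the whole sample
distributed = distributed_u_statistic(blocks, variance_kernel)
print(f"full U-statistic: {full:.4f}, distributed version: {distributed:.4f}")
```

The full statistic visits every pair in the sample, while the distributed version visits only within-block pairs, so no raw data has to move between blocks; the price is that cross-block pairs are ignored.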
“…The divide and conquer technique is a popular method in distributed frameworks, where one constructs a statistic or an estimator using data in each machine, and then transmits them to the hub to get a pooled one. The divide and conquer technique has been applied successfully in many problems, including regression and classification, hypothesis testing, confidence intervals, principal eigenspaces analysis, linear discriminant analysis, and many others (Zhang et al., 2013; Hsieh et al., 2014; Zhang et al., 2015; Lin et al., 2017; Szabó and Van Zanten, 2019; Battey et al., 2018; Guo et al., 2019; Chen and Peng, 2018; Jordan et al., 2019; Fan et al., 2019; Tian and Gu, 2017; Li and Zhao, 2020; Dobriban and Sheng, 2021, etc.).…”
Section: Introduction (mentioning)
confidence: 99%