2021
DOI: 10.1214/21-aoas1462

Orthogonal subsampling for big data linear regression

citations
Cited by 30 publications
(16 citation statements)
references
References 26 publications
“…Some omitted topics include social media text data [145], virus lineage [146], public health data monitoring [147] and analysis of administrative and translation data [148]. Second, analysis for big data may alternatively begin with a selected subset of the big data with some optimality properties; see [149]. Modern statistical methods are then applied to infer the key messages from the subset data to the massive data.…”
Section: Discussion
mentioning
confidence: 99%
“…Such distributed covariates may not be as common as bounded or normally distributed covariates in real applications, and when covariates do have heavy-tail distributions, it is common to consider suitable transformations of the covariates before a linear model is fitted. Figure 1 in the supplementary materials (Wang et al, 2021) shows that MSE β0 does not change much across the three methods.…”
Section: The Parameter T I In the Elimination
mentioning
confidence: 97%
“…In Section 5, we examine the performance of the OSS method through extensive simulations, discuss models with interactions, and demonstrate the utility of using the OSS method to select subsamples for two real big data sets. We offer concluding remarks in Section 6 and show the proof of technical results in the supplementary materials (Wang et al, 2021).…”
mentioning
confidence: 88%
“…The algorithms we report search for a design, which minimises the mean square error (MSE) of the least square estimate (LSE) of the $\pmb{\theta}$ parameters, guarding against the two different sources of bias. Recently, authors of Refs 1, 2, 17, 18 proposed methods of data selection from large datasets in a DoE context, as a response to the more and more frequent need to analyse Big Data.…”
Section: Model‐oriented Selection Of Sub‐datasets
mentioning
confidence: 99%