2019
DOI: 10.1007/s42519-019-0048-5

Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm

Abstract: The information-based optimal subdata selection (IBOSS) is a computationally efficient method to select informative data points from large data sets through processing full data by columns. However, when the volume of a data set is too large to be processed in the available memory of a machine, it is infeasible to implement the IBOSS procedure. This paper develops a divide-and-conquer IBOSS approach to solving this problem, in which the full data set is divided into smaller partitions to be loaded into the mem…
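The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract is truncated, so the combination step (simply concatenating the subdata chosen in each partition) is an assumption, as are the function names `iboss_select` and `dc_iboss` and the equal split of the budget `k` across partitions. The per-partition rule follows the usual IBOSS description for linear models, which takes the rows with the smallest and largest values of each covariate in turn. For brevity the partitions here are slices of an in-memory array; in the out-of-memory setting each one would be read from disk instead.

```python
import numpy as np


def iboss_select(X, k):
    """Pick about k row indices of X by extreme covariate values.

    For each column in turn, take the r rows with the smallest and the
    r rows with the largest values among rows not yet chosen, where
    r = k // (2 * p). Hypothetical helper, not the paper's code.
    """
    n, p = X.shape
    r = max(1, k // (2 * p))
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        if idx.size == 0:
            break
        order = idx[np.argsort(X[idx, j], kind="stable")]
        if order.size <= 2 * r:
            picked = order
        else:
            picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False
        chosen.append(picked)
    if not chosen:
        return np.empty(0, dtype=int)
    return np.concatenate(chosen)


def dc_iboss(X, y, k, n_blocks):
    """Apply iboss_select within each partition and pool the results.

    Assumption: each partition contributes k // n_blocks rows and the
    selected rows are simply concatenated.
    """
    blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    k_block = k // n_blocks
    keep = np.concatenate([b[iboss_select(X[b], k_block)] for b in blocks])
    return X[keep], y[keep]
```

Because every partition keeps its own extreme rows, the pooled subdata always contains the overall minimum and maximum of each covariate, which is what makes the selected points informative for a linear model.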

Citations: cited by 22 publications (6 citation statements)
References: 14 publications (33 reference statements)
“…For modeling with interactions, OSS is significantly faster than IBOSS because OSS chooses subsamples only based on covariates while IBOSS relies also on the interaction terms. It should be noted that IBOSS (Wang, 2019) and OSS can both be further accelerated with parallel computing, making the selection and analysis of the subsamples much more efficient. 5.4.…”
Section: Computing Time (mentioning)
confidence: 99%
“…In this context, there are many subset selection methods, e.g. for linear models (Wang (2019)) or Bayesian system identification (Green (2015)). Recently, Peter and Nelles (2019) presented a more holistic approach, which selects points by targeting an arbitrary probability density function of the subset.…”
Section: Related Work (mentioning)
confidence: 99%
“…Additionally, the DAC strategy has been extended to a sparse Cox regression by Wang et al (2019); Xue et al (2020) developed a DAC algorithm that updates test statistics for hypothesis testing of the proportional hazards assumption under the Cox model as blocks of data are received sequentially. Moreover, the DAC algorithm has been incorporated in many existing techniques from various fields to improve precision and efficiency, namely, the evolutionary algorithm for large-scale optimization (Yang, Tang & Yao, 2019), information-based optimal subdata selection algorithm (Wang, 2019), a coevolutionary algorithm to enhance resource allocation for better control of a spreading virus (Zhao et al, 2020), and precision oncology for subtypes of sarcoma (Pestana et al, 2020), among others.…”
Section: Introduction (mentioning)
confidence: 99%