2001
DOI: 10.1007/3-540-44794-6_25
Data Reduction Using Multiple Models Integration

Abstract: A large amount of available information does not necessarily imply that induction algorithms must use all of it. Samples often provide the same accuracy with less computational cost. We propose several effective techniques based on the idea of progressive sampling, in which progressively larger samples are used for training as long as model accuracy improves. Our sampling procedures combine all the models constructed on previously considered data samples. In addition to random sampling, controlla…
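The core loop described in the abstract can be sketched as follows. This is a minimal illustration of progressive sampling, not the paper's exact procedure: the stand-in learner, sample sizes, and stopping rule are assumptions chosen to keep the example self-contained.

```python
import random

def centroid_classifier(train):
    """Tiny stand-in learner: nearest class centroid on 1-D features."""
    sums, counts = {}, {}
    for x, y in train:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def progressive_sampling(data, holdout, start=16, growth=2, tol=0.0):
    """Grow the training sample while held-out accuracy keeps improving."""
    best_acc, best_model, n = -1.0, None, start
    while n <= len(data):
        sample = random.sample(data, n)
        model = centroid_classifier(sample)
        acc = accuracy(model, holdout)
        if acc <= best_acc + tol:   # no improvement: stop growing
            break
        best_acc, best_model, n = acc, model, n * growth
    return best_model, best_acc

random.seed(0)
# Synthetic 1-D data: class 0 near 0.0, class 1 near 1.0.
data = [(random.gauss(y, 0.3), y) for y in (0, 1) for _ in range(500)]
holdout = [(random.gauss(y, 0.3), y) for y in (0, 1) for _ in range(200)]
model, acc = progressive_sampling(data, holdout)
print(round(acc, 2))
```

The paper additionally combines the models built on earlier samples rather than keeping only the latest one; this sketch keeps just the best model for brevity.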

Cited by 10 publications (4 citation statements)
References 7 publications
“…For example, in [7], unbalanced data are classified by data reduction combining Tomek links (T-link) and random under-sampling (RUS); T-link is used in the preprocessing phase to remove noise. [22] presents a data reduction method based on the combination of multiple sampling models. This method uses weighted voting to combine the models and applies an effective imprecise-model pruning technique to improve the accuracy of data reduction.…”
Section: Principal Component Analysis (PCA)
confidence: 99%
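The weighted-voting combination with pruning of imprecise models that this citation describes can be illustrated as below. The weighting rule (validation accuracy as the weight) and the pruning floor are assumptions for the sketch, not the method's exact parameters.

```python
import random

def make_stump(threshold):
    """Toy 1-D learner: predict class 1 if x > threshold, else class 0."""
    return lambda x: 1 if x > threshold else 0

def acc(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def weighted_vote(models_weights, x):
    """Combine models: each votes with weight equal to its validation accuracy."""
    votes = {0: 0.0, 1: 0.0}
    for model, w in models_weights:
        votes[model(x)] += w
    return max(votes, key=votes.get)

random.seed(1)
val = [(random.gauss(y, 0.3), y) for y in (0, 1) for _ in range(200)]

# Models "trained" on different samples end up with different thresholds.
candidates = [make_stump(t) for t in (-0.5, 0.2, 0.5, 0.8, 2.0)]
scored = [(m, acc(m, val)) for m in candidates]

# Prune imprecise models: keep only those beating a 60% accuracy floor.
ensemble = [(m, a) for m, a in scored if a > 0.6]

combined_acc = acc(lambda x: weighted_vote(ensemble, x), val)
print(len(ensemble), round(combined_acc, 2))
```

The near-useless thresholds are pruned before voting, so the combined prediction is dominated by the accurate models.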
“…Other proposed data reduction methods include data reduction with Multiple Models Integration by Lazarevic and Obradovic [2]. Provost et al. (1999) proposed reduction by efficient progressive sampling [8].…”
Section: Introduction
confidence: 99%
“…Though no directly comparable study exists, several studies in other fields use progressive sampling to increase the training efficiency of discrete datasets. Six studies examine a combined 22 different datasets with varying numbers of categories or attributes, including land cover type (Lazarevic and Obradovic, 2001; Peng et al., 2004), traffic data (Umarani and Punithavalli, 2011), waveform (Lazarevic and Obradovic, 2001; Ng and Dash, 2006; Peng et al., 2004), simulated data (ElRafey and Wojtusiak, 2017; Umarani and Punithavalli, 2011), and wine quality data (ElRafey and Wojtusiak, 2017). The effective sample size was determined by each author and is not related to the indicators selected in this study.…”
Section: Discussion
confidence: 99%
“…This was a result of the small representation of the humid continental class. Rather than increasing the sample size, it may be more appropriate to use methods such as a stratified random sample, or progressive boosting, to optimize sample size and account for imbalanced data (Lazarevic and Obradovic, 2001; Soleymani et al., 2018). This would align with approaches used in land cover classification, where a minimum sample size per class is often defined (EFTAS and FAO, 2015).…”
Section: Discussion
confidence: 99%
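The stratified-sample idea in this citation, with a minimum sample size per class, can be sketched as follows. The sampling fraction, class-minimum, and data are illustrative assumptions, not details from the cited studies.

```python
import random
from collections import defaultdict

def stratified_sample(data, frac=0.1, min_per_class=5, seed=0):
    """Sample each class separately, enforcing a per-class minimum count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    sample = []
    for y, items in by_class.items():
        k = max(min_per_class, int(frac * len(items)))
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample

# A common class and a rare class (standing in for e.g. humid continental).
data = [(i, "common") for i in range(1000)] + [(i, "rare") for i in range(20)]
counts = defaultdict(int)
for _, y in stratified_sample(data):
    counts[y] += 1
print(counts["common"], counts["rare"])  # prints 100 5
```

A plain random sample of the same size would leave the rare class with only about two members; the per-class floor guarantees it stays represented.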