Data Reduction Using Multiple Models Integration

Lazarević, Aleksandar; Obradović, Zoran

doi:10.1007/3-540-44794-6_25

Cited by 10 publications

(4 citation statements)

References 7 publications


“…For example, in [7], the unbalanced data are classified by data reduction combining Tomek link (T-link) and random under sampling (RUS), T-link is used at the preprocessing phase to remove noise. [22] presents a data reduction method based on the combination of multiple sampling models. This method uses weighted voting to combine the models and utilizes the effective imprecise model pruning technology to improve the accuracy of data reduction.…”

Section: Principal Component Analysis (PCA); type: mentioning

confidence: 99%

ARIS: A Noise Insensitive Data Pre-Processing Scheme for Data Reduction Using Influence Space

Cai, Yang, et al. 2022

ACM Trans. Knowl. Discov. Data


The extensive growth of data quantity has posed many challenges to data analysis and retrieval. Noise and redundancy are typical examples of these challenges; they may reduce the reliability of analysis and retrieval results and increase storage and computing overhead. To solve these problems, a two-stage data pre-processing framework for noise identification and data reduction, called ARIS, is proposed in this paper. The first stage identifies and removes noise in three steps. First, the influence space (IS) is introduced to describe the data distribution. Second, a ranking factor (RF) is defined to describe the possibility that a point is regarded as noise, and noise is then defined based on RF. Third, a clean dataset (CD) is obtained by removing noise from the original dataset. The second stage learns representative data and performs data reduction: CD is divided into multiple small regions by IS, and the reduced dataset is formed by collecting the representations of each region. The performance of ARIS is verified by experiments on artificial and real datasets. Experimental results show that ARIS effectively weakens the impact of noise, reduces the amount of data, and significantly improves the accuracy of data analysis within a reasonable time cost.

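The first citation statement above says the cited method combines multiple models by weighted voting and prunes imprecise models. A minimal sketch of that general idea follows; it is not the authors' implementation. The function names, the use of a per-model score (for example, validation accuracy) as the weight, and the threshold-based pruning rule are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine the class labels predicted by several models for one sample.

    predictions: one label per model.
    weights: one non-negative weight per model (assumed here to be
             something like validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest total weight wins; iterating in sorted
    # order makes ties resolve deterministically to the smallest label.
    return max(sorted(totals), key=lambda label: totals[label])


def prune_models(weights, threshold):
    """Hypothetical pruning rule: keep indices of models whose weight
    is at least `threshold`, dropping the imprecise ones."""
    return [i for i, w in enumerate(weights) if w >= threshold]
```

For instance, with predictions `["a", "b", "b"]` and weights `[0.9, 0.3, 0.4]`, label `"a"` wins because its single vote (0.9) outweighs the two votes for `"b"` (0.7 in total).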


“…Some other data reduction methods proposed include the Lazarevic and Obradovic [2] in the form of data reduction with Multiple Models Integration. Provost et al (1999) proposed a reduction by efficient progressive sampling [8].…”

Section: Introduction; type: mentioning

confidence: 99%

K-Support Vector Nearest Neighbor: Classification Method, Data Reduction, and Performance Comparison

Prasetyo

2016

J. Electr. Eng. Comput. Sci.


Over the past two decades, data mining has become important for exploiting data sets, because the information it yields is highly valuable. A major obstacle for data mining tasks, however, is the very large amount of data involved. Large volumes are characteristic of data mining, but too much data also degrades performance. In classification, data that do not lie on the decision boundary are less useful and make the classification method inefficient. K-Support Vector Nearest Neighbor (K-SVNN) is presented to address this problem of very large data sets. K-SVNN can reduce very large data sets while retaining good accuracy and without degrading performance. Performance comparisons with a number of classification methods also show that K-SVNN provides good accuracy: among the five methods compared, K-SVNN ranked in the top three. The accuracy of K-SVNN differed from that of the other methods by less than 0.66% on the Iris data set and 20:29% on the Wine data set.


“…Though no directly comparable study exists, several studies in other fields that use progressive sampling for to increase training efficiency of discrete datasets exist. Six studies that look at a combined 22 different datasets, including land cover type (Lazarevic and Obradovic, 2001; Peng et al 2004), traffic data (Umarani and Punithavalli, 2011), waveform (Lazarevic and Obradovic, 2001; Ng and Dash, 2006; Peng et al 2004), simulated data (ElRafey and Wojtusiak, 2017; Umarani and Punithavalli, 2011), wine quality data (ElRafey and Wojtusiak, 2017), with varying number of categories or attributes. The effective sample size was determined by each author and is not related to the indicators selected in this study.…”

Section: Discussion; type: mentioning

confidence: 99%

“…This was a result of the small representation of humid continental class. Rather than increase the sample size, it may be more appropriate to use methods such as a stratified random sample, or a progressive boosting to optimize sample size and account for imbalanced data (Lazarevic and Obradovic, 2001; Soleymani et al 2018). This would align with approaches used in land cover classification where a minimum sample size per class is often defined (EFTAS and FAO, 2015).…”

Section: Discussion; type: mentioning

confidence: 99%

Evaluating quality of remote sensing-based agricultural water productivity data

Blatchford


Figure 4.11: NDVI and ETIa-WPR for the EG-ZAN site for all three spatial resolutions (L3 = 30 m, L2 = 100 m and L1 = 250 m) on dekad 1222 (1st dekad of Aug 2012). The point is the station location; the circle is the buffer used for data extraction to compare to the ETa-EC.
Figure 4.12: Upper: number of observations for a given ETa-EC range. Lower: bias of dekadal ETIa-WPR (mm day⁻¹), as compared to ETa-EC, plotted against the increasing ranges of ETa-EC (mm day⁻¹) for observations at natural vegetation sites (orange bar), irrigated agriculture sites (blue bar) and all sites (grey bar

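Several of the citation statements above refer to progressive sampling as a way to find an adequate training sample size. The sketch below shows one common form of the idea, assuming a geometric size schedule and a simple stopping rule based on accuracy gain. The names `geometric_schedule`, `progressive_sample`, `evaluate`, and `tol` are hypothetical and are not taken from any of the cited papers.

```python
def geometric_schedule(n_start, n_max, factor=2):
    """Yield sample sizes n_start, n_start*factor, ... capped at n_max."""
    if n_start < 1 or factor <= 1:
        raise ValueError("need n_start >= 1 and factor > 1")
    n = n_start
    while n < n_max:
        yield n
        n *= factor
    yield n_max


def progressive_sample(evaluate, n_start, n_max, factor=2, tol=0.01):
    """Grow the sample until the accuracy gain falls below `tol`.

    evaluate(n) is a caller-supplied function (an assumption of this
    sketch) that trains a model on n examples and returns its held-out
    accuracy. Returns (sample size, accuracy) at the stopping point.
    """
    prev_acc = None
    for n in geometric_schedule(n_start, n_max, factor):
        acc = evaluate(n)
        # Stop at the first size whose gain over the previous size is small.
        if prev_acc is not None and acc - prev_acc < tol:
            return n, acc
        prev_acc = acc
    # The schedule ran out before the learning curve flattened.
    return n_max, prev_acc
```

A caller would pass an `evaluate` function wrapping its own training and validation code; the stopping threshold `tol` and growth `factor` would need tuning for each dataset.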
