2017 13th International Computer Engineering Conference (ICENCO) 2017
DOI: 10.1109/icenco.2017.8289790
Improving instance selection methods for big data classification

Cited by 3 publications (11 citation statements) · References 20 publications
“…A systematic method of sampling that mediates the tensions between resource constraints, data characteristics, and the learning algorithm's accuracy is needed [20]. Intricate methods for subsetting big data, such as instance selection [21] and inverse sampling [22], are computationally expensive [23,24] because of their inefficient multiple preprocessing steps. Furthermore, newly added data points change the data's statistical measures and require re-sampling.…”
Section: Techniques for Data Reduction
confidence: 99%
“…The continuous growth of data size makes traditional IS methods unable to process a training dataset on a single machine, due to memory limitations [9]. Therefore, new approaches have been proposed that partition the training dataset into subsets and apply IS methods to each subset separately [10][11][12]. The approach in [10] uses random partitioning to split a given training dataset into a group of manageable subsets.…”
Section: Introduction
confidence: 99%
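The divide-and-conquer idea described above — randomly partition the training set into manageable subsets, run instance selection (IS) on each subset independently, then merge the reduced subsets — can be sketched as follows. This is an illustrative sketch only: `select_instances` here is a hypothetical placeholder that keeps a fixed fraction of each subset, standing in for a real IS method such as CNN or DROP3.

```python
import random

def random_partition(dataset, num_subsets):
    """Shuffle the training set and split it into roughly equal subsets."""
    shuffled = dataset[:]
    random.shuffle(shuffled)
    return [shuffled[i::num_subsets] for i in range(num_subsets)]

def select_instances(subset, keep_ratio=0.5):
    """Hypothetical placeholder for an IS method: keep a fraction of the
    subset. A real method would pick instances by their usefulness to
    the classifier, not by position."""
    k = max(1, int(len(subset) * keep_ratio))
    return subset[:k]

def distributed_is(dataset, num_subsets=4):
    """Apply instance selection to each subset independently and merge
    the reduced subsets into one smaller training set."""
    reduced = []
    for subset in random_partition(dataset, num_subsets):
        reduced.extend(select_instances(subset))
    return reduced
```

Because each subset is processed independently, the per-subset IS runs fit in memory and can be parallelized, which is the motivation the cited approaches give for partitioning in the first place.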
“…However, the performance of the IS method applied to the partitioned subsets is degraded, especially for class-imbalanced datasets. To overcome this limitation, the approaches in [11,12] use stratified partitioning to ensure an equal distribution of data classes across subsets, while instances of the same class are assigned randomly to subsets. The common feature of these approaches [10][11][12] is the random partitioning of instances, which leads to a random representation of the instances in the partitioned subsets.…”
Section: Introduction
confidence: 99%
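The stratified partitioning described above — equal class distribution across subsets, with instances of the same class assigned randomly — can be sketched like this. The function name and round-robin assignment are assumptions for illustration, not the exact procedure of the cited approaches.

```python
import random
from collections import defaultdict

def stratified_partition(instances, labels, num_subsets, seed=0):
    """Group instances by class label, then deal each class's instances
    round-robin into the subsets, so every subset receives an (almost)
    equal share of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append((x, y))
    subsets = [[] for _ in range(num_subsets)]
    for cls, items in by_class.items():
        rng.shuffle(items)  # same-class instances assigned randomly
        for i, item in enumerate(items):
            subsets[i % num_subsets].append(item)
    return subsets
```

Compared with purely random partitioning, this keeps the class proportions of each subset close to those of the full training set, which is why [11,12] use it to avoid degraded IS performance on class-imbalanced data.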