2010
DOI: 10.1007/s10462-010-9165-y

A review of instance selection methods

Abstract: In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, many instances are stored in the training set, but some of them are not useful for classification; it is therefore possible to obtain acceptable classification rates while ignoring those non-useful cases. This process is known as instance selection. Through instance selection the training set is reduced, which lowers runtimes in the classification and/or training stages of classifiers. This work is…
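The core idea in the abstract, shrinking the stored training set while keeping classification decisions acceptable, can be sketched with a toy 1-NN example. This is illustrative only: the data, the trivial one-representative-per-class selection, and the function name are hypothetical, not taken from the paper.

```python
import numpy as np

def nn_predict(X_train, y_train, X_query):
    # 1-NN: each query takes the label of its closest stored instance
    d = np.linalg.norm(X_train[:, None, :] - X_query[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=0)]

# Toy data: two well-separated classes with redundant interior points
X = np.array([[0., 0.], [0., .1], [.1, 0.], [5., 5.], [5., 5.1], [5.1, 5.]])
y = np.array([0, 0, 0, 1, 1, 1])

# A trivial "instance selection": keep one representative per class
keep = np.array([0, 3])

queries = np.array([[0.2, 0.2], [4.8, 4.9]])
full = nn_predict(X, y, queries)                 # uses all 6 instances
reduced = nn_predict(X[keep], y[keep], queries)  # uses only 2
assert (full == reduced).all()  # same decisions from a third of the data
```

On this toy data the reduced set reproduces every decision of the full set, which is the runtime/memory trade-off the abstract describes.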

Cited by 336 publications (179 citation statements)
References 44 publications (63 reference statements)
“…There exist generic training set selection techniques [also referred to as instance selection algorithms in the literature (Olvera-López et al 2010)], and those designed for other classifiers [k-nearest neighbors (Angiulli 2005), neural networks (Reeves and Taylor 1998), and many others (Hernandez-Leal et al 2013; Wenyuan et al 2013)], but, due to the specific characteristics of the SVM training process and operation, the majority of SVM training set selection algorithms are crafted for this classifier. In this review, we summarize the state-of-the-art algorithms for selecting SVM training data from large datasets.…”
Section: Motivation and Goals
confidence: 99%
“…Redundancy: as the name implies, this is redundant information, such as duplicate instances and attributes derived from others that carry the same information [26], [27]. Table 3 shows the approaches to solving the problems related to the amount of data and redundancy.…”
Section: Journal Of Computers
confidence: 99%
“…[26], [27] It is worth mentioning that the construction of reports (a step of the SEMMA methodology) is taken into account in all phases of FDQ-KDT.…”
Section: Journal Of Computers
confidence: 99%
“…DRTs [18,5,20,9,6,2,13] reduce the computational cost of classification by building a small representative set of the initial training data, called the Condensing Set (CS). The idea behind DRTs is to apply the k-NN classifier over the CS.…”
Section: Introduction
confidence: 99%
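The condensing idea in the excerpt above can be sketched with Hart's classic Condensed Nearest Neighbor (CNN) rule, one well-known data reduction technique. This is a minimal illustrative implementation under simple assumptions (numeric features, 1-NN), not the specific algorithm of any paper cited here.

```python
import numpy as np

def condense(X, y):
    """Sketch of Hart's Condensed Nearest Neighbor (CNN) rule.

    Builds a small Condensing Set (CS): an instance joins the CS only
    when the current CS misclassifies it under 1-NN. Passes over the
    data repeat until no instance is added.
    """
    keep = [0]                      # seed the CS with the first instance
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # 1-NN prediction using only the condensing set
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(d)] != y[i]:
                keep.append(i)      # misclassified -> absorb into the CS
                changed = True
    return np.array(keep)

# Two compact, well-separated classes; most instances are redundant
X = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y = np.array([0, 0, 0, 1, 1, 1])
cs = condense(X, y)   # one representative per class suffices here
```

Classifying with 1-NN over `X[cs]` reproduces the decisions of the full set on this toy data, which is exactly the trade-off DRTs exploit: fewer stored instances and fewer distance computations at classification time.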