Instance reduction for one-class classification

Krawczyk, Bartosz; Triguero, Isaac; García, Salvador; Woźniak, Michał; Herrera, Francisco

doi:10.1007/s10115-018-1220-z

Cited by 26 publications

(17 citation statements)

References 51 publications


“…Although evolutionary algorithms are efficient in performance, a pitfall of them is the essence of storing all training instances subsets in memory space for large datasets, which might also impact the efficiency of the testing task. A solution to this problem introduced in [28] with an instance selection method that removes redundant and noisy instances. [29] provides a survey of existing techniques used to reduce storage requirements in instance-based learning algorithms, including different reduction and/or reconstruction algorithms of the training set.…”

Section: Related Work (mentioning)

confidence: 99%
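The quoted passage credits [28] with an instance selection method that removes redundant and noisy instances, but it does not say which method that is. The sketch below illustrates only the noise-removal half. It uses Wilson's edited nearest neighbour (ENN) rule, a well-known editing method chosen for illustration, not necessarily the method of [28]. An instance is dropped when the majority label of its k nearest neighbours disagrees with its own label. The function name, the toy data and the default `k=3` are illustrative assumptions.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Return the indices of instances kept by Wilson's ENN editing rule.

    Illustrative sketch: an instance is treated as noise and removed when the
    majority label among its k nearest neighbours (itself excluded) differs
    from its own label.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from instance i to every other training instance, nearest first.
        neighbours = sorted(
            ((math.dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i),
            key=lambda pair: pair[0],
        )
        labels = [label for _, label in neighbours[:k]]
        majority, _ = Counter(labels).most_common(1)[0]
        if majority == yi:
            keep.append(i)
    return keep


# Toy data: a mislabelled point (index 4) sits inside the class-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(edited_nearest_neighbours(X, y, k=3))  # [0, 1, 2, 3, 5, 6, 7, 8]
```

Redundant instances would need a separate condensing step (for example Hart's condensed nearest neighbour), which this sketch does not cover.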

The Genomics of Industrial Process through the Qualia of Markovian Behaviour

Danishvar

Daneshvar

Mousavi

2022

Preprint


A technique for registering and relating events that cause an observable and definable system state is proposed. Discrete events of system state transfer are expressed by event tracking and clustering in the form of contiguous quanta of data. This approach is capable of describing typical processes in industrial systems in a chain of codes that contain system input/output parameters. The constituent nodes of the Markovian Processes chain form a series akin to genes in the DNA, repeatable and predictable.



“…Although evolutionary algorithms are efficient in performance, a pitfall of them is the essence of storing all training instances subsets in memory space for large datasets, which might also impact the efficiency of the testing task. A solution to this problem introduced in [25] with an instance selection method that removes redundant and noisy instances. Gong et al [26] provided a survey of existing techniques used to reduce storage requirements in IBL algorithms, including different reduction and/or reconstruction algorithms of the training set.…”

Section: Related Work (mentioning)

confidence: 99%

The Genomics of Industrial Process Through the Qualia of Markovian Behavior

Danishvar

Mousavi

Daneshvar

2022

IEEE Trans. Syst., Man, Cybern., Syst.


A technique for registering and relating events that cause an observable and definable system state is proposed. Discrete events of system-state transfer are expressed by event tracking and clustering in the form of contiguous quanta of data. This approach is capable of describing typical processes in industrial systems in a chain of codes that contain system input/output parameters. The constituent nodes of the Markovian Processes chain form a series akin to genes in the deoxyribonucleic acid, repeatable and predictable. The process genes are the quanta of information that aligns to represent a chain of activities (process). They describe the causal links between occurring events forming a pattern (pathway) that leads to a well-specified output (e.g., a product with a defect or otherwise). The creation of process genomics requires the knowledge of system observed or latent parameters (state) as well as the state change at specified time intervals (discretization). The process genomics theory is tested in an industrial case study for quality assessment and control of glue dispensing in micro-semiconductor manufacturing. The resulting definitions of the system state and interrelationship of control parameters contribute to the development of the process genes. The outcome of the gene alignment is the geometric interpretation of the glue droplet formation. A predicted or observed droplet within the production tolerance leads to a nondefective product. The principle of creating production genomics is to find and rectify the defect-causing genes or to disrupt the sequences that lead to producing defective products, leading to a zero-defect manufacturing process.
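The abstract describes recording system state at specified time intervals (discretization) and chaining the resulting states as a Markovian process. The sketch below shows that general idea and is not the authors' process-genomics method. It assumes a single hypothetical sensor reading sampled at fixed intervals and maps each reading to a state code using assumed bin edges. It then estimates first-order transition probabilities P(next state | current state) by counting consecutive state pairs. The readings, bin edges and function names are all illustrative assumptions.

```python
from collections import defaultdict


def discretize(values, edges):
    """Map each reading to a state code: the number of bin edges it reaches."""
    return [sum(v >= e for e in edges) for v in values]


def transition_matrix(states):
    """Estimate first-order Markov transition probabilities P(next | current)
    from the counts of consecutive state pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


# Hypothetical readings sampled at fixed intervals, with assumed bin edges.
readings = [0.1, 0.4, 0.8, 0.3, 0.9, 0.2]
states = discretize(readings, edges=[0.35, 0.7])
print(states)                     # [0, 1, 2, 0, 2, 0]
print(transition_matrix(states))  # {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {0: 1.0}}
```

Each row of the estimated matrix sums to 1. A chain of likely transitions through these states is a toy analogue of the "pathway" the abstract describes.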


“…Regarding the classification stage, the constructed one-class classifiers are able to distinguish between new instances in the target class and unknown instances outside the created decision boundary as anomalies of the minority class. In particular, an anomaly score is assigned to each testing datum by one-class classifiers, which defines the decision boundary to separate normal data from outliers (Khan and Madden, 2014;Krawczyk et al, 2019).…”

Section: Class Imbalanced Datasets (mentioning)

confidence: 99%
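The quoted passage says a one-class classifier assigns an anomaly score to each test instance, and that a decision boundary on that score separates normal data from outliers. The sketch below makes this concrete. It uses a k-nearest-neighbour distance score as a simple stand-in, not OCSVM, Isolation Forest or LOF, and it is trained on target-class data only. The boundary is placed at a quantile of the leave-one-out training scores, controlled by a hypothetical `contamination` parameter. All function names are illustrative.

```python
import math


def knn_score(x, train, k=3):
    """Anomaly score: mean distance from x to its k nearest training points."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k


def fit_threshold(train, k=3, contamination=0.1):
    """Put the decision boundary at the (1 - contamination) quantile of the
    leave-one-out scores of the target-class training data."""
    scores = sorted(
        knn_score(x, train[:i] + train[i + 1:], k) for i, x in enumerate(train)
    )
    idx = min(len(scores) - 1, max(0, math.ceil((1 - contamination) * len(scores)) - 1))
    return scores[idx]


def predict(x, train, threshold, k=3):
    """+1 = inside the boundary (target class), -1 = outside (anomaly)."""
    return 1 if knn_score(x, train, k) <= threshold else -1


# Training uses target-class (normal) data only.
normal = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
threshold = fit_threshold(normal, k=2)
print(predict((0.5, 0.6), normal, threshold, k=2))  # 1
print(predict((5, 5), normal, threshold, k=2))      # -1
```

Raising `contamination` lowers the threshold and tightens the boundary, so more borderline instances are flagged as anomalies.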

Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers

Wang

Tsai

Lin

2021

Data Technologies and Applications (DTA)


Purpose: Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class from the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor that affects the performance of one-class classifiers.

Design/methodology/approach: In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.

Findings: The experimental results are based on 44 class imbalanced datasets; three instance selection algorithms, including IB3, DROP3 and the GA, the CART decision tree for missing value imputation, and three one-class classifiers, which include OCSVM, IFOREST and LOF, show that if the instance selection algorithm is carefully chosen, performing this step could improve the quality of the training data, which makes one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain some missing values, combining missing value imputation and instance selection, regardless of which step is first performed, can maintain similar data quality as datasets without missing values.

Originality/value: The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario of missing values that exist in the training set for training one-class classifiers. In this case, performing missing value imputation and instance selection with different orders are compared.
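The abstract compares running missing value imputation and instance selection in either order before a one-class classifier is trained on the majority class. The sketch below only illustrates why the order can matter. It does not reproduce IB3, DROP3, the GA or CART imputation, and it relies on two stand-ins. Column-mean imputation replaces CART. A simple filter replaces instance selection: it drops the `drop_fraction` of rows farthest from their k-th nearest neighbour, measuring distance over jointly observed features. All function and parameter names are hypothetical.

```python
import math
import statistics


def mean_impute(rows):
    """Replace None with the mean of the observed values in that column
    (a simple stand-in for CART-based imputation)."""
    means = [statistics.fmean(v for v in col if v is not None) for col in zip(*rows)]
    return [tuple(m if v is None else v for v, m in zip(row, means)) for row in rows]


def partial_dist(a, b):
    """Euclidean distance over the features observed in both rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)
                         if x is not None and y is not None))


def select_instances(rows, k=2, drop_fraction=0.2):
    """Hypothetical instance selection: drop the drop_fraction of rows whose
    k-th nearest neighbour is farthest away (likely noise in the normal class)."""
    def kth_neighbour_dist(i):
        d = sorted(partial_dist(rows[i], r) for j, r in enumerate(rows) if j != i)
        return d[k - 1]

    n_keep = len(rows) - int(len(rows) * drop_fraction)
    ranked = sorted(range(len(rows)), key=kth_neighbour_dist)
    return [rows[i] for i in sorted(ranked[:n_keep])]  # preserve original order


def clean(rows, order="impute_first", k=2, drop_fraction=0.2):
    """Run imputation and instance selection in the requested order."""
    if order == "impute_first":
        return select_instances(mean_impute(rows), k, drop_fraction)
    if order == "select_first":
        return mean_impute(select_instances(rows, k, drop_fraction))
    raise ValueError(f"unknown order: {order!r}")


# Majority-class training rows; None marks a missing value, (10, 10) is noise.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, None), (1.0, 1.0), (10.0, 10.0)]
print(clean(normal, "impute_first"))  # [(0.0, 0.0), (0.0, 1.0), (1.0, 3.0), (1.0, 1.0)]
print(clean(normal, "select_first"))
```

On this toy data both orders drop the (10.0, 10.0) row. However, the value imputed for the incomplete row differs. It is 3.0 when imputation runs first, because the noisy row still contributes to the column mean. It is about 0.67 when selection runs first.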

