DOI: 10.29007/h71z

On relationships between imbalance and overlapping of datasets

Abstract: The paper deals with problems that imbalanced and overlapping datasets often encounter. Performance indicators such as accuracy, precision and recall on imbalanced datasets, both with and without overlapping, are discussed and compared with the same performance indicators on balanced datasets with overlapping. Three popular classification algorithms, namely Decision Tree, KNN (k-Nearest Neighbors) and SVM (Support Vector Machines), are analyzed and compared.
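The comparison described in the abstract can be illustrated with a small simulation. This is a sketch, not the paper's experimental setup: it assumes synthetic 1-D Gaussian classes whose overlap is controlled by the distance between their means, it uses only a hand-written k-NN classifier (Decision Tree and SVM are omitted), and the names `make_data`, `knn_predict`, `evaluate`, `run_all` and the scenario sizes are hypothetical choices made for illustration.

```python
import heapq
import random


def make_data(n_major, n_minor, separation, seed=0):
    """Hypothetical 1-D data: majority (label 0) ~ N(0, 1), minority (label 1)
    ~ N(separation, 1). A small separation means heavy class overlap."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(separation, 1.0), 1) for _ in range(n_minor)]
    return data


def knn_predict(train, x, k=5):
    """Predict the majority label among the k nearest training points."""
    neighbours = heapq.nsmallest(k, train, key=lambda p: abs(p[0] - x))
    minority_votes = sum(label for _, label in neighbours)
    return 1 if 2 * minority_votes > k else 0


def evaluate(train, test, k=5):
    """Return (accuracy, precision, recall), treating label 1 as positive."""
    tp = fp = fn = tn = 0
    for x, y in test:
        y_hat = knn_predict(train, x, k)
        if y_hat == 1 and y == 1:
            tp += 1
        elif y_hat == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(test)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# (n_majority, n_minority, separation) for the scenarios named in the abstract.
SCENARIOS = {
    "balanced, overlapping": (500, 500, 1.0),
    "imbalanced, separated": (950, 50, 6.0),
    "imbalanced, overlapping": (950, 50, 1.0),
}


def run_all(k=5):
    """Train and test on independently drawn samples for every scenario."""
    results = {}
    for name, (n_major, n_minor, sep) in SCENARIOS.items():
        train = make_data(n_major, n_minor, sep, seed=1)
        test = make_data(n_major, n_minor, sep, seed=2)
        results[name] = evaluate(train, test, k)
    return results


if __name__ == "__main__":
    for name, (acc, prec, rec) in run_all().items():
        print(f"{name:24s} accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

In this kind of setup one would expect the imbalanced, overlapping scenario to report a higher accuracy than the balanced one while its minority-class recall collapses, which is why accuracy alone is a weak indicator on skewed, overlapping data.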

Cited by 5 publications (5 citation statements)

References 17 publications
“…A similar photothermal effect has been observed before when using plasmonic gold nanoparticles, as PTT agent, mixed in butter. [27] Stability of the PTT agents is of extreme importance, as any photodamage will result in poor photothermal conversion. Our results show that the photoactive UCNPs have unmatched photostability (Figure S4b, Supporting Information), compared to common dyes and quantum dots, and may be used for PTT in biomedical applications.…”
Section: Results (mentioning)
confidence: 99%
“…In imbalanced data, class overlap is a common problem faced by researchers, and it becomes crucial when the data are highly skewed (Vuttipittayamongkol and Elyan, 2020). The importance of overlap is also supported by the results of the paper, which indicate a very strong relationship between class imbalance and class overlap that influences classification performance (García et al., 2006; Almutairi and Janicki, 2020; Stefanowski, 2013). One way to overcome overlap is to select the instances to be sampled (Fernández et al., 2018).…”
Section: Related Work (mentioning)
confidence: 65%
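The interaction between imbalance and overlap described in this statement can be quantified in several ways. A minimal sketch, assuming one simple neighbourhood-based measure chosen purely for illustration (the cited works may use different measures): an instance counts as "overlapping" when most of its k nearest neighbours belong to another class, and the fraction of such instances is reported per class. The names `gaussian_blobs` and `knn_overlap` and all data parameters are hypothetical.

```python
import heapq
import math
import random
from collections import Counter


def gaussian_blobs(n_major, n_minor, separation, seed=0):
    """Hypothetical 2-D data: majority (label 0) centred at the origin,
    minority (label 1) centred at (separation, 0), both with unit variance."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_major)]
    data += [((rng.gauss(separation, 1), rng.gauss(0, 1)), 1) for _ in range(n_minor)]
    return data


def knn_overlap(data, k=5):
    """Per class, the fraction of instances whose k nearest neighbours
    (the instance itself excluded) mostly belong to another class."""
    overlapping = Counter()
    totals = Counter()
    for i, (x, y) in enumerate(data):
        neighbours = heapq.nsmallest(
            k,
            (d for j, d in enumerate(data) if j != i),
            key=lambda d: math.dist(x, d[0]),
        )
        foreign = sum(1 for _, label in neighbours if label != y)
        totals[y] += 1
        if 2 * foreign > k:
            overlapping[y] += 1
    return {c: overlapping[c] / totals[c] for c in sorted(totals)}


if __name__ == "__main__":
    for sep in (1.0, 4.0):
        scores = knn_overlap(gaussian_blobs(300, 30, sep, seed=3))
        print(f"separation={sep}: majority={scores[0]:.2f} minority={scores[1]:.2f}")
```

With a 10:1 ratio and close class means, the minority class would be expected to show a much larger overlapping fraction than the majority class, even though both share the same overlap region; this asymmetry is one way to see why the skew makes overlap more harmful.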
“…Precision, also known as Positive Predictive Value (PPV), represents the percentage of samples predicted as positive that are actually relevant. It is computed by dividing the number of true positive samples (TP) by TP plus the number of false positive samples (FP), as shown in Equation 7: $\text{precision} = \frac{TP}{TP + FP}$ (7). Recall, also known as True Positive Rate (TPR), represents the percentage of relevant samples that are identified in the prediction. It is computed by dividing TP by TP plus the number of false negative samples (FN), as shown in Equation 8:…”
Section: Results (mentioning)
confidence: 99%
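A worked example of the precision and recall formulas in this statement, using hypothetical confusion-matrix counts (TP = 8, FP = 2, FN = 4) chosen only for illustration:

```latex
\text{precision} = \frac{TP}{TP + FP} = \frac{8}{8 + 2} = 0.8,
\qquad
\text{recall} = \frac{TP}{TP + FN} = \frac{8}{8 + 4} \approx 0.667
```

Precision is lowered only by false positives and recall only by false negatives, so the two can diverge sharply on imbalanced data.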
“…Introducing new samples may increase overlapping between classes if the distribution of the majority class is not taken into consideration, since some of the new samples may be produced in the majority class space. This problem is known as class overlapping [7]. To avoid data overlapping problems, suitable samples need to be determined as seeds for generating new samples.…”
Section: Introduction (mentioning)
confidence: 99%
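The seed-selection idea in this statement can be sketched as SMOTE-style interpolation that only starts from "safe" minority instances. This is an illustrative assumption, not the method of the citing paper: a minority instance is treated as a safe seed when most of its k nearest neighbours (over all classes) are also minority, and each synthetic point is placed at a random position on the segment between a seed and one of its k nearest minority neighbours. The names `nearest_indices`, `safe_seeds`, `oversample` and `demo_data` and the demo layout are hypothetical.

```python
import heapq
import math
import random


def nearest_indices(data, x, k, exclude):
    """Indices of the k points in `data` closest to x, skipping index `exclude`."""
    candidates = ((math.dist(x, p), j) for j, (p, _) in enumerate(data) if j != exclude)
    return [j for _, j in heapq.nsmallest(k, candidates)]


def safe_seeds(data, minority_label, k=5):
    """Minority indices whose k nearest neighbours (all classes) are mostly minority."""
    seeds = []
    for i, (x, y) in enumerate(data):
        if y != minority_label:
            continue
        same = sum(1 for j in nearest_indices(data, x, k, i) if data[j][1] == minority_label)
        if 2 * same > k:
            seeds.append(i)
    return seeds


def oversample(data, minority_label, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points, interpolating only from safe seeds."""
    rng = random.Random(seed)
    seeds = safe_seeds(data, minority_label, k)
    if not seeds:
        return []
    minority = [j for j, (_, y) in enumerate(data) if y == minority_label]
    synthetic = []
    for _ in range(n_new):
        s = rng.choice(seeds)
        xs = data[s][0]
        # k nearest minority mates of the seed (the seed itself excluded).
        # A safe seed has more than k/2 minority neighbours, so this is never empty.
        mates = heapq.nsmallest(
            k, (j for j in minority if j != s), key=lambda j: math.dist(xs, data[j][0])
        )
        xm = data[rng.choice(mates)][0]
        gap = rng.random()  # position along the segment from seed to mate
        point = tuple(a + gap * (b - a) for a, b in zip(xs, xm))
        synthetic.append((point, minority_label))
    return synthetic


def demo_data(seed=4):
    """Hypothetical data: 300 majority points around (0, 0), 30 minority points
    around (4, 0), and 5 minority points buried inside the majority region."""
    rng = random.Random(seed)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(300)]
    data += [((rng.gauss(4, 1), rng.gauss(0, 1)), 1) for _ in range(30)]
    data += [(p, 1) for p in [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]]
    return data


if __name__ == "__main__":
    data = demo_data()
    print(len(safe_seeds(data, 1)), "safe seeds;", len(oversample(data, 1, 50)), "new samples")
```

The five buried minority points are never used as seeds, so no interpolation starts inside the majority region; a mate can still be an unsafe point, so this reduces rather than eliminates the risk of generating samples in the majority class space.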