2016 18th Conference of Open Innovations Association and Seminar on Information Security and Protection of Information Technolo 2016
DOI: 10.1109/fruct-ispit.2016.7561554

Examining the performance of classification algorithms for imbalanced data sets in web author identification

Cited by 12 publications (3 citation statements)
References 8 publications
“…More studies on author attribution on imbalanced data proposed that uses many short text samples for the minority classes and less and longer text samples for the majority classes [36]. Therefore, it is necessary to take into account the size of the samples used for testing to evaluate the accuracy when performing the identification process and most studies in this area proved that there is no mechanism and solid to provide the appropriate size of the samples for the testing to be applied in biometrics during the identification process.…”
Section: Stylometric Features In Long Text
mentioning
confidence: 99%
“…Before we begin our investigation of algorithm performance we need to understand the basics of what is classification and regression and why are they used. Classification in machine learning essentially means that to place a new observation in a set of categories already predefined based on the training dataset [3]. So in classification we actually group the output variables into different corresponding classes.…”
Section: Introduction
mentioning
confidence: 99%
“…The numerous real-world applications are affected by class imbalance problem wherein the number of samples in one class is very marginal compared to other classes [8][9][10][11]. Issues in fields related to software defect detection [12], threat supervision, medical judgment, web author identification [13] and similar have drawn attention towards concerns of multi-class imbalanced data sets. The representation of boundaries in imbalanced data sets is a difficult concern for learning algorithms.…”
mentioning
confidence: 99%