2023
DOI: 10.3390/a16050236

Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review

Abstract: Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Widely used machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classifi…

Cited by 22 publications (7 citation statements)
References 137 publications

“…However from the whole number of models that were tested, the best one was the DL model called BERTweet with an accuracy of 87.7%. This aligns with previous studies that have shown that BERT-based models have a great effectiveness in categorizing any kind of text, including tweets [18,31,32].…”
Section: Discussion (supporting)
confidence: 91%
“…It is also to be noted that the literature reviews on ensemble methods highlight that ensemble modeling is an acceptable technique for coping with individual classifiers' large variation while minimizing general mistakes [49]. Furthermore, ensemble techniques are reported to be an appropriate method to improve accuracy in text classification tasks, which is what has been observed in its use in AT [50]. Interestingly, a recent study comparing the use of single classifier to an ensemble approach in the domain of mental health suggests that for the prediction of mental health problems, ensemble models demonstrate better prediction results [51].…”
Section: Discussion (mentioning)
confidence: 87%
“…In this study, we manually coded paragraphs from NICE appraisals for RDTs to train and test three models: Naïve Bayes (21), Lasso regression (22), and Support Vector Machines (SVM) (23). These are frequently used models for text classification, tend to have good classification performance, and are simpler and computationally cheaper to implement than some more sophisticated supervised learning approaches (24-27). We used the caret (28), glmnet (29), e1071 (30), and quanteda.textmodels (31) packages to estimate the models.…”
Section: Methods (mentioning)
confidence: 99%