2021
DOI: 10.1007/s42979-021-00480-4
|View full text |Cite
|
Sign up to set email alerts
|

Simple Baseline Machine Learning Text Classifiers for Small Datasets

Abstract: Text classification is important to better understand online media. A major problem for creating accurate text classifiers using machine learning is small training sets due to the cost of annotating them. On this basis, we investigated how SVM and NBSVM text classifiers should be designed to achieve high accuracy and how the training sets should be sized to efficiently use annotation labor. We used a four-way repeated-measures full-factorial design of 32 design factor combinations. For each design factor combi… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1

Citation Types

0
3
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
4
3

Relationship

0
7

Authors

Journals

citations
Cited by 8 publications
(3 citation statements)
references
References 35 publications
0
3
0
Order By: Relevance
“…Conventionally, the application of ML techniques, including NLP to text-based data, relied on large data sets for the algorithm to elicit meaningful themes. More recently, though, ML-based analysis of qualitative data yielded interesting results even on smaller data sets ( 60 ). This considered, the current study aimed to merge the potential of analysing data thematically and using NLP.…”
Section: Methodsmentioning
confidence: 99%
“…Conventionally, the application of ML techniques, including NLP to text-based data, relied on large data sets for the algorithm to elicit meaningful themes. More recently, though, ML-based analysis of qualitative data yielded interesting results even on smaller data sets ( 60 ). This considered, the current study aimed to merge the potential of analysing data thematically and using NLP.…”
Section: Methodsmentioning
confidence: 99%
“…For all general domain tasks, we use results from BERT-ITPT-FiT [7], which optimises BERT for text classification, on four common benchmarks. For SVM results, we report the score of the best performing variant from a large scale comparison [8].…”
Section: General Domainmentioning
confidence: 99%
“…However, machine learning techniques conventionally rely on large data sets to enable the algorithm to elicit meaningful themes. More recent research efforts have turned to the performance of machine learning approaches with smaller data sets and have shown promising results with as few as 300 examples in supervised machine learning (Riekert, Riekert and Klein, 2021). However, the majority of adopt an exploratory approach which is handled via unsupervised ML algorithms.…”
Section: Introductionmentioning
confidence: 99%