2023
DOI: 10.1145/3582000

A Comparative Survey of Instance Selection Methods applied to Non-Neural and Transformer-Based Text Classification

Abstract: Progress in Natural Language Processing (NLP) has been dictated by the rule of more: more data, more computing power, more complexity, best exemplified by Deep Learning Transformers. However, training (or fine-tuning) large dense models for specific applications usually requires significant amounts of computing resources. One way to ameliorate this problem is through data engineering (DE), rather than through algorithmic or hardware improvements. Our focus here is an under-investigated DE technique…
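To make the abstract's core idea concrete: instance selection (IS) shrinks the training set before model fitting. The sketch below is a minimal illustration of one classic IS heuristic, an edited-nearest-neighbours (ENN) style filter implemented with scikit-learn, and is not any specific method surveyed in the paper; the dataset and parameter choices are ours.

```python
# Minimal sketch of instance selection (an ENN-style filter): drop training
# instances whose label disagrees with the vote of their k nearest
# neighbours. Illustrative only; not the paper's surveyed methods.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

def enn_select(X, y, k=5):
    """Boolean mask keeping instances whose label matches the neighbourhood
    vote. The point itself counts as one neighbour here, which makes the
    filter slightly conservative."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    return knn.predict(X) == y

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = np.array(data.target)

mask = enn_select(X, y, k=5)
print(f"kept {mask.sum()} of {len(y)} instances "
      f"(reduction = {1 - mask.mean():.1%})")
```

A model fine-tuned on `X[mask], y[mask]` then trains on fewer instances, which is exactly the cost saving the abstract targets.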

Cited by 21 publications (7 citation statements)
References 61 publications
“…Four representative models are selected for evaluation: TextRNN, Transformer (Cunha et al., 2023), BERT (base size) (Pérez Pozo et al., 2022; Wang et al., 2022; Wang et al., 2024), and LLaMA 2 (7B size) (Touvron et al., 2023). These models, widely acclaimed and adopted, collectively embody distinct stages in the progression of deep learning and present rich diversity.…”
Section: Experiments Setting
confidence: 99%
“…The following studies evaluate various IS methods for various domains and contexts. Cunha et al. [26] is one such study; the key evaluation metrics in their analysis are the mean reduction (R), macro-averaged F1, and the speedup of training times. This comprehensive approach not only highlights the significant potential of IS in modern text classification tasks but also provides empirical evidence that specific IS methods can streamline training without compromising the effectiveness of complex machine learning models.…”
Section: Related Work
confidence: 99%
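The three metrics named in this snippet are simple to compute. The helpers below are our own illustration (the function names are hypothetical, and the paper's exact formulations may differ); macro-averaged F1 comes straight from scikit-learn.

```python
# Hypothetical helpers illustrating the metrics mentioned above: reduction,
# speedup, and macro-averaged F1. Names and formulations are ours.
from sklearn.metrics import f1_score

def reduction(n_original: int, n_selected: int) -> float:
    """Fraction of the training set removed by instance selection."""
    return 1.0 - n_selected / n_original

def speedup(t_full_s: float, t_reduced_s: float) -> float:
    """How many times faster training runs on the reduced set."""
    return t_full_s / t_reduced_s

# Macro-averaged F1 weights every class equally, which matters for the
# skewed label distributions common in text classification.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 1]
print(reduction(10_000, 6_500))                    # 0.35
print(speedup(120.0, 45.0))                        # ~2.67
print(f1_score(y_true, y_pred, average="macro"))
```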
“…Self-attention allows Transformers to easily transmit information across the input sequences. Inspired by [17, 18], we have implemented scikit-learn's transformer architecture, sklearn.preprocessing.FunctionTransformer.…”
Section: Transformers
confidence: 99%
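For context on the API this snippet names: sklearn.preprocessing.FunctionTransformer wraps a plain Python callable as a stateless pipeline step; it is unrelated to the self-attention Transformer architecture. A minimal usage sketch (the pipeline and data here are our own illustration):

```python
# Minimal use of sklearn.preprocessing.FunctionTransformer: it lifts an
# arbitrary callable into a (stateless) pipeline step. It does not
# implement the self-attention Transformer architecture.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LogisticRegression

log_scale = FunctionTransformer(np.log1p)  # applies log(1 + x) elementwise

X = np.abs(np.random.RandomState(0).randn(100, 4))
y = (X.sum(axis=1) > 4).astype(int)

pipe = make_pipeline(log_scale, LogisticRegression())
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```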