2020
DOI: 10.1007/978-3-030-57321-8_21

Improving Short Text Classification Through Global Augmentation Methods

Cited by 68 publications (39 citation statements)
References 22 publications
“…They also include data noising techniques, such as altering words in the input of self-encoder networks in order to generate a different sentence (Xie et al. 2017; Zolna et al. 2017; Li et al. 2018), or introducing noise at the word-embedding level. These methods were analyzed by Marivate and Sefara (2019). Although a viable option when no formal synonym model is available, they require abundant training data.…”
Section: Related Work
confidence: 99%
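The word-level noising the excerpt describes can be sketched as follows. This is an illustrative assumption, not the cited authors' implementation: a fraction of the tokens is replaced with words drawn at random from the corpus vocabulary, which is the kind of unigram noising that needs no synonym model but benefits from abundant training data.

```python
import random

def noise_words(tokens, vocab, p=0.1, seed=0):
    """Replace roughly a fraction p of tokens with random words
    drawn from the corpus vocabulary (unigram noising sketch)."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < p:
            out.append(rng.choice(vocab))  # inject noise
        else:
            out.append(tok)                # keep original token
    return out

# Toy vocabulary and sentence for illustration only.
vocab = ["quick", "lazy", "brown", "happy", "small"]
sentence = "the quick brown fox jumps over the lazy dog".split()
noisy = noise_words(sentence, vocab, p=0.3)
```

Each call with a different seed yields a different noised variant of the same sentence, which is how such techniques multiply a small labelled set.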
“…Hence, textual data can be easily augmented by replacing a fraction of the original text with the nearest neighbours of the chosen words. This approach requires either pre-trained word embedding models for the language in question or enough data from the target application to build the embedding model [16]. Thus, this approach does not require access to a dictionary or thesaurus for a language to find synonyms [16]. This can be advantageous for languages where such resources are more difficult to obtain, but there is enough unsupervised text data to build the embedding models [16].…”
Section: Semantic Similarity Augmentation
confidence: 99%
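The nearest-neighbour replacement described above can be sketched with cosine similarity over word vectors. The tiny embedding table here is a stand-in assumption; in practice the vectors would come from a pre-trained model (e.g. word2vec or fastText) or be trained on unlabelled text from the target domain, and only a random fraction of words would be replaced rather than all of them.

```python
import numpy as np

# Toy embedding table, a stand-in for a pre-trained embedding model.
EMB = {
    "good":  np.array([0.90, 0.10, 0.00]),
    "great": np.array([0.85, 0.15, 0.05]),
    "bad":   np.array([-0.90, 0.10, 0.00]),
    "movie": np.array([0.00, 0.90, 0.30]),
    "film":  np.array([0.05, 0.88, 0.32]),
}

def nearest_neighbour(word):
    """Return the most cosine-similar other word, or the word
    itself if it is not in the embedding vocabulary."""
    if word not in EMB:
        return word
    v = EMB[word]
    best, best_sim = word, -1.0
    for w, u in EMB.items():
        if w == word:
            continue
        sim = float(v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

def augment(tokens):
    """Replace every in-vocabulary token with its nearest neighbour
    (real use would replace only a random fraction)."""
    return [nearest_neighbour(t) for t in tokens]
```

For example, `augment("good movie".split())` yields `["great", "film"]` with this toy table: a paraphrase produced without any dictionary or thesaurus, exactly the advantage the excerpt notes for lower-resourced languages.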
“…Back-translation (Sennrich et al., 2016; Yu et al., 2018) is also a major approach to textual DA: it uses a machine translation model to translate English sentences into another language (e.g., French) and back into English. In addition, data noising techniques (Xie et al., 2017; Marivate and Sefara, 2019) and paraphrasing (Kumar et al., 2019) have been proposed to generate new textual samples. All of the methods mentioned above usually generate individual sentences separately.…”
Section: Related Work
confidence: 99%
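The round-trip structure of back-translation can be sketched as below. Real pipelines chain two trained machine-translation models (e.g. English to French, then French back to English); the tiny phrase tables here are explicitly mock stand-ins so the structure is runnable, not actual translation.

```python
# Mock phrase tables standing in for two machine-translation models.
EN_TO_FR = {"the movie was very good": "le film était très bon"}
FR_TO_EN = {"le film était très bon": "the film was really good"}

def back_translate(sentence, fwd=EN_TO_FR, bwd=FR_TO_EN):
    """Translate to a pivot language and back; the round trip
    yields a paraphrase of the original sentence. Falls back to
    the input when no translation is available."""
    pivot = fwd.get(sentence, sentence)   # forward pass (EN -> pivot)
    return bwd.get(pivot, sentence)       # backward pass (pivot -> EN)
```

Here `back_translate("the movie was very good")` returns the paraphrase `"the film was really good"`: the lossy round trip through the pivot language is what produces label-preserving variation in the augmented data.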