2022
DOI: 10.1109/access.2022.3225659

Word-Level and Pinyin-Level Based Chinese Short Text Classification

Abstract: Short text classification is an important branch of Natural Language Processing. Although CNNs and RNNs have achieved satisfactory results in text classification tasks, they are difficult to apply to Chinese short text classification because of data sparsity and homophonic typos. To solve these problems, a word-level and Pinyin-level based Chinese short text classification model is constructed. Since homophones have the same Pinyin, the addition of Pinyin-level features can sol…
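The abstract's core idea — homophones share the same Pinyin, so Pinyin-level features can absorb homophonic typos — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes the pypinyin package and uses character-level tokens for the word-level view (the paper may instead use a word segmenter).

```python
# Minimal sketch (not the paper's code): deriving Pinyin-level tokens
# alongside word-level tokens, assuming the pypinyin package.
from pypinyin import lazy_pinyin  # pip install pypinyin

def word_and_pinyin_features(text: str):
    # Character-level tokens stand in for the word-level view here.
    word_tokens = list(text)
    # Tone-less Pinyin tokens: homophonic typos collapse to the same form.
    pinyin_tokens = lazy_pinyin(text)
    return word_tokens, pinyin_tokens

# "天气" (weather) and the homophonic typo "天汽" differ at the word level
# but share the Pinyin sequence ['tian', 'qi'].
print(word_and_pinyin_features("天气"))  # (['天', '气'], ['tian', 'qi'])
print(word_and_pinyin_features("天汽"))  # (['天', '汽'], ['tian', 'qi'])
```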

Cited by 10 publications (2 citation statements) · References: 56 publications
“…Therefore, the source of our corpus is determined to be the Weibo platform. We chose simplifyweibo_4_moods (https://github.com/SophonPlus/ChineseNlpCorpus, accessed on 2 February 2019) as the base corpus, which is commonly used for emotion classification tasks [29][30][31][32]. The overall pre-processing process is as follows.…”
Section: Corpus Selection (mentioning)
Confidence: 99%
“…Convolutional neural networks (CNNs) were initially used in the field of image processing [7] and were later applied to text processing. Recurrent neural networks (RNNs) have made great progress in text sequence processing by considering not only the current input [8][9][10] but also the previous input.…”
Section: Introduction (mentioning)
Confidence: 99%
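The point in the statement above — an RNN conditions on previous inputs through its hidden state — can be made concrete with a minimal sketch. This is illustrative only, not the cited papers' architecture; it assumes PyTorch, and the class name and hyperparameters are hypothetical.

```python
# Minimal sketch of an RNN-style text classifier over token ids, assuming PyTorch.
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 128, num_classes: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # The GRU hidden state carries information from earlier tokens,
        # i.e. the "current plus previous input" behaviour noted above.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)       # (batch, seq_len, embed_dim)
        _, h_n = self.gru(x)            # h_n: (1, batch, hidden_dim)
        return self.fc(h_n.squeeze(0))  # (batch, num_classes) logits

# Usage: logits = GRUClassifier(vocab_size=5000)(torch.randint(1, 5000, (2, 20)))
```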