2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD)
DOI: 10.1109/icaibd55127.2022.9820466
A Comparison Study of Pre-trained Language Models for Chinese Legal Document Classification

Cited by 5 publications (2 citation statements)
References 30 publications
“…Of these, 570,000 were found usable after cleaning (duplicate data, ads, polls, image sharing, @someone, retweets, and other invalid texts were eliminated). Emojis were transformed into corresponding text using the 'emojiswitch' library [96]. Word2Vec (a natural language processing technique) was used to transform the text into numerical representations so that a sentiment classification model could process the text data.…”
Section: Methods
Mentioning confidence: 99%
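The preprocessing pipeline described in the statement above (emoji-to-text conversion followed by word-vector encoding) can be sketched as follows. This is a minimal stand-in, not the cited implementation: the toy `EMOJI_MAP` replaces the 'emojiswitch' library, and the toy `EMBEDDINGS` table replaces a trained Word2Vec model; both names and values are illustrative assumptions.

```python
# Sketch: convert emojis to words, then average per-word vectors into
# one fixed-length feature vector for a downstream sentiment classifier.

# Toy stand-ins (illustrative only); a real pipeline would call the
# 'emojiswitch' library and a trained Word2Vec model instead.
EMOJI_MAP = {"😀": "smile", "😢": "cry"}
EMBEDDINGS = {
    "smile": [0.9, 0.1, 0.0],
    "cry":   [0.1, 0.8, 0.2],
    "good":  [0.7, 0.2, 0.1],
    "day":   [0.3, 0.3, 0.3],
}
DIM = 3  # dimensionality of the toy embeddings


def emoji_to_text(text: str) -> str:
    """Replace each emoji with its textual name (stand-in for emojiswitch)."""
    for emoji, name in EMOJI_MAP.items():
        text = text.replace(emoji, f" {name} ")
    return text


def vectorize(text: str) -> list[float]:
    """Average the word vectors of known tokens into one document vector."""
    tokens = emoji_to_text(text).lower().split()
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(dim_vals) / len(vecs) for dim_vals in zip(*vecs)]


features = vectorize("good day 😀")
```

The resulting fixed-length vector is what a classical sentiment classifier (logistic regression, SVM, etc.) would consume; averaging word vectors is the simplest way to pool variable-length text into a single feature vector.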
“…The classification of legal documents is the basis of legal artificial intelligence tasks and has important research value. Qin et al. (2022) found that, compared with machine learning models based on feature engineering and with traditional convolutional or recurrent neural network models in the field of NLP, language models pre-trained on an English corpus achieve good performance in classification tasks. Several different pre-trained language models have been studied, and the Chinese legal corpus has been used for pre-training.…”
Section: Fine-tuning Large Language Models for the Legal Domain
Mentioning confidence: 99%