2019
DOI: 10.48550/arxiv.1906.11455
Preprint
PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation


Cited by 32 publications (13 citation statements)
References 6 publications
“…All modules are trained in an end-to-end paradigm. We use the pkuseg [38] toolkit to segment words. The vocabulary size is limited to 30,000 for KaMed and MedDialog, and 20,000 for MedDG.…”
Section: Implementation Details
confidence: 99%
“…Researchers have utilized frequency analysis to leverage aspects of text corpora in order to investigate the context of the text [49], and we used this technique to understand salient themes in the comments. To convert sentences into word lists, we used PKUSEG [50], an open-source Chinese word segmentation library developed by Peking University. Furthermore, the Gensim library [51] was used to find double words or pairs of frequently used words.…”
Section: Methods
confidence: 99%
“…To further examine the relationships among words and understand their substantive meanings, we also conducted a semantic network analysis of words [41]. We employed popular word segmentation libraries in Python: PKUSEG [42], NLTK toolkit [43], and NetworkX [44] to calculate word occurrence and perform the creation of word semantic networks. Inspired by [45], the top 10 words with the highest frequencies were used.…”
Section: Methods
confidence: 99%