PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

Luo, Ruixuan; Xu, Jingjing; Zhang, Yi; Ren, Xuancheng; Sun, Xu

doi:10.48550/arxiv.1906.11455

Cited by 32 publications

(13 citation statements)

References 6 publications

“…All modules are trained in an end-to-end paradigm. We use the pkuseg [38] toolkit to segment words. The vocabulary size is limited to 30,000 for KaMed and MedDialog, and 20,000 for MedDG.…”

Section: Implementation Details (mentioning)

confidence: 99%

Semi-Supervised Variational Reasoning for Medical Dialogue Generation

Li, Zhang, Ren et al. 2021

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Medical dialogue generation aims to provide automatic and accurate responses to assist physicians to obtain diagnosis and treatment suggestions in an efficient manner. In medical dialogues two key characteristics are relevant for response generation: patient states (such as symptoms, medication) and physician actions (such as diagnosis, treatments). In medical scenarios large-scale human annotations are usually not available, due to the high costs and privacy requirements. Hence, current approaches to medical dialogue generation typically do not explicitly account for patient states and physician actions, and focus on implicit representation instead. We propose an end-to-end variational reasoning approach to medical dialogue generation. To be able to deal with a limited amount of labeled data, we introduce both patient state and physician action as latent variables with categorical priors for explicit patient state tracking and physician policy learning, respectively. We propose a variational Bayesian generative approach to approximate posterior distributions over patient states and physician actions. We use an efficient stochastic gradient variational Bayes estimator to optimize the derived evidence lower bound, where a 2-stage collapsed inference method is proposed to reduce the bias during model training. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability. We conduct experiments on three datasets collected from medical platforms. Our experimental results show that the proposed method outperforms state-of-the-art baselines in terms of objective and subjective evaluation metrics. Our experiments also indicate that our proposed semi-supervised reasoning method achieves a comparable performance as state-of-the-art fully supervised learning baselines for physician policy learning.

“…Researchers have utilized frequency analysis to leverage aspects of text corpora in order to investigate the context of the text [49], and we used this technique to understand salient themes in the comments. To convert sentences into word lists, we used PKUSEG [50], an open-source Chinese word segmentation library developed by Peking University. Furthermore, the Gensim library [51] was used to find double words or pairs of frequently used words.…”

Section: Methods (mentioning)

confidence: 99%

Changes in Doctor–Patient Relationships in China during COVID-19: A Text Mining Analysis

Xiao, Wong 2022

IJERPH

Doctor–patient relationships (DPRs) in China have been straining. With the emergence of the COVID-19 pandemic, the relationships and interactions between patients and doctors are changing. This study investigated how patients’ attitudes toward physicians changed during the pandemic and what factors were associated with these changes, leading to insights for improving management in the healthcare sector. This paper collected 58,600 comments regarding Chinese doctors from three regions from the online health platform Good Doctors Online (haodf.com, accessed on 13 October 2022). These comments were analyzed using text mining techniques, such as sentiment and word frequency analyses. The results showed improvements in DPRs after the pandemic, and the degree of improvement was related to the extent to which a location was affected. The findings also suggest that administrative services in the healthcare sector need further improvement. Based on these results, we summarize relevant recommendations at the end of this paper.

“…To further examine the relationships among words and understand their substantive meanings, we also conducted a semantic network analysis of words [41]. We employed popular word segmentation libraries in Python: PKUSEG [42], NLTK toolkit [43], and NetworkX [44] to calculate word occurrence and perform the creation of word semantic networks. Inspired by [45], the top 10 words with the highest frequencies were used.…”

Section: Methods (mentioning)

confidence: 99%

Social Media Engagement in Two Governmental Schemes during the COVID-19 Pandemic in Macao

Jiang et al. 2022

IJERPH

Social media engagement is a vehicle for effective communication and engagement between governments and individuals, especially in crises such as the COVID-19 pandemic. Additionally, it can be used to communicate resilience measures and receive feedback. This research aims to investigate public social media engagement with resilience measures related to COVID-19 in Macao. We examined 1107 posts and 791 comments about the government’s face mask supply and consumption voucher schemes on Facebook. Using the Crisis Lifecycle model, we partitioned the data and analyzed the content and engagement of related posts, as well as the word semantics in user comments. Our findings show that social media engagement in these resilience measures is high and positive in the early stages of the pandemic, suggesting social media’s potential in mobilizing society, preserving social resilience, and serving as a two-way communication tool in public health emergencies.
