A semi-supervised approach for extracting TCM clinical terms based on feature words

Liu, Liangliang; Wu, Xiaojing; Liu, Hui; Cao, Xinyu; Wang, Haitao; Zhou, Hongwei; Xie, Qiang

doi:10.1186/s12911-020-1108-1

Cited by 10 publications

(6 citation statements)

References 21 publications

“…In this review, some studies reported that the application or combination of BERT could significantly improve the result of entity recognition or relation extraction [18,58,59,61]. In the last decade, the proposed deep learning models for IE tasks include BERT-convolutional neural network (CNN) [18], convolutional neural network with segment attention mechanism (SEGATT-CNN) [63], K-nearest neighbor (KNN) [53], long short-term memory (LSTM) [52,53], bidirectional long short-term memory (BiLSTM) [17], structural BiLSTM [31], LSTM-CRF [54,58], BiLSTM-CRF [22,28,55,58,62,64], BERT-BiLSTM-CRF [59,61,66], graph neural networks [21], and a nested NER model based on LSTM-CRF [29]. Among the above-mentioned models, the "BiLSTM-CRF" and "BERT-BiLSTM-CRF" have become popular deep learning models because of their good extraction performance: the BiLSTM model can capture more context information than the LSTM model.…”

Section: Deep Learning Models (mentioning)

confidence: 99%
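The statement above credits BiLSTM-CRF models with good extraction performance. In such a model the BiLSTM produces a score for every tag at every character position, and the CRF layer adds tag-to-tag transition scores, so decoding selects the best whole tag sequence rather than the best tag at each position independently. The following is a minimal sketch of that Viterbi decoding step, assuming a linear-chain CRF over B/I/O tags with start and end transitions omitted for brevity. The emission and transition scores are hypothetical hand-set numbers standing in for trained BiLSTM outputs and learned CRF parameters; this is not code from any of the cited works.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. a BiLSTM output).
    transitions[i][j]: score for moving from tag i to tag j.
    """
    n_tags = len(tags)
    best = list(emissions[0])  # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # pick the predecessor tag i maximising path score + transition i -> j
            cand = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=cand.__getitem__)
            new_best.append(cand[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final tag
    path = [max(range(n_tags), key=best.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


TAGS = ["B", "I", "O"]
# Hypothetical scores: rows are positions, columns follow TAGS.
EMISSIONS = [[0.0, 0.0, 2.0],
             [1.0, 1.5, 0.0],
             [0.0, 1.0, 0.5]]
# Only O -> I is penalised (in BIO an I may not follow an O).
TRANSITIONS = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, -10.0, 0.0]]

greedy = [TAGS[max(range(3), key=row.__getitem__)] for row in EMISSIONS]
print(greedy)                                        # ['O', 'I', 'I']
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # ['O', 'B', 'I']
```

With these scores a per-position argmax outputs O, I, I, which is not a valid BIO sequence; the large negative O-to-I transition makes Viterbi return O, B, I instead. This label-consistency constraint is what the CRF layer adds on top of the BiLSTM.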

“…With this background, more approaches were explored to extract information from different types of TCM text data, and IE from TCM texts has shown encouraging improvements accordingly [17,18]. Although previous research has summarized some IE work in TCM [19][20][21], the new advanced technologies and emerging methods need to be further summarized and synthesized, for example, improved deep learning approaches and more types of extracted information [22,23]. In this study, we searched four literature databases for articles published from 2010 to 2021 that focused on the use of NLP methods to extract information from unstructured TCM text data.…”

Section: Introduction (mentioning)

confidence: 99%


Information Extraction from the Text Data on Traditional Chinese Medicine: A Review on Tasks, Challenges, and Methods from 2010 to 2021

Zhang, Huang, Wang, et al. 2022

Evidence-Based Complementary and Alternative Medicine


Background. The practice of traditional Chinese medicine (TCM) began several thousand years ago, and the knowledge of practitioners is recorded in paper and electronic versions of case notes, manuscripts, and books in multiple languages. Developing a method of information extraction (IE) from these sources to generate a cohesive data set would be a great contribution to the medical field. The goal of this study was to perform a systematic review of the status of IE from TCM sources over the last 10 years. Methods. We conducted a search of four literature databases for articles published from 2010 to 2021 that focused on the use of natural language processing (NLP) methods to extract information from unstructured TCM text data. Two reviewers and one adjudicator contributed to article search, article selection, data extraction, and synthesis processes. Results. We retrieved 1234 records, 49 of which met our inclusion criteria. We used the articles to (i) assess the key tasks of IE in the TCM domain, (ii) summarize the challenges to extracting information from TCM text data, and (iii) identify effective frameworks, models, and key findings of TCM IE through classification. Conclusions. Our analysis showed that IE from TCM text data has improved over the past decade. However, the extraction of TCM text still faces some challenges involving the lack of gold standard corpora, nonstandardized expressions, and multiple types of relations. In the future, IE work should be promoted by extracting more existing entities and relations, constructing gold standard data sets, and exploring IE methods based on a small amount of labeled data. Furthermore, fine-grained and interpretable IE technologies are necessary for further exploration.

“…In recent years, many disciplines witness fast growth in exploiting machine learning and text mining technologies to discover knowledge hidden in a massive volume of data. Similarly, some efforts have raised in TCM which utilize machine learning and text mining technologies for discovering knowledge from prescriptions and clinical records, such as treatment rules mining [10,11], medical term extraction [12][13][14], syndrome differentiation [15], knowledge graph construction [16] and fine-grained entity corpus construction [17]. However, majority efforts in these studies are devoted to structured data or unstructured textual data written in modern Chinese language, in spite of the importance of ancient literature for modern TCM research and clinical practice, as mentioned in Section Background.…”

Section: Related Work (mentioning)

confidence: 99%

“…For the sentence-level labelling we employ the traditional metrics for the evaluation, including precision, recall, $F_1$-value and accuracy. Specifically, for a tag $i$ from $\{B, I, O\}$, the precision $P_i$, recall $R_i$, and $F_1$-value $F_{1i}$ are defined respectively in the formulas (12), (13) and (14).…”

Section: Datasets and Evaluation Metrics (mentioning)

confidence: 99%
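The formulas (12) to (14) referred to above are not reproduced in the statement. As a hedged sketch, assuming the standard definitions $P_i = TP_i/(TP_i+FP_i)$, $R_i = TP_i/(TP_i+FN_i)$ and $F_{1i} = 2P_iR_i/(P_i+R_i)$, per-tag scores and overall accuracy can be computed from aligned gold and predicted label sequences as below. The label sequences in the example are invented for illustration.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def per_tag_prf(gold: Sequence[str], pred: Sequence[str],
                tags: Sequence[str] = ("B", "I", "O")
                ) -> Tuple[Dict[str, Scores], float]:
    """Per-tag precision/recall/F1 plus accuracy for aligned label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where the gold tag differs
            fn[g] += 1  # gold tag g was missed
    scores: Dict[str, Scores] = {}
    for tag in tags:
        precision = safe_div(tp[tag], tp[tag] + fp[tag])
        recall = safe_div(tp[tag], tp[tag] + fn[tag])
        f1 = safe_div(2 * precision * recall, precision + recall)
        scores[tag] = (precision, recall, f1)
    accuracy = safe_div(sum(tp.values()), len(gold))
    return scores, accuracy


gold = ["O", "B", "I", "I", "O", "O"]
pred = ["O", "B", "I", "O", "O", "B"]
scores, accuracy = per_tag_prf(gold, pred)
for tag, (p, r, f) in scores.items():
    print(f"{tag}: P={p:.3f} R={r:.3f} F1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```

Under this counting each wrong label is charged once as a false positive for the predicted tag and once as a false negative for the gold tag.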

Extracting Disease-Specific Clinical Experiences from Ancient Literature of Traditional Chinese Medicine with Deep Learning

Lu, Chen, Zhang, et al. 2021

Preprint


Background: Ancient literature of Traditional Chinese Medicine (TCM) contains a wealth of clinical experience, which is an important ingredient of TCM knowledge and valuable for present-day TCM clinical practice. However, it is difficult for TCM professionals to acquire this experience because of its massive volume and broad distribution across the literature. Furthermore, the differences between ancient and modern Chinese pose additional challenges for analyzing the literature, whether the analysis is performed manually or automatically with a software toolkit. Methods: To overcome these challenges, we formalize a novel information extraction task for ancient TCM literature, in which the entities to be extracted are Disease-Specific Clinical Experiences (DSCEs) occurring in the literature. For this purpose, we collected two corpora from ancient TCM literature and manually annotated them with DSCE occurrence information, one for pregnant abdominalgia and colporrhagia (妊娠腹痛及下血) and the other for jaundice (黄疸). We further propose a deep learning and CRF-based algorithmic framework with character encoding of ancient Chinese, which avoids the particular difficulty of word segmentation in ancient Chinese texts. We investigate the framework with different methods for contextual encoding of the characters in a sentence, including CNN, Bi-LSTM and BERT, and with different approaches for aggregating the contextual information of characters into a sentence encoding, such as max-pooling and an attention mechanism. All the encoded sentences in a section of the literature are then passed through a Bi-LSTM-based sequence labelling model with CRF inference on top to obtain an optimal label sequence for the sentences in the section.
Results: We conduct a series of experiments on the two corpora to verify the effectiveness of our framework for the task, evaluating it with different metrics at two labelling granularities: accuracy/F1-value for sentence-level labelling, and precision/recall/F1-value for correct recognition of whole DSCEs. Conclusion: The experimental results demonstrate that the deep learning and CRF-based framework with character encoding of ancient Chinese achieves an accuracy of 80.40%±1.64% and an F1-value of 76.73%±1.59% for sentence labelling, while for recognition of whole DSCEs it obtains a recall of 44.97%±2.16% and a precision of 51.13%±2.64%, indicating that the framework is a promising baseline for further development of this novel information extraction task for TCM.
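The abstract names max-pooling and an attention mechanism as alternative ways of aggregating the contextual encodings of a sentence's characters into a single sentence vector. Below is a minimal pure-Python sketch of both, assuming dot-product attention against a single query vector (a common formulation; the exact attention form used by the authors is not stated in the abstract). In a trained model the character vectors would come from the CNN, Bi-LSTM or BERT encoder and the query vector would be learned; the numbers here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def max_pool(char_vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over the character vectors of one sentence."""
    return [max(column) for column in zip(*char_vectors)]


def attention_pool(char_vectors: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Vector:
    """Weighted average of character vectors, weights = softmax(vector . query)."""
    scores = [sum(c * q for c, q in zip(vec, query)) for vec in char_vectors]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(char_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, char_vectors))
            for d in range(dim)]


# Hypothetical 2-dimensional contextual encodings for a 3-character sentence.
chars = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
print(max_pool(chars))                     # [1.0, 2.0]
print(attention_pool(chars, [0.0, 0.0]))   # uniform weights, i.e. the mean
print(attention_pool(chars, [0.0, 10.0]))  # weight concentrated on the 2nd character
```

Max-pooling keeps the strongest value in each dimension regardless of which character produced it, whereas attention pooling weights whole characters by relevance, which is why the query vector is a trainable parameter in practice.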

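The gap between the sentence-level scores and the whole-DSCE scores reported above may be easier to see with an explicit span-level evaluation. The sketch below is a hypothetical illustration, not the authors' evaluation code. It assumes that each sentence in a section carries a plain B/I/O label, that a DSCE is a maximal run beginning at B (a stray I is treated as starting a new run), and that a predicted DSCE counts as correct only when both of its boundaries exactly match a gold DSCE.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) sentence indices within one section


def bio_to_spans(labels: Sequence[str]) -> List[Span]:
    """Collect maximal B/I runs as half-open spans."""
    spans: List[Span] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            if start is not None:  # a B closes the previous run
                spans.append((start, i))
            start = i
        elif label == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        # an "I" inside an open run simply extends it
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def span_precision_recall(gold: Sequence[str],
                          pred: Sequence[str]) -> Tuple[float, float]:
    """Exact-match precision and recall over whole spans."""
    gold_spans = set(bio_to_spans(gold))
    pred_spans = set(bio_to_spans(pred))
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall


gold = ["O", "B", "I", "O", "B", "I", "I"]
pred = ["O", "B", "I", "O", "B", "I", "O"]  # last sentence of 2nd DSCE missed
print(bio_to_spans(gold))                   # [(1, 3), (4, 7)]
print(span_precision_recall(gold, pred))    # (0.5, 0.5)
```

In the example 6 of the 7 sentence labels are correct, yet only one of the two DSCEs is recovered exactly, so span-level precision and recall both drop to 0.5.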

“…Liu et al combined the BiLSTM-CRF model with semisupervised learning to reduce the cost of manual annotation and leveraged extraction results. The proposed method is of practical utility in improving the extraction of five types of TCM clinical terms, including traditional Chinese medicine, symptoms, patterns, diseases, and formulas [22]. Zhang et al worked on building a fine-grained entity annotation corpus of TCM clinical records [13].…”

Section: Introduction (mentioning)

confidence: 99%

TCMNER and PubMed: A Novel Chinese Character-Level-Based Model and a Dataset for TCM Named Entity Recognition

Liu, Luo, Zheng, et al. 2021

Journal of Healthcare Engineering


Intelligent traditional Chinese medicine (TCM) has become a popular research field thanks to the rapid progress of deep learning technology. Important achievements have been made in representative tasks such as automatic diagnosis of TCM syndromes and diseases and generation of TCM herbal prescriptions. However, one unavoidable issue that still hinders progress is the lack of labeled samples, i.e., TCM medical records. As an efficient tool, named entity recognition (NER) models trained on various TCM resources can effectively alleviate this problem and continuously increase the number of labeled TCM samples. In this work, based on in-depth analysis, we argue that the performance of TCM named entity recognition models can be improved by using character-level representation and tagging, and we propose a novel word-character integrated self-attention module. With the help of TCM doctors and experts, we define 5 classes of TCM named entities and construct a comprehensive NER dataset containing the standard content of publications and clinical medical records. The experimental results on this dataset demonstrate the effectiveness of the proposed module.
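The abstract argues for character-level representation and tagging. With character-level tagging every Chinese character receives its own label, so no word segmenter has to run before tagging. Below is a minimal sketch of converting character-offset entity annotations into such tags, assuming a BIO scheme with type suffixes. The example sentence, the offsets and the labels `SYM` and `FORMULA` are hypothetical (the abstract does not list its five entity classes), and the word-character integrated self-attention module itself is not reproduced.

```python
from typing import List, Sequence, Tuple

Entity = Tuple[int, int, str]  # (start, end, type); end is exclusive


def char_bio_tags(text: str, entities: Sequence[Entity]) -> List[str]:
    """Assign one BIO tag per character; overlapping entities are rejected."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another entity")
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


# "头痛发热，予桂枝汤": headache and fever; prescribe Guizhi decoction.
text = "头痛发热，予桂枝汤"
entities = [(0, 2, "SYM"), (2, 4, "SYM"), (6, 9, "FORMULA")]
for char, tag in zip(text, char_bio_tags(text, entities)):
    print(char, tag)
```

Because 头痛 and 发热 are adjacent entities of the same type, the second must begin with its own B- tag; this is what keeps adjacent same-type entities separable under character-level BIO tagging.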
