A Hybrid Method Based on Semi-Supervised Learning for Relation Extraction in Chinese EMRs

Yang, Chunming; Xiao, Deqiang; Luo, Yuanyuan; Li, Bo; Zhao, Xi; Zhang, Hui

doi:10.21203/rs.3.rs-1357125/v1

Cited by 3 publications

(3 citation statements)

References 22 publications


“…According to the characteristics of Chinese medical texts, Liu et al [21] proposed a novel BIOH12D1D2 annotation scheme, which transformed the joint extraction task into a tagging problem and solved the problem of overlapping relations. Yang et al [22] designed a hybrid method based on semi-supervised learning to extract the medical entity relations from Chinese EMRs. Lai et al [23] proposed a new framework KECI (Knowledge Enhanced Collective Reasoning), and used external knowledge to extract entities and relations.…”

Section: Related Work (mentioning)

confidence: 99%

Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer

Yang, Cui, et al. 2022

Preprint


Background: Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response, some traditional methods decompose the joint extraction task into multiple steps or modules, which introduces local dependency. Methods: To alleviate this issue, we propose a joint extraction model for Chinese medical entities and relations based on RoBERTa and a single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the unique structure of the Chinese language, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. We then represent the input sentence as a three-dimensional matrix and score each position in the matrix in preparation for decoding the triples. Finally, we design a novel single-module global pointer decoding approach to reduce the generation of redundant information. Specifically, we analyze the decoding process of single-character entities individually, which improves the time and space performance of RSGP to some extent. Results: To verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out experiments on the public CMeIE dataset. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models. Conclusion: The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to structure those texts, providing high-quality data support for the construction of Chinese medical knowledge graphs.
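The abstract above describes scoring every position of a three-dimensional matrix and then decoding triples from it. The following is only a minimal sketch of that general idea, not RSGP's actual decoder, which the abstract does not specify. The layout `scores[relation][head][tail]`, the function name `decode_triples`, the threshold of 0.0, and the treatment of each token as a whole entity are all assumptions made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def decode_triples(tokens: List[str], relations: List[str],
                   scores: List[List[List[float]]],
                   threshold: float = 0.0) -> List[Triple]:
    """Emit a triple for every (relation, head, tail) position above threshold.

    Assumption: scores[r][i][j] is the score that token i is the head and
    token j is the tail of relation r. Real models score spans, not tokens.
    """
    triples: List[Triple] = []
    for r, relation in enumerate(relations):
        for i, head in enumerate(tokens):
            for j, tail in enumerate(tokens):
                # Skip the diagonal: an entity is not related to itself here.
                if i != j and scores[r][i][j] > threshold:
                    triples.append((head, relation, tail))
    return triples


# Hypothetical toy input: one relation type, three tokens, one high score.
tokens = ["diabetes", "causes", "thirst"]
relations = ["symptom"]
scores = [[[-1.0] * 3 for _ in range(3)]]
scores[0][0][2] = 2.5  # head = "diabetes", tail = "thirst"
example_triples = decode_triples(tokens, relations, scores)
```

Because every candidate pair is scored in a single pass, no separate entity-recognition step is needed in this sketch, which is the property the abstract attributes to a single-module design.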


“…Through Natural Language Processing (NLP) technology, medical text can be analyzed and transformed into high-quality knowledge that is convenient for computer processing, thus providing valuable data resources for medical workers and researchers. Relation extraction (RE) refers to the extraction of relational triplet between entity pairs from medical text [2, 3]. The triplet is represented as “(head entity, relationship, tail entity)” [4].…”

Section: Introduction (mentioning)

confidence: 99%

Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts

Cai, Xiang, et al. 2023

BMC Med Inform Decis Mak


In recent years, relation extraction from unstructured text has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore required to ensure the performance of relation extraction. This paper proposes an active learning method based on subsequences and distant supervision. Instead of the full sentences used in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. It also saves the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction in Chinese medical texts. Experiments on the CMeIE dataset show that the method achieves the best performance compared with existing methods. The best F1 value obtained across the different sampling strategies is 55.96%.
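The dictionary-based pre-labelling step described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: the helper names `update_dictionary` and `prelabel` are hypothetical, and exact substring matching is assumed as the simplest form of "text matching", since the abstract does not say which matching rule is used.

```python
from typing import Dict, List, Tuple


def update_dictionary(dictionary: Dict[str, str],
                      subsequence: str, label: str) -> None:
    """Store a newly annotated subsequence and its label (hypothetical helper)."""
    dictionary[subsequence] = label


def prelabel(sentences: List[str],
             dictionary: Dict[str, str]) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pre-label each unlabeled sentence by exact substring matching.

    Assumption: a sentence inherits the label of every stored subsequence
    that occurs in it verbatim.
    """
    results = []
    for sentence in sentences:
        matches = [(sub, label) for sub, label in dictionary.items()
                   if sub in sentence]
        results.append((sentence, matches))
    return results


# Hypothetical toy data: one annotated subsequence, two unlabeled sentences.
dictionary: Dict[str, str] = {}
update_dictionary(dictionary, "diabetes causes thirst", "symptom")
unlabeled = [
    "Patient notes: diabetes causes thirst at night.",
    "No relevant finding.",
]
prelabeled = prelabel(unlabeled, dictionary)
```

Sentences with no match keep an empty label list and remain candidates for manual annotation in the next active learning round.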


“…Relation extraction (RE) refers to the extraction of relational triplet between entity pairs from medical text [1] [2]. The triplet is represented as "(head entity, relationship, tail entity)" [3].…”

Section: Introduction (mentioning)

confidence: 99%

Subsequence and Distant Supervision based Active Learning for Relation Extraction of Chinese Medical Texts

Cai, Ruan, et al. 2022

Preprint


Background: In recent years, relation extraction from unstructured text has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore required to ensure the performance of relation extraction. Methods: This paper proposes an active learning method based on subsequences and distant supervision. Instead of the full sentences used in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. It also saves the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a BERT-CRF model for relation extraction in Chinese medical texts. Results: Experiments on the CMeIE dataset show that the method achieves the best results compared with existing methods. The best F1 values obtained under the different sampling strategies are 52.65%, 52.55% and 51.37%, respectively.
