BACKGROUND Chinese biomedical named entity recognition (NER) is a significant subfield of Natural Language Processing (NLP) focused on identifying names of predefined entity types in text, such as locations, organizations, and person names [1]. As a core task in NLP, the recognition accuracy of named entities significantly influences the performance of downstream tasks. Specifically, NER is typically the first step in tasks [2] such as information retrieval [3], question answering [4], and information extraction [5].
Improvements in the recognition accuracy of named entities effectively enhance the performance of these downstream tasks. NER has been extensively studied and experimented with in multiple languages, including Chinese, generating substantial interest among NLP researchers [6][7][8].
Most breakthroughs and best performances in Chinese NER have been achieved in the general domain, benefiting from years of research efforts. However, due to the poor generalization ability of deep learning models, the performance of models trained in the general domain is far from ideal when applied to the Chinese biomedical field. Compared with general Chinese texts, NER in the biomedical field faces greater challenges. The primary reasons include the inherent ambiguity of Chinese word boundaries [9] and the lack of medical professional background knowledge in existing models. These issues significantly hinder effective NER in Chinese biomedical texts.
Li et al. [10] applied NER techniques from general Chinese texts to the biomedical field, utilizing transformer-based pre-trained language models (PLMs) to identify and extract medical text entities. PLMs demonstrate superior performance compared to rule-based and machine learning-based methods [11]. Zheng et al. [12] enhanced PLMs with attention mechanisms to improve feature extraction. Liu et al. [13] employed BERT as a pre-training model, combining it with BiLSTM and CRF for NER.
Sun et al. [14] integrated character vector features based on the BERT model to extract medical text features. However, these approaches overlook the negative impact of ambiguous Chinese word boundaries on recognition accuracy.
In English NER tasks, words are delimited by spaces, providing clear boundaries and eliminating the need for additional word segmentation. In contrast, Chinese text lacks delimiters between characters, leading to potential recognition errors due to incorrect boundary identification. Furthermore, with the advancement of deep learning technology, various BERT variants have been developed that surpass the original BERT model in both recognition accuracy and training efficiency. However, most of these BERT variants are trained on general text corpora, lacking domain-specific focus, which significantly limits their performance in the biomedical field. To enhance the model's recognition capabilities, there is a growing need for more specialized and extensive training data. In response, researchers have started integrating knowledge-enhanced methods by incorporating knowledge graphs into the model's training process, aiming to improve the effectiveness of natural language processing tasks in specific domains [16][17]. However, these approaches typically combine knowledge triples from external knowledge graphs with the original model input sequence to create a Sentence-Tree structure. Since the representational structure of the Sentence-Tree cannot be directly trained, this leads to inefficiencies in the training process. In light of the challenges associated with named entity recognition in the Chinese biomedical field, this paper proposes the MFKN-RBC model (Multi-Feature Fusion Embedding and Knowledge Enhancement with RoBERTa-wwm-ext-large-BiLSTM-CRF) for Chinese biomedical named entity recognition.
This model, grounded in the PLM (Pre-trained Language Model) framework, employs a multi-feature fusion embedding strategy along with knowledge enhancement techniques to improve recognition accuracy. By fusing character-level and word-level features, the multi-feature fusion embedding method more effectively captures the boundary information of Chinese words, thereby reducing the negative impact of incorrect word boundary recognition on the model's overall accuracy. Additionally, the knowledge enhancement method embeds a medical knowledge graph into the model training process, compensating for the decline in recognition accuracy caused by the model's lack of medical domain expertise.
The main contributions of this paper are as follows:
1. We propose a multi-feature fusion embedding method for Chinese biomedical NER. This method combines character-level and word-level features. Building on previous work [14], we decompose Chinese characters anew to generate radical feature vectors for character-level embedding. For word-level feature embedding, we leverage the advantages of both character granularity and word granularity, improving the SoftLexicon's word frequency compression method with a dynamic weighted calculation approach. This enhancement increases word frequency compression efficiency and is better suited to Chinese vocabulary processing. By fusing character-level and word-level features, the model's ability to recognize Chinese word boundaries is significantly enhanced.
2. We introduce an improved knowledge enhancement
OBJECTIVE Solve the problems of unclear Chinese word boundaries and insufficient medical expertise in existing Chinese biomedical NER models.
METHODS Utilize a model that combines multi-feature fusion embedding and knowledge enhancement. In the embedding part, merge character-level characteristics like glyphs, pinyin, and radicals with word-level ones to handle word boundary issues more effectively. Incorporate a medical knowledge graph into the model for enhanced medical knowledge.
RESULTS The model shows a maximum 4.1% increase in F1 score when tested on three datasets.
CONCLUSIONS The proposed approach successfully resolves the problems of lack of medical knowledge and unclear word boundaries in Chinese biomedical named entity recognition.
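The fusion of character-level and word-level features described in the abstract can be illustrated with a minimal sketch in the spirit of SoftLexicon. It is not the authors' implementation: the toy lexicon, its frequencies, the 2-dimensional vectors, and the function names bmes_word_sets, frequency_weights, and fuse are all hypothetical. The weighting shown is plain static frequency weighting, used as a stand-in because the paper's dynamic weighted calculation is not specified in this excerpt.

```python
# Hypothetical toy lexicon: word -> corpus frequency (illustrative values only).
LEXICON = {"糖尿病": 50, "糖尿": 5, "病患": 8, "患者": 40}

# Hypothetical 2-dimensional word vectors (illustrative values only).
WORD_VECS = {
    "糖尿病": [1.0, 0.0],
    "糖尿": [0.5, 0.5],
    "病患": [0.0, 1.0],
    "患者": [0.2, 0.8],
}
TAGS = "BMES"


def bmes_word_sets(sentence, lexicon):
    """For each character, collect the lexicon words it Begins, sits in the
    Middle of, Ends, or forms on its own as a Single-character word."""
    sets = [{tag: set() for tag in TAGS} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, n + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
                continue
            sets[start]["B"].add(word)
            sets[end - 1]["E"].add(word)
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].add(word)
    return sets


def frequency_weights(char_sets, lexicon):
    """Static frequency weighting (an assumption, not the paper's dynamic
    scheme): a word's weight is its frequency divided by the total frequency
    of all words matched at this character."""
    total = sum(lexicon[w] for tag in TAGS for w in char_sets[tag])
    if total == 0:
        return {tag: {} for tag in TAGS}
    return {tag: {w: lexicon[w] / total for w in char_sets[tag]} for tag in TAGS}


def fuse(char_vec, char_sets, lexicon, word_vecs, dim):
    """Concatenate the character vector with one pooled vector per B/M/E/S set."""
    weights = frequency_weights(char_sets, lexicon)
    fused = list(char_vec)
    for tag in TAGS:
        pooled = [0.0] * dim
        for word, weight in weights[tag].items():
            for i, value in enumerate(word_vecs[word]):
                pooled[i] += weight * value
        fused.extend(pooled)
    return fused


sentence = "糖尿病患者"
sets = bmes_word_sets(sentence, LEXICON)
# The character 病 both ends 糖尿病 and begins 病患: the boundary ambiguity
# that word-level features are meant to expose to the model.
fused_bing = fuse([1.0, 0.0], sets[2], LEXICON, WORD_VECS, dim=2)
```

In this toy sentence, the character 病 ends the lexicon word 糖尿病 and begins 病患, so both candidate boundaries reach the tagger, with the more frequent word receiving the larger weight. In a full model the character vector would itself be a fusion of glyph, pinyin, and radical features, and the fused vectors would feed the downstream encoder.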
A chatbot is a technological tool that can simulate a discussion between a human and a program application. This technology has developed rapidly in recent years, and its usage is increasing in many sectors, especially in education. A systematic literature review was conducted using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework to analyze the development and evolution of this technology in the educational sector during the last 5 years. More precisely, the development methods, practices, and guidelines for building a conversational tutor are examined. The results of this study aim to summarize the gathered knowledge and provide useful information to educators who would like to develop a conversational assistant for their course and to developers who would like to build chatbot systems in the educational domain.