Summary
Entity recognition plays an important role in building medical knowledge graphs from electronic medical records (EMRs), which in turn underpin clinical decision support (CDS) systems. Cross‐disease clinical documents are context‐related and have different, interrelated semantic structures, which makes entity recognition with traditional methods challenging. To address these problems, this paper proposes a co‐training based entity recognition approach for cross‐disease clinical documents. We first build a partially annotated corpus for each single disease using dependency syntax analysis and the unification of medical statement rules. Then, from the partially annotated corpora of the different diseases, sentence‐level features are extracted by a Bi‐LSTM layer with memory units combined with a CRF layer, which optimizes over the whole sequence and improves the joint probability of the label sequence. Finally, the higher‐confidence results are selected by cross feedback to label the corpus, which enlarges the corpus and improves the accuracy of document entity recognition. Experimental results demonstrate the effectiveness and efficiency of our method.
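The cross‐feedback step can be illustrated with a minimal sketch. It assumes two already trained taggers, each exposed as a function that returns a label sequence and a single confidence score per sentence. The names `cross_feedback_round`, `toy_tagger_a`, `toy_tagger_b`, the label set, and the threshold value are hypothetical and chosen only for illustration; the toy taggers stand in for the Bi‐LSTM and CRF models and are not the models themselves.

```python
from typing import Callable, List, Tuple

Sentence = List[str]
Labels = List[str]
Example = Tuple[Sentence, Labels]
# A predictor returns (label sequence, confidence in [0, 1]) for a sentence.
Predictor = Callable[[Sentence], Tuple[Labels, float]]


def cross_feedback_round(
    predict_a: Predictor,
    predict_b: Predictor,
    unlabeled: List[Sentence],
    threshold: float = 0.9,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Example], List[Example], List[Sentence]]:
    """One co-training round: each model's confident predictions
    become new training examples for the other model."""
    new_for_a: List[Example] = []
    new_for_b: List[Example] = []
    remaining: List[Sentence] = []
    for sentence in unlabeled:
        labels_a, conf_a = predict_a(sentence)
        labels_b, conf_b = predict_b(sentence)
        used = False
        if conf_a >= threshold:
            new_for_b.append((sentence, labels_a))  # A teaches B
            used = True
        if conf_b >= threshold:
            new_for_a.append((sentence, labels_b))  # B teaches A
            used = True
        if not used:
            remaining.append(sentence)  # stays unlabeled for the next round
    return new_for_a, new_for_b, remaining


# Toy stand-ins for two disease-specific taggers (hypothetical).
def toy_tagger_a(sentence: Sentence) -> Tuple[Labels, float]:
    labels = ["B-DISEASE" if tok == "diabetes" else "O" for tok in sentence]
    return labels, (0.95 if "diabetes" in sentence else 0.5)


def toy_tagger_b(sentence: Sentence) -> Tuple[Labels, float]:
    labels = ["B-DRUG" if tok == "aspirin" else "O" for tok in sentence]
    return labels, (0.95 if "aspirin" in sentence else 0.5)


if __name__ == "__main__":
    pool = [
        ["patient", "has", "diabetes"],
        ["takes", "aspirin", "daily"],
        ["no", "complaints"],
    ]
    for_a, for_b, rest = cross_feedback_round(toy_tagger_a, toy_tagger_b, pool)
    print(for_a, for_b, rest)
```

In a full co‐training loop, each model would be retrained on its enlarged corpus after every round, and the rounds would repeat until the unlabeled pool stops shrinking or a fixed iteration budget is reached.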