2008
DOI: 10.1016/j.jbi.2008.02.003

Word sense disambiguation across two domains: Biomedical literature and clinical notes

Abstract: The aim of this study is to explore the word sense disambiguation (WSD) problem across two biomedical domains: biomedical literature and clinical notes. A supervised machine learning technique was used for the WSD task. One of the challenges addressed is the creation of a suitable clinical corpus with manual sense annotations. This corpus in conjunction with the WSD set from the National Library of Medicine provided the basis for the evaluation of our method across multiple domains and for the comparison of our…

Cited by 57 publications (31 citation statements)
References: 14 publications

“…Besides the confusions in WSD, there are many difficulties in handling these using the supervised and unsupervised methods. Work in [34] determined that supervised methods are the optimal predictors of WSD difficulties, but are limited by their dependence on labelled training data in different domain types such as biomedical [35], [36]. The unsupervised method performed well in some situations and can be applied more broadly [37], [38].…”
Section: Related Work (mentioning)
confidence: 99%
“…The primary feature was Unified Medical Language System (UMLS) concept unique identifiers (CUIs), which represented clinical concepts in the patient's EHR notes. The CUIs were extracted using Apache cTAKES [36] and converted into concept vectors. cTAKES implements a full stack of NLP modules including part-of-speech tagger, parsers, relation discovery modules, as well as attribute identification modules (such as negation, uncertainty, subject).…”
Section: Automated Algorithms For Seco Detection (mentioning)
confidence: 99%
“…Most of the methods for biomedical entity name recognition, classification, or disambiguation can be roughly divided into three categories: (i) supervised and machine-learning-based techniques, (ii) statistical and corpus-based techniques, and (iii) syntactic and rule-based techniques [9–11]. Moreover, the bioinformatics literature shows that biomedical WSD has been a quite active area of research with a number of approaches proposed and applied to biomedical data [1, 2, 4, 8, 12, 13].…”
Section: Related Work (mentioning)
confidence: 99%
“…The supervised methods rely on training and learning phases that require a dataset or corpus containing manually disambiguated instances to be used to train the system [5, 6]. The unsupervised methods, on the other hand, are based on knowledge sources like ontology, for example, from UMLS, or text corpora [2, 4, 7, 8]. Our approach in this paper is a supervised approach.…”
Section: Introduction (mentioning)
confidence: 99%
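
The supervised setting described in the statement above (training on manually disambiguated instances, then labelling new contexts) can be illustrated with a minimal sketch. This is not the method of the cited paper, whose abstract only says "a supervised machine learning technique"; a bag-of-words naive Bayes classifier is assumed here purely for illustration. The ambiguous term "cold", the sense labels, the training sentences, and the function names `train` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would do far more.
    return text.lower().split()


def train(examples):
    """Count senses and context words from (context, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, sense in examples:
        sense_counts[sense] += 1
        for word in tokenize(text):
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def predict(model, text):
    """Return the sense with the highest naive Bayes log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words unseen in training are skipped
                # Add-one (Laplace) smoothing
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical hand-labelled contexts for the ambiguous term "cold".
TRAIN = [
    ("patient reports a cold with cough and runny nose", "common_cold"),
    ("symptoms of cold include sore throat and cough", "common_cold"),
    ("exposure to cold temperature caused numbness", "low_temperature"),
    ("store the sample at cold temperature overnight", "low_temperature"),
]

model = train(TRAIN)
print(predict(model, "cold with sore throat"))
print(predict(model, "sample kept at cold temperature"))
```

The sketch also makes the limitation noted in the citing statements concrete: the classifier can only choose among senses that appear in the labelled training data, so moving to a new domain requires new annotations.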