2004
DOI: 10.1016/j.jbi.2004.08.012

Biomedical named entity recognition using two-phase model based on SVMs

Abstract: Named entity (NE) recognition has become one of the most fundamental tasks in biomedical knowledge acquisition. In this paper, we present a two-phase named entity recognizer based on SVMs, which consists of a boundary identification phase and a semantic classification phase of named entities. When adapting SVMs to named entity recognition, the multi-class problem and the unbalanced class distribution problem become very serious in terms of training cost and performance. We try to solve these problems by separa…

Cited by 103 publications (60 citation statements)
References 8 publications
“…Head nouns were used in [37] and [38]; word lists that are highly associated to classes are extracted as lexicons in [52]; keyword lexicons are statistically computed in [39]; keyword and boundary lists in [50].…”
Section: JNLPBA'04 Corpus and Current Solutions
type: mentioning
confidence: 99%
“…Six of the eight systems in the challenge used at least one type of external resources: 1) corpora such as the British National Corpus, the MedLine abstracts and the Penn Treebank for computing frequencies and trigger word extraction; 2) personalized gazetteers extracted from Swissport, LocusLink, Gene Ontology, etc., for keyword identification; 3) specialized taggers to increase the accuracy of certain types of entities (for example, in [52] two gene/protein taggers were used even though the accuracy of protein type extraction was not highly improved by this solution); 4) web searching of entity patterns was exploited by various systems in order to compute lexicons and/or assign weights to words associated to entities. From the challenge, it was not clear which (set of) features, external resources, or classification models really contributed to obtaining the best performances.…”
Section: JNLPBA'04 Corpus and Current Solutions
type: mentioning
confidence: 99%
“…Learning with the complete training dataset completed in 97 hours on a Xeon quad-processor 3.6 GHz machine.
[23] 67.4 / 61.0 / 64.0
Habib [8] 62.3 / 64.5 / 63.4
Park [21] 66.5 / 59.8 / 63.0
Lee [18] 50.8 / 47.6 / 49.1
Baseline [16] 52.6 / 43.6 / 47.7…”
Section: Baseline Experiments
type: mentioning
confidence: 99%
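The score triples in the statement above carry no column labels in this excerpt. A quick consistency check, sketched below, assumes the first two numbers are precision and recall (in either order, since the harmonic mean is symmetric) and the third is the F1 score. The names `rows`, `f1`, and `check` are illustrative only and do not come from the cited work.

```python
# Rows copied from the citation statement above. The column meanings are an
# ASSUMPTION: two of precision/recall (order irrelevant), then F1.
# The first row's system name is truncated in the source and is left as-is.
rows = {
    "[23]": (67.4, 61.0, 64.0),
    "Habib [8]": (62.3, 64.5, 63.4),
    "Park [21]": (66.5, 59.8, 63.0),
    "Lee [18]": (50.8, 47.6, 49.1),
    "Baseline [16]": (52.6, 43.6, 47.7),
}


def f1(a: float, b: float) -> float:
    """Harmonic mean of two scores; symmetric in its arguments."""
    return 2 * a * b / (a + b) if (a + b) else 0.0


def check(table: dict, tolerance: float = 0.05) -> dict:
    """Return, per row, whether the third value matches F1 of the first two
    to within one reported decimal place."""
    return {
        name: abs(f1(a, b) - reported) <= tolerance
        for name, (a, b, reported) in table.items()
    }


if __name__ == "__main__":
    for name, ok in check(rows).items():
        print(f"{name}: {'consistent' if ok else 'inconsistent'} with F1")
```

Under this assumption, all five rows are consistent with an F1 reading of the third column, which supports (but does not prove) that interpretation of the flattened table.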
“…In this work, we do not distinguish between proteins and other named entities (NEs), since it is difficult to make such a distinction. Each word is represented by a binary feature vector, which consists of lexical features, orthographical and morphological features, Part-of-Speech features, and context features (as in [13]). The basic idea is that using these features, the SVM will be able to determine whether the word is a biomedical named entity.…”
Section: Biomedical Named Entity Recognition (NER)
type: mentioning
confidence: 99%
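The binary feature representation described in the statement above can be illustrated with a minimal sketch. The concrete features chosen here (lowercased word, digit/uppercase/hyphen flags, 3-character prefix and suffix, POS tag, neighbouring words in a window of one) are assumptions made for illustration; the excerpt does not list the exact feature set used by the citing or cited work, and the function names are hypothetical. The resulting 0/1 vectors are the kind of input a binary SVM classifier would take.

```python
from typing import Dict, Iterable, List, Sequence, Set


def token_features(words: Sequence[str], pos_tags: Sequence[str],
                   i: int, window: int = 1) -> Set[str]:
    """Collect the names of the features that are active for token i.

    All feature templates below are illustrative assumptions.
    """
    w = words[i]
    feats = {f"word={w.lower()}", f"pos={pos_tags[i]}"}  # lexical, POS
    # Orthographic features
    if any(c.isdigit() for c in w):
        feats.add("orth=has_digit")
    if any(c.isupper() for c in w):
        feats.add("orth=has_upper")
    if "-" in w:
        feats.add("orth=has_hyphen")
    # Morphological features (character affixes)
    feats.add(f"prefix3={w[:3].lower()}")
    feats.add(f"suffix3={w[-3:].lower()}")
    # Context features: neighbouring words, padded at sentence edges
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        neighbour = words[j].lower() if 0 <= j < len(words) else "<pad>"
        feats.add(f"ctx[{off:+d}]={neighbour}")
    return feats


def build_index(feature_sets: Iterable[Set[str]]) -> Dict[str, int]:
    """Assign each distinct feature name a fixed vector position."""
    index: Dict[str, int] = {}
    for feats in feature_sets:
        for name in sorted(feats):
            index.setdefault(name, len(index))
    return index


def to_binary_vector(feats: Set[str], index: Dict[str, int]) -> List[int]:
    """Encode active features as a 0/1 vector; unseen features are ignored."""
    vec = [0] * len(index)
    for name in feats:
        if name in index:
            vec[index[name]] = 1
    return vec


if __name__ == "__main__":
    words = ["IL-2", "gene", "expression"]
    tags = ["NN", "NN", "NN"]  # hypothetical POS tags
    sets = [token_features(words, tags, i) for i in range(len(words))]
    index = build_index(sets)
    for word, feats in zip(words, sets):
        print(word, to_binary_vector(feats, index))
```

Keeping the features as named strings until the last step makes it easy to add or drop a feature family without renumbering vector positions by hand.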