Two-phase biomedical NE recognition based on SVMs

Lee, Ki-Joong; Hwang, Young-Sook; Rim, Hae-Chang

doi:10.3115/1118958.1118963

Cited by 74 publications

(60 citation statements)

References 8 publications


“…In-context features dictionary is similar to the work of Lee et al [6]. The most right 3 words from name in the training data are collected as candidates.…”

Section: Construction Of Dictionaries (mentioning)

confidence: 99%

“…Nowadays, Named Entity Recognition (NER) is proved to be fundamental in information extraction and understanding in biomedical domain. Based on the method, the NER system can be roughly split into three categorizes: rule-based methods [1][2], dictionary-based methods [3], and statistical-based methods [4][5][6][7], although there are also combination of dictionary-based and rule-based method [8].…”

Section: Introduction (mentioning)

confidence: 99%


Various Features with Integrated Strategies for Protein Name Classification

Ongkowijaya

Ding

Zhu

2005

Parallel and Distributed Processing and Applications - ISPA 2005 Workshops


Abstract. Classification task is an integral part of named entity recognition system to classify a recognized named entity to its corresponding class. This task has not received much attention in the biomedical domain, due to the lack of awareness to differentiate feature sources and strategies in previous studies. In this research, we analyze different sources and strategies of protein name classification, and developed integrated strategies that incorporate advantages from rule-based, dictionary-based and statistical-based method. In rule-based method, terms and knowledge of protein nomenclature that provide strong cue for protein name are used. In dictionary-based method, a set of rules for curating protein name dictionary are used. These terms and dictionaries are combined with our developed features into a statistical-based classifier. Our developed features are comprised of word shape features and unigram & bi-gram features. Our various information sources and integrated strategies are able to achieve state-of-the-art performance to classify protein and non-protein names.



“…General medical term was trained with UMLS meta-thesaurus [12] and the biological entity and its interaction was trained with GENIA [13] corpus. The underlying NLP approaches for named entity recognition are based on the system of Hwang et al [14] and Lee et al [15] with collaborations. More detailed descriptions of language processing are elucidated in [16].…”

Section: Interaction Extraction (mentioning)

confidence: 99%

PubMiner: Machine Learning-Based Text Mining System for Biomedical Information Mining

Eom

Zhang

2004

Artificial Intelligence: Methodology, Systems, and Applications


Abstract. PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature is introduced. PubMiner utilize natural language processing and machine learning based data mining techniques for mining useful biological information such as protein-protein interaction from the massive literature data. The system recognizes biological terms such as gene, protein, and enzymes and extracts their interactions described in the document through natural language analysis. The extracted interactions are further analyzed with a set of features of each entity which were constructed from the related public databases to infer more interactions from the original interactions. An inferred interaction from the interaction analysis and native interaction are provided to the user with the link of literature sources. The evaluation of system performance proceeded with the protein interaction data of S.cerevisiae (bakers yeast) from MIPS and SGD.


“…); and in machine translation or cross-language information retrieval, special transliteration processes can be applied to entity names across languages with different alphabets, 4 provided that the names have been identified. Although ners that employ mostly hand-crafted rules 5,6 may perform very well, ners that use statistical and machine learning techniques, including Hidden Markov or Maximum Entropy Models, 7,8,9,10 decision tree learning and/or boosting, 11,12,13 and Support Vector Machines, 14,15 usually outperform them and they are easier to port to new text genres (e.g., biomedical, instead of news articles), where new name categories (e.g., protein names) may also need to be supported. However, supervised statistical and machine learning-based ners still require a tedious manual annotation phase, during which humans must tag occurrences of entity names in a training corpus.…”

Section: Introduction (mentioning)

confidence: 99%

“…Our two passes are also different from the approach whereby a first phase identifies all entity names and a second one categorizes them. 15 Furthermore, unlike the system of Shen et al, our ensemble acts as a single classifier with non-overlapping categories. In active learning, we select training examples for each pass by considering the distances from the hyperplanes of both svms of that pass, much as in Vlachos.…”

mentioning

confidence: 99%

Named Entity Recognition in Greek Texts With an Ensemble of SVMS and Active Learning

Lucarelli

Vasilakos

Androutsopoulos

2007

Int. J. Artif. Intell. Tools


We present a freely available named-entity recognizer for Greek texts that identifies temporal expressions, person, and organization names. For temporal expressions, it relies on semi-automatically produced patterns. For person and organization names, it employs an ensemble of Support Vector Machines that scan the input text in two passes. The ensemble is trained using active learning, whereby the system itself proposes candidate training instances to be annotated by a human during training. The recognizer was evaluated on both a general collection of newspaper articles and a more focussed, in terms of topics, collection of financial articles.

