We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make the SVM training with the available largest corpus -the GENIA corpus -tractable, we propose to split the non-entity class into sub-classes, using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class splitting technique not only enables the training with the GENIA corpus but also improves the accuracy. The proposed new features also contribute to improve the accuracy. We compare our SVMbased recognition system with a system using Maximum Entropy tagging method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.