Acronyms are heavily used out-of-vocabulary terms in SMS messages, search queries, and social media posts. The performance of text mining algorithms such as part-of-speech (POS) tagging, named entity recognition, and chunking often suffers when they are applied to noisy text. Text normalization systems are developed to normalize such noisy text, and acronym mapping and expansion has become an important component of the normalization process. Since manually collecting acronyms and their corresponding expansions from documents is difficult, such a dictionary needs to be built automatically using supervised learning. In this work, we focus on the acronym search problem: given acronyms as queries, find their corresponding expansions in a document. Recent works formulate this problem as a token-level sequence labelling task and employ Hidden Markov Models or Conditional Random Fields (CRFs) to tackle it. However, these models do not utilize the segment-level information inherent in the expansion. Hence, we propose a Semi-Markov Conditional Random Field (Semi-CRF) based approach, which lets us write features that operate on a group of neighbouring tokens together and are more effective than features that operate on individual tokens. We design and implement Semi-Markov Conditional Random Fields to identify the correct acronym expansions in data extracted from Wikipedia and compare their performance with that of Conditional Random Fields. The experimental results show that the Semi-CRF based approach performs better than the CRF based approach on this task.
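To make the contrast between token-level and segment-level features concrete, the following is a minimal Python sketch. It is not the feature set used in this work: the function names, the example sentence, and the "initials spell the acronym" cue are all assumptions chosen for illustration. It shows how a cue computed over a whole candidate segment can be far more selective than the same cue computed token by token.

```python
from typing import List, Tuple


def token_feature(acronym: str, token: str) -> bool:
    """Token-level cue (hypothetical): the token's first letter occurs in the acronym."""
    return bool(token) and token[0].upper() in acronym.upper()


def segment_feature(acronym: str, tokens: List[str], start: int, end: int) -> bool:
    """Segment-level cue (hypothetical): initials of tokens[start:end] spell the acronym."""
    initials = "".join(t[0].upper() for t in tokens[start:end] if t)
    return initials == acronym.upper()


def candidate_segments(acronym: str, tokens: List[str], max_len: int) -> List[Tuple[int, int]]:
    """Enumerate every span of up to max_len tokens where the segment-level cue fires."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            if segment_feature(acronym, tokens, start, end):
                spans.append((start, end))
    return spans


# Illustrative sentence, not taken from the Wikipedia data used in the paper.
tokens = "Researchers fit a conditional random field for chunking".split()
acronym = "CRF"

# Positions where the token-level cue fires.
token_hits = [i for i, tok in enumerate(tokens) if token_feature(acronym, tok)]

# Spans where the segment-level cue fires.
segment_hits = candidate_segments(acronym, tokens, max_len=len(acronym))

print(token_hits)
print(segment_hits)
```

In this toy sentence the token-level cue fires on most tokens, including ones outside the expansion, while the segment-level cue fires only on the span covering "conditional random field". A Semi-CRF scores whole segments, so it can use cues of this second kind directly; a token-level CRF cannot.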