The construction of terminological products is important to the organization and spreading of knowledge. This task can be leveraged by the automatic extraction of terms, which has been considered a Natural Language Processing problem. In this paper, the interaction between the statistical approach to term extraction and the process of stopword removal is investigated. Experiments conducted on two corpora show that stopword removal improves performance when extracting bigram terms, no matter if the removal is done before or after the application of a statistical metric. As a result of this investigation, it is possible to recommend more appropriate statistical metrics for the case where it is possible to remove stopwords and for the case that this removal cannot be done. 2009 Seventh Brazilian Symposium in Information and Human Language Technology 978-0-7695-3945-4/09 $26.00
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.