The Natural Language Toolkit (NLTK) is widely used for teaching natural language processing to students majoring in linguistics or computer science. This paper describes the design of NLTK, and reports on how it has been used effectively in classes that involve different mixes of linguistics and computer science students. We focus on three key issues: getting started with a course, delivering interactive demonstrations in the classroom, and organizing assignments and projects. In each case, we report on practical experience and make recommendations on how to use NLTK to maximum effect.
In this paper I shall present a truth-conditional semantics for simple adjectival comparatives. The leading idea is that the semantics should stay close to the surface syntax of the construction and should not depend on postulating degrees or extents as primitive entities. The remainder of this introductory section raises some objections to existing accounts of the semantics of comparative constructions. Section 2 argues that an adequate analysis of compared adjectives must take into account the vagueness of positive adjectives, and that this vagueness can be captured by interpreting degree adjectives as partial functions. Section 3 contains an analysis of degree modifiers, and in Section 4 this analysis provides the basis for the interpretation of comparatives. Section 5 spells out some consequences of the theory, while the last section contains a summary and conclusion.
Although text mining shows considerable promise as a tool for supporting the curation of biomedical text, there is little concrete evidence as to its effectiveness. We report on three experiments measuring the extent to which curation can be sped up with assistance from Natural Language Processing (NLP), together with subjective feedback from curators on the usability of a curation tool that integrates NLP hypotheses for protein-protein interactions (PPIs). In our curation scenario, we found that a maximum reduction of one third in curation time can be expected if NLP output is perfectly accurate. The preference of one curator for consistent NLP output and output with high recall needs to be confirmed in a larger study with several curators.
Spoken dialogue systems would be more acceptable if they were able to produce backchannel continuers such as mm-hmm at naturalistic locations during the user's utterances. Using the HCRC Map Task Corpus as our data source, we describe models for predicting these locations using only limited processing and features of the user's speech that are commonly available, and which could therefore be used as a low-cost improvement for current systems. The baseline model inserts continuers after a predetermined number of words. One further model correlates backchannel continuers with pause duration, while a second predicts their occurrence using trigram POS frequencies. Combining these two models gives the best results.
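
The three kinds of model named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token format, the function names, the thresholds (`every_n`, `min_pause`, `min_count`) and the choice to combine the pause and trigram models by intersecting their predictions are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical token format: (word, POS tag, pause after the word in seconds).
Token = Tuple[str, str, float]


def baseline_positions(tokens: Sequence[Token], every_n: int = 7) -> List[int]:
    """Baseline: predict a continuer after every `every_n`-th word."""
    return [i for i in range(len(tokens)) if (i + 1) % every_n == 0]


def pause_positions(tokens: Sequence[Token], min_pause: float = 0.6) -> List[int]:
    """Pause model: predict a continuer after any sufficiently long pause."""
    return [i for i, (_, _, pause) in enumerate(tokens) if pause >= min_pause]


def pos_trigram(tokens: Sequence[Token], i: int) -> Tuple[str, ...]:
    """POS tags of the three words ending at index i."""
    return tuple(tag for _, tag, _ in tokens[i - 2 : i + 1])


def train_trigram_counts(
    examples: Sequence[Tuple[Sequence[Token], Sequence[int]]]
) -> Counter:
    """Count POS trigrams that immediately precede an observed continuer."""
    counts: Counter = Counter()
    for tokens, continuer_positions in examples:
        for i in continuer_positions:
            if i >= 2:
                counts[pos_trigram(tokens, i)] += 1
    return counts


def trigram_positions(
    tokens: Sequence[Token], counts: Counter, min_count: int = 1
) -> List[int]:
    """Trigram model: predict a continuer after frequently seen POS trigrams."""
    return [
        i
        for i in range(2, len(tokens))
        if counts[pos_trigram(tokens, i)] >= min_count
    ]


def combined_positions(
    tokens: Sequence[Token],
    counts: Counter,
    min_pause: float = 0.6,
    min_count: int = 1,
) -> List[int]:
    """Assumed combination: keep positions predicted by both models."""
    long_pauses = set(pause_positions(tokens, min_pause))
    return [i for i in trigram_positions(tokens, counts, min_count) if i in long_pauses]
```

Each function returns the indices of the words after which a continuer would be produced, so predictions from the different models can be compared directly against annotated backchannel positions.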