We present an approach to pronoun resolution based on syntactic paths. Through a simple bootstrapping procedure, we learn the likelihood of coreference between a pronoun and a candidate noun based on the path in the parse tree between the two entities. This path information enables us to handle previously challenging resolution instances, and also robustly addresses traditional syntactic coreference constraints. Highly coreferent paths also allow mining of precise probabilistic gender/number information. We combine statistical knowledge with well known features in a Support Vector Machine pronoun resolution classifier. Significant gains in performance are observed on several datasets.
We present a discriminative method for learning selectional preferences from unlabeled text. Positive examples are taken from observed predicate-argument pairs, while negatives are constructed from unobserved combinations. We train a Support Vector Machine classifier to distinguish the positive from the negative instances. We show how to partition the examples for efficient training with 57 thousand features and 6.5 million training instances. The model outperforms other recent approaches, achieving excellent correlation with human plausibility judgments. Compared to Mutual Information, it identifies 66% more verb-object pairs in unseen text, and resolves 37% more pronouns correctly in a pronoun resolution experiment.
Abstract. We present Nada: the Non-Anaphoric Detection Algorithm. Nada is a novel, publicly-available program that accurately distinguishes between the referential and non-referential pronoun it in raw English text. Like recent state-of-the-art approaches, Nada uses very large-scale web N-gram features, but Nada makes these features practical by compressing the N-gram counts so they can fit into computer memory. Nada therefore operates as a fast, stand-alone system. Nada also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. Nada very substantially outperforms other state-of-the-art systems in nonreferential detection accuracy.
We propose an unsupervised Expectation Maximization approach to pronoun resolution. The system learns from a fixed list of potential antecedents for each pronoun. We show that unsupervised learning is possible in this context, as the performance of our system is comparable to supervised methods. Our results indicate that a probabilistic gender/number model, determined automatically from unlabeled text, is a powerful feature for this task.
Motivated by work predicting coarsegrained author categories in social media, such as gender or political preference, we explore whether Twitter contains information to support the prediction of finegrained categories, or social roles. We find that the simple self-identification pattern "I am a " supports significantly richer classification than previously explored, successfully retrieving a variety of fine-grained roles. For a given role (e.g., writer), we can further identify characteristic attributes using a simple possessive construction (e.g., writer's ). Tweets that incorporate the attribute terms in first person possessives (my ) are confirmed to be an indicator that the author holds the associated social role.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.