2011
DOI: 10.1007/978-3-642-25917-3_2

NADA: A Robust System for Non-referential Pronoun Detection

Abstract: We present Nada: the Non-Anaphoric Detection Algorithm. Nada is a novel, publicly-available program that accurately distinguishes between the referential and non-referential pronoun it in raw English text. Like recent state-of-the-art approaches, Nada uses very large-scale web N-gram features, but Nada makes these features practical by compressing the N-gram counts so they can fit into computer memory. Nada therefore operates as a fast, stand-alone system. Nada also improves over previous web-scale s…

Cited by 21 publications (27 citation statements)
References: 15 publications
“…More recently, Bergsma and Yarowsky (2011) develop the NADA system, which improves on Bergsma et al (2008) by incorporating lexical features. The lexical features indicate the presence or absence of some strings at specific positions around the pronoun: three-grams to five-grams spanning the pronoun; two tokens before the pronoun to five tokens after the pronoun with their positions; any token within twenty tokens to the right of the pronoun; and any token within ten tokens to the left of the pronoun that is a named entity or belongs to the following list: that, this, and, said, says, it, It, its, itself.…”
Section: Non-referential Mentions | Citation type: mentioning
confidence: 99%
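
The lexical feature templates listed in the statement above can be sketched roughly as follows. This is only an illustration of the description in the citing text, not NADA's actual implementation: the function name `lexical_features`, the feature-string format, and the `named_entities` argument (a set of token indices assumed to come from some external named-entity tagger) are all hypothetical.

```python
from typing import AbstractSet, List, Set

# Cue words listed in the citing statement for the left-context feature.
LEFT_CUES = {"that", "this", "and", "said", "says", "it", "It", "its", "itself"}


def lexical_features(
    tokens: List[str],
    i: int,
    named_entities: AbstractSet[int] = frozenset(),  # hypothetical NER output
) -> Set[str]:
    """Return binary (presence/absence) features for the pronoun at tokens[i]."""
    feats: Set[str] = set()
    n = len(tokens)

    # 1. Three-grams to five-grams spanning the pronoun.
    for size in (3, 4, 5):
        for start in range(i - size + 1, i + 1):
            if start >= 0 and start + size <= n:
                feats.add("ngram=" + " ".join(tokens[start:start + size]))

    # 2. Two tokens before to five tokens after the pronoun, with positions.
    for offset in range(-2, 6):
        j = i + offset
        if offset != 0 and 0 <= j < n:
            feats.add(f"pos[{offset:+d}]={tokens[j]}")

    # 3. Any token within twenty tokens to the right (position ignored).
    for tok in tokens[i + 1:i + 21]:
        feats.add("right=" + tok)

    # 4. Any token within ten tokens to the left that is a named entity
    #    or one of the listed cue words.
    for j in range(max(0, i - 10), i):
        if j in named_entities or tokens[j] in LEFT_CUES:
            feats.add("left=" + tokens[j])

    return feats


tokens = "He said that it is raining".split()
feats = lexical_features(tokens, 3)  # index 3 is the pronoun "it"
```

Each string in the returned set stands for one indicator feature that is "on" for this pronoun instance; a classifier would map these strings to weights.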
“…The propensity toward singletons also highlights the relevance of detecting singletons for a coreference system. Following Bergsma and Yarowsky (2011), we use a logistic regression model, which has been shown to perform well on a range of NLP tasks. We fit the logistic regression model in R (R Development Core Team, 2013) on the training data, coding singletons as '0' and coreferent mentions as '1'.…”
Section: Predicting Lifespans With Linguistic Features | Citation type: mentioning
confidence: 99%
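
The label coding described in the statement above (singletons as '0', coreferent mentions as '1') can be illustrated with a small logistic regression. The citing authors fit their model in R; the plain-Python gradient-descent fit below is a stand-in, and the two toy features, the data, and the helper names `fit_logistic` and `predict` are invented purely for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Estimated probability that the mention is coreferent (label 1)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data with two made-up binary features per mention.
# Labels: 0 = singleton, 1 = coreferent mention.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_logistic(X, y)
```

With this coding, a positive weight on a feature means the feature raises the estimated probability that a mention is coreferent rather than a singleton.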
“…The automatic detection of instances of pleonastic 'it', on the other hand, has been addressed by the non-referential 'it' detector NADA (Bergsma and Yarowsky, 2011), and also in the context of several coreference resolution systems, including the Stanford sieve-based coreference resolution system (Lee et al, 2011). The coreference resolution task focuses on the resolution of nominal anaphoric pronouns, de facto grouping our event and pleonastic categories together and discarding both of them.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…13. Non-referential probability assigned to the instance of 'it' by NADA (Bergsma and Yarowsky, 2011).…”
Section: Features | Citation type: mentioning
confidence: 99%
“…Third, both Bergsma and Yarowsky (2011) and de Marneffe et al (2015) find that the context words around the mention are an important feature for mention filtering. Context tokens can easily be included in the current set-up, and by using word embeddings generalization on these context words should be improved.…”
Section: The Current Approach | Citation type: mentioning
confidence: 99%