Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems 2017
DOI: 10.18653/v1/w17-5401
Towards Linguistically Generalizable NLP Systems: A Workshop and Shared Task

Abstract: This paper presents a summary of the first Workshop on Building Linguistically Generalizable Natural Language Processing Systems, and the associated Build It Break It, The Language Edition shared task. The goal of this workshop was to bring together researchers in NLP and linguistics with a shared task aimed at testing the generalizability of NLP systems beyond the distributions of their training data. We describe the motivation, setup, and participation of the shared task, provide discussion of some highlight…

Cited by 74 publications (66 citation statements) · References 17 publications
“…As mentioned in the introduction, our approach takes inspiration from "Build it Break it" approaches, which have been tried successfully in other domains (Ruef et al., 2016; Ettinger et al., 2017). Those approaches advocate finding faults in systems by having humans look for insecurities (in software) or prediction failures (in models), but do not advocate an automated approach as we do here.…”
Section: Related Work
confidence: 99%
“…In each pair, both annotators annotate the same document, sentence, or token, depending on the task. 6 The annotation starts by labelling articles in a sample of news articles as containing a protest or not. Sentences of the positively labelled documents are then labelled as containing protest information or not.…”
Section: Data
confidence: 99%
“…The methods that rely on manual and semi-automated coding, though reliable, require a tremendous amount of effort to replicate on new data, as they depend heavily on high-quality human effort. On the other hand, text classification and information extraction systems that rely on automated methods yield less reliable results, as they tend to perform poorly on texts different from those they were developed and validated on [6,12]. The huge number of news articles that must be analyzed, and the constant need to repeat the same analyses on new data, force us to push the limits of automated protest information collection yet again.…”
Section: Introduction
confidence: 99%
“…Seq2Seq-based neural architectures have become the go-to choice for sequence-to-sequence language tasks. Despite their excellent performance on these tasks, recent work has noted that these models usually do not fully capture the linguistic structure required to generalize beyond the dense regions of the data distribution (Ettinger et al., 2017), and as such are likely to fail on samples from the tail of the distribution (such as inputs that are noisy (Belinkov and Bisk, 2018) or of different lengths (Bentivogli et al., 2016)). In this paper, we examine a model's ability to generalize on a simple symbol-rewriting task with a clearly defined structure.…”
Section: Introduction
confidence: 99%