Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1027
Zero-Shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens

Abstract: Can attention- or gradient-based visualization techniques be used to infer token-level labels for binary sequence tagging problems, using networks trained only on sentence-level labels? We construct a neural network architecture based on soft attention, train it as a binary sentence classifier and evaluate against token-level annotation on four different datasets. Inferring token labels from a network provides a method for quantitatively evaluating what the model is learning, along with generating useful feedback…
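As a rough illustration of the setup the abstract describes, the sketch below builds a soft-attention BiLSTM sentence classifier whose per-token attention scores can be reused as zero-shot token-level predictions. This is a minimal reconstruction under stated assumptions, not the authors' released implementation; the class name, layer sizes, and the choice to threshold the unnormalized scores are all illustrative.

```python
# Hypothetical sketch (not the authors' released code): a BiLSTM sentence
# classifier with soft attention whose per-token attention scores are
# reused as zero-shot token-level label scores.
import torch
import torch.nn as nn

class AttentionSentenceClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)   # unnormalized token relevance
        self.out = nn.Linear(2 * hidden_dim, 1)          # sentence-level logit

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))          # (batch, seq, 2*hidden)
        e = self.attn_score(h).squeeze(-1)               # (batch, seq) token scores
        a = torch.softmax(e, dim=-1)                     # attention weights
        sent = (a.unsqueeze(-1) * h).sum(dim=1)          # attention-weighted sentence vector
        sent_logit = self.out(sent).squeeze(-1)          # supervised by sentence labels only
        # At test time the token scores e could be thresholded to obtain
        # zero-shot token-level labels (an assumption of this sketch).
        return sent_logit, e

# Usage: train with a sentence-level binary loss; no token labels are used.
model = AttentionSentenceClassifier(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 7))                 # toy batch of 2 sentences
sent_logit, token_scores = model(tokens)
loss = nn.functional.binary_cross_entropy_with_logits(sent_logit, torch.tensor([1.0, 0.0]))
loss.backward()
```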

Cited by 47 publications (41 citation statements). References 21 publications.
“…This ties together the label predictions on different levels, encouraging the objectives to work together and improve performance on both tasks. The architecture is based on the zero-shot sequence labeling framework by Rei and Søgaard (2018) which we extend with additional objectives and joint supervision on multiple levels. We will first describe the core architecture of the model and then provide details on different objective functions for optimization.…”
Section: Model Architecture
confidence: 99%
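A minimal sketch of the joint supervision this excerpt describes, assuming a model that exposes both a sentence-level logit and per-token logits (as in the sketch above). The function name, the alpha weighting, and the use of binary cross-entropy on both levels are assumptions for illustration, not details taken from the cited paper.

```python
# Hypothetical sketch of joint supervision on sentence and token levels.
import torch.nn.functional as F

def joint_loss(sent_logit, token_logits, sent_label, token_labels, alpha=0.5):
    """Weighted sum of sentence-level and token-level binary cross-entropy."""
    sentence_term = F.binary_cross_entropy_with_logits(sent_logit, sent_label)
    token_term = F.binary_cross_entropy_with_logits(token_logits, token_labels)
    return alpha * sentence_term + (1.0 - alpha) * token_term
```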
“…Behind our approach lies the simple observation that we can correlate the token-level attention devoted by a recurrent neural network, even if trained on sentence-level signals, with any measure defined at the token level. In other words, we can compare the attention devoted by a recurrent neural network to various measures, including token-level annotation (Rei and Søgaard, 2018) and eye-tracking measures. The latter is particularly interesting as it is typically considered a measurement of human attention.…”
Section: Methods
confidence: 99%
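The comparison described in this excerpt can be made concrete with a simple rank correlation between attention weights and a token-level measure. The sketch below is illustrative only; the fixation durations are toy numbers, not data from any eye-tracking corpus.

```python
# Hypothetical sketch: correlating per-token attention weights with a
# token-level measure such as eye-tracking fixation duration.
import numpy as np
from scipy.stats import spearmanr

attention = np.array([0.05, 0.10, 0.55, 0.20, 0.10])       # model attention per token
fixation = np.array([120.0, 150.0, 480.0, 260.0, 140.0])   # toy fixation durations (ms)
rho, p_value = spearmanr(attention, fixation)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```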
“…For this reason, research has been focused lately on models that can work in a zero-shot setting, i.e., without being explicitly trained on data from the target language or domain. This training paradigm has been utilized with great effect for several popular NLP problems, such as cross-lingual document retrieval [25], sequence labeling [26], cross-lingual dependency parsing [27], and reading comprehension [28]. More specific to classification tasks, Ye et al [29] developed a reinforcement learning framework for cross-task text classification, which was tested also on the problem of sentiment classification in a monolingual setting.…”
Section: Related Work
confidence: 99%