Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1027
Zero-Shot Sequence Labeling: Transferring Knowledge from Sentences to Tokens

Abstract: Can attention- or gradient-based visualization techniques be used to infer token-level labels for binary sequence tagging problems, using networks trained only on sentence-level labels? We construct a neural network architecture based on soft attention, train it as a binary sentence classifier and evaluate against token-level annotation on four different datasets. Inferring token labels from a network provides a method for quantitatively evaluating what the model is learning, along with generating useful feedback…
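As a rough illustration of the setup the abstract describes, the sketch below builds a soft-attention BiLSTM sentence classifier whose per-token attention scores can be reused as zero-shot token-level predictions. This is a minimal reconstruction under stated assumptions, not the authors' released implementation; the class name, layer sizes, and the choice to threshold the unnormalized scores are all illustrative.

```python
# Hypothetical sketch (not the authors' released code): a BiLSTM sentence
# classifier with soft attention whose per-token attention scores are
# reused as zero-shot token-level label scores.
import torch
import torch.nn as nn

class AttentionSentenceClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn_score = nn.Linear(2 * hidden_dim, 1)   # unnormalized token relevance
        self.out = nn.Linear(2 * hidden_dim, 1)          # sentence-level logit

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))          # (batch, seq, 2*hidden)
        e = self.attn_score(h).squeeze(-1)               # (batch, seq) token scores
        a = torch.softmax(e, dim=-1)                     # attention weights
        sent = (a.unsqueeze(-1) * h).sum(dim=1)          # attention-weighted sentence vector
        sent_logit = self.out(sent).squeeze(-1)          # supervised by sentence labels only
        # At test time the token scores e could be thresholded to obtain
        # zero-shot token-level labels (an assumption of this sketch).
        return sent_logit, e

# Usage: train with a sentence-level binary loss; no token labels are used.
model = AttentionSentenceClassifier(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 7))                 # toy batch of 2 sentences
sent_logit, token_scores = model(tokens)
loss = nn.functional.binary_cross_entropy_with_logits(sent_logit, torch.tensor([1.0, 0.0]))
loss.backward()
```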

Cited by 47 publications (41 citation statements). References 21 publications.
“…This ties together the label predictions on different levels, encouraging the objectives to work together and improve performance on both tasks. The architecture is based on the zero-shot sequence labeling framework by Rei and Søgaard (2018) which we extend with additional objectives and joint supervision on multiple levels. We will first describe the core architecture of the model and then provide details on different objective functions for optimization.…”
Section: Model Architecture
confidence: 99%
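A minimal sketch of the joint supervision this excerpt describes, assuming a model that exposes both a sentence-level logit and per-token logits (as in the sketch above). The function name, the alpha weighting, and the use of binary cross-entropy on both levels are assumptions for illustration, not details taken from the cited paper.

```python
# Hypothetical sketch of joint supervision on sentence and token levels.
import torch.nn.functional as F

def joint_loss(sent_logit, token_logits, sent_label, token_labels, alpha=0.5):
    """Weighted sum of sentence-level and token-level binary cross-entropy."""
    sentence_term = F.binary_cross_entropy_with_logits(sent_logit, sent_label)
    token_term = F.binary_cross_entropy_with_logits(token_logits, token_labels)
    return alpha * sentence_term + (1.0 - alpha) * token_term
```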
“…Behind our approach lies the simple observation that we can correlate the token-level attention devoted by a recurrent neural network, even if trained on sentence-level signals, with any measure defined at the token level. In other words, we can compare the attention devoted by a recurrent neural network to various measures, including token-level annotation (Rei and Søgaard, 2018) and eye-tracking measures. The latter is particularly interesting as it is typically considered a measurement of human attention.…”
Section: Methods
confidence: 99%
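The comparison described in this excerpt can be made concrete with a simple rank correlation between attention weights and a token-level measure. The sketch below is illustrative only; the fixation durations are toy numbers, not data from any eye-tracking corpus.

```python
# Hypothetical sketch: correlating per-token attention weights with a
# token-level measure such as eye-tracking fixation duration.
import numpy as np
from scipy.stats import spearmanr

attention = np.array([0.05, 0.10, 0.55, 0.20, 0.10])       # model attention per token
fixation = np.array([120.0, 150.0, 480.0, 260.0, 140.0])   # toy fixation durations (ms)
rho, p_value = spearmanr(attention, fixation)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```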
“…For this reason, research has been focused lately on models that can work in a zero-shot setting, i.e., without being explicitly trained on data from the target language or domain. This training paradigm has been utilized with great effect for several popular NLP problems, such as cross-lingual document retrieval [25], sequence labeling [26], cross-lingual dependency parsing [27], and reading comprehension [28]. More specific to classification tasks, Ye et al [29] developed a reinforcement learning framework for cross-task text classification, which was tested also on the problem of sentiment classification in a monolingual setting.…”
Section: Related Work
confidence: 99%