Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1423
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a…
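
The fine-tuning recipe the abstract describes (a pre-trained bidirectional encoder plus one additional output layer) can be sketched as follows. This is a minimal illustration, not the authors' released code: the Hugging Face `transformers` package, the `bert-base-uncased` checkpoint, and the two-label classification task are all assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's code): pre-trained BERT encoder
# plus one task-specific output layer, as described in the abstract.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


class BertWithOutputLayer(torch.nn.Module):
    def __init__(self, encoder, num_labels=2):
        super().__init__()
        self.encoder = encoder
        # The single additional output layer mentioned in the abstract.
        self.output_layer = torch.nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the final hidden state of the [CLS] token as the sequence summary.
        cls_state = hidden.last_hidden_state[:, 0]
        return self.output_layer(cls_state)


model = BertWithOutputLayer(encoder)
batch = tokenizer(["An example sentence to classify."], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

In the fine-tuning step the abstract refers to, the encoder weights and the new output layer would then be trained jointly on the downstream task.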

Cited by 17,879 publications (12,493 citation statements). References 45 publications.
“…Models based on the Transformer architecture (Vaswani et al., 2017) have led to tremendous performance increases in a wide range of downstream tasks (Devlin et al., 2019). Despite these successes, the impact of the suggested parametrization choices, in particular the self-attention mechanism with its large number of attention heads distributed over several layers, has been the subject of many studies following two main lines of research.…”
Section: Introduction (mentioning)
confidence: 99%
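
The "large number of attention heads distributed over several layers" mentioned in this statement can be made concrete with a small sketch. The use of `torch.nn.MultiheadAttention` and the BERT-base sizes (12 layers, 12 heads, hidden size 768) are assumptions for illustration, not code from the cited studies.

```python
# Illustrative only: one self-attention layer with BERT-base-like sizes.
# BERT-base stacks 12 such layers, each with 12 heads over a 768-dim hidden state.
import torch

embed_dim, num_heads = 768, 12
self_attention = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(1, 16, embed_dim)            # (batch, sequence length, hidden size)
out, attn_weights = self_attention(x, x, x)  # self-attention: query = key = value
print(out.shape)           # torch.Size([1, 16, 768])
print(attn_weights.shape)  # torch.Size([1, 16, 16]), averaged over the 12 heads
```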
“…In this paper, a deep learning model, BERT, was used. With BERT, pre-training is already performed on linguistic representation before the training begins (Devlin, Chang, Lee, & Toutanova, 2019). Our experiments were conducted based on the BERT pre-training model (multi-language version).…”
Section: Data Set and Methods (mentioning)
confidence: 99%
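
The "multi-language version" of the pre-trained BERT model mentioned here is presumably the public multilingual checkpoint. The snippet below is a hypothetical illustration using the Hugging Face `transformers` API, not the cited paper's actual setup.

```python
# Hypothetical illustration: loading a multilingual pre-trained BERT checkpoint.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

inputs = tokenizer("Ein mehrsprachiger Beispielsatz.", return_tensors="pt")
states = model(**inputs).last_hidden_state  # pre-trained contextual representations
```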
“…The most common pre-training task is language modeling or a closely related variant (McCann et al., 2017; Peters et al., 2018; Devlin et al., 2019; Ziser and Reichart, 2018). The outputs of the pre-trained DNN are often referred to as contextualized word embeddings, as these DNNs typically generate a vector embedding for each input word, which takes its context into account.…”
Section: Previous Work (mentioning)
confidence: 99%
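
The "contextualized word embeddings" described in this statement, one vector per input token computed with its surrounding context, can be extracted as in the sketch below. The Hugging Face `transformers` API, the checkpoint name, and the example sentence are assumptions for illustration.

```python
# Minimal sketch: per-token contextualized embeddings from a pre-trained model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    inputs = tokenizer("The bank raised interest rates.", return_tensors="pt")
    token_vectors = model(**inputs).last_hidden_state[0]  # (num_tokens, hidden_size)

# Each row is a context-dependent vector: the embedding of "bank" here differs
# from the one the same model produces for "We sat on the river bank."
```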
“…We present a novel self-training method, suitable for neural dependency parsing. Our algorithm (§4) follows recent work that has demonstrated the power of pre-training for improving DNN models in NLP (Peters et al., 2018; Devlin et al., 2019) and particularly for domain adaptation (Ziser and Reichart, 2018). However, while in previous work a representation model, also known as a contextualized embedding model, is trained on a language modeling related task, our algorithm utilizes a representation model that is trained on sequence prediction tasks derived from the parser's output.…”
Section: Introduction (mentioning)
confidence: 99%