Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1441

Multi-Task Deep Neural Networks for Natural Language Understanding

Abstract: We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable l…
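The design the abstract describes, a shared text encoder feeding lightweight task-specific heads, can be sketched roughly as below. This is a minimal illustration only, not the toolkit's actual API: the class name, the `task_heads` dictionary, and the use of the [CLS] representation are assumptions for clarity.

```python
# Minimal sketch (not MT-DNN's actual API): a shared Transformer encoder
# with one lightweight head per task, as the abstract describes.
import torch
import torch.nn as nn
from transformers import AutoModel

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, encoder_name: str, task_num_labels: dict):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One classification/regression head per task, all sharing the encoder.
        self.task_heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task: str, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] representation
        return self.task_heads[task](pooled)

# Usage: batches from different tasks are interleaved during training,
# each updating the shared encoder plus its own head.
model = SharedEncoderMultiTask("bert-base-uncased", {"mnli": 3, "sts-b": 1})
```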

Cited by 928 publications (762 citation statements) · References 53 publications
“…These sentences are packed together into one input sequence that includes a premise and a hypothesis (two separate components). Rather than predicting a label in a single pass, the original MT-DNN [5] uses a Stochastic Answer Network [18], which maintains a state and iteratively refines its predictions over k steps (where k is a hyperparameter), averaging the prediction at each step to produce the final output; this improves the robustness of the model. $P_r^k = \mathrm{softmax}(W_{\text{task}} \cdot s^k \cdot x^k)$ (3) The equation above is very similar to the single-sentence text prediction, except that a state $s$ is maintained at each step $k$, after which the probability distributions $P_r^k$ are averaged to produce the final output.…”
Section: Pairwise Text Classification
confidence: 99%
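The multi-step prediction described in the quoted statement can be sketched roughly as follows. This is not the paper's exact implementation: the head name, the GRU state update, and the four-way feature concatenation are assumptions added for clarity, while the quoted equation abbreviates the scoring step as $W_{\text{task}} \cdot s^k \cdot x^k$.

```python
# Minimal sketch of SAN-style multi-step answer prediction (illustrative only;
# the exact feature construction in the MT-DNN paper is richer than shown here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepwiseAnswerHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int, k: int = 5):
        super().__init__()
        self.k = k
        self.W_task = nn.Linear(4 * hidden_size, num_labels)
        self.rnn = nn.GRUCell(hidden_size, hidden_size)

    def forward(self, s, x):
        """s: initial state (batch, hidden); x: premise/hypothesis summary (batch, hidden)."""
        step_probs = []
        for _ in range(self.k):
            feats = torch.cat([s, x, torch.abs(s - x), s * x], dim=-1)
            step_probs.append(F.softmax(self.W_task(feats), dim=-1))  # P_r^k
            s = self.rnn(x, s)  # refine the state for the next step
        # Average the per-step distributions to get the final prediction.
        return torch.stack(step_probs, dim=0).mean(dim=0)
```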
“…Similar to the MT-DNN [5], the ScRNN [20] is first trained on the Penn Treebank dataset with synthetic noise added.…”
Section: ScRNN
confidence: 99%
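The quoted statement does not specify what the synthetic noise looks like; a hypothetical sketch, assuming character-level perturbations such as inner-character swaps and drops (a common choice for robust word recognition), is shown below.

```python
# Hypothetical sketch of adding character-level synthetic noise to training
# words; the exact noise used by the citing paper is not specified in the quote.
import random

def add_char_noise(word: str, p: float = 0.3) -> str:
    if len(word) < 4 or random.random() > p:
        return word
    chars = list(word)
    i = random.randrange(1, len(chars) - 2)  # keep first and last characters fixed
    if random.random() < 0.5:
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap two inner characters
    else:
        del chars[i]                                     # drop an inner character
    return "".join(chars)

noisy = [add_char_noise(w) for w in "the quick brown fox jumps".split()]
```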
“…Recently proposed transfer learning methods [8,17,18,5,15,10] show that significant improvements on downstream natural language processing (NLP) tasks can be obtained by fine-tuning a neural network that has been trained for language modeling (LM) over a large corpus of text without task-specific annotations. Models leveraging these techniques have also shown faster convergence and encouraging results in few-shot or limited-data settings [8].…”
Section: Introduction
confidence: 99%
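The fine-tuning recipe the quoted statement refers to can be sketched as below. The model name, dataset, and hyperparameters are placeholder assumptions; the point is only the pattern of starting from a pretrained LM and updating all weights on a small labeled downstream task.

```python
# Minimal sketch (assumed model name and toy data) of fine-tuning a pretrained
# language model on a downstream classification task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["a great movie", "a boring movie"]   # stand-in labeled data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

model.train()
for _ in range(3):                            # a few fine-tuning epochs
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```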
“…Models leveraging these techniques have also shown faster convergence and encouraging results in few-shot or limited-data settings [8]. Owing to these benefits, this family of techniques is an emerging research topic in the NLP community [10]. However, it has received little attention in KGQA research so far.…”
Section: Introduction
confidence: 99%