Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1441

Multi-Task Deep Neural Networks for Natural Language Understanding

Abstract: We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models. Built upon PyTorch and Transformers, MT-DNN is designed to facilitate rapid customization for a broad spectrum of NLU tasks, using a variety of objectives (classification, regression, structured prediction) and text encoders (e.g., RNNs, BERT, RoBERTa, UniLM). A unique feature of MT-DNN is its built-in support for robust and transferable l…
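The design the abstract describes, a shared text encoder feeding lightweight task-specific heads, can be sketched roughly as below. This is a minimal illustration only, not the toolkit's actual API: the class name, the `task_heads` dictionary, and the use of the [CLS] representation are assumptions for clarity.

```python
# Minimal sketch (not MT-DNN's actual API): a shared Transformer encoder
# with one lightweight head per task, as the abstract describes.
import torch
import torch.nn as nn
from transformers import AutoModel

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, encoder_name: str, task_num_labels: dict):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One classification/regression head per task, all sharing the encoder.
        self.task_heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, task: str, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] representation
        return self.task_heads[task](pooled)

# Usage: batches from different tasks are interleaved during training,
# each updating the shared encoder plus its own head.
model = SharedEncoderMultiTask("bert-base-uncased", {"mnli": 3, "sts-b": 1})
```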

Cited by 928 publications (762 citation statements) · References 53 publications
“…These sentences are packed together into one input sequence that includes a premise and a hypothesis (two separate components). Rather than predicting a label in a single pass, the original MT-DNN [5] uses a Stochastic Answer Network [18], which maintains a state and iteratively refines its predictions over k steps (where k is a hyperparameter), averaging the prediction at each step to produce the final output; this improves the robustness of the model. $P_r^k = \mathrm{softmax}(W_{\text{task}} \cdot s^k \cdot x^k)$ (3) The equation above is very similar to the single-sentence text prediction, except that a state $s$ is maintained at each step $k$, after which the probability distributions $P_r^k$ are averaged to produce the final output.…”
Section: Pairwise Text Classification
confidence: 99%
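The multi-step prediction described in the quoted statement can be sketched roughly as follows. This is not the paper's exact implementation: the head name, the GRU state update, and the four-way feature concatenation are assumptions added for clarity, while the quoted equation abbreviates the scoring step as $W_{\text{task}} \cdot s^k \cdot x^k$.

```python
# Minimal sketch of SAN-style multi-step answer prediction (illustrative only;
# the exact feature construction in the MT-DNN paper is richer than shown here).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepwiseAnswerHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int, k: int = 5):
        super().__init__()
        self.k = k
        self.W_task = nn.Linear(4 * hidden_size, num_labels)
        self.rnn = nn.GRUCell(hidden_size, hidden_size)

    def forward(self, s, x):
        """s: initial state (batch, hidden); x: premise/hypothesis summary (batch, hidden)."""
        step_probs = []
        for _ in range(self.k):
            feats = torch.cat([s, x, torch.abs(s - x), s * x], dim=-1)
            step_probs.append(F.softmax(self.W_task(feats), dim=-1))  # P_r^k
            s = self.rnn(x, s)  # refine the state for the next step
        # Average the per-step distributions to get the final prediction.
        return torch.stack(step_probs, dim=0).mean(dim=0)
```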
“…Similar to the MT-DNN [5], the ScRNN [20] is first trained on the Penn Treebank dataset with synthetic noise added.…”
Section: ScRNN
confidence: 99%
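The quoted statement does not specify what the synthetic noise looks like; a hypothetical sketch, assuming character-level perturbations such as inner-character swaps and drops (a common choice for robust word recognition), is shown below.

```python
# Hypothetical sketch of adding character-level synthetic noise to training
# words; the exact noise used by the citing paper is not specified in the quote.
import random

def add_char_noise(word: str, p: float = 0.3) -> str:
    if len(word) < 4 or random.random() > p:
        return word
    chars = list(word)
    i = random.randrange(1, len(chars) - 2)  # keep first and last characters fixed
    if random.random() < 0.5:
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap two inner characters
    else:
        del chars[i]                                     # drop an inner character
    return "".join(chars)

noisy = [add_char_noise(w) for w in "the quick brown fox jumps".split()]
```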
“…Recently proposed transfer learning methods [8,17,18,5,15,10] show that significant improvements on downstream natural language processing (NLP) tasks can be obtained by fine-tuning a neural network that has been trained for language modeling (LM) over a large corpus of text without task-specific annotations. Models leveraging these techniques have also shown faster convergence and encouraging results in few-shot or limited-data settings [8].…”
Section: Introduction
confidence: 99%
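The fine-tuning recipe the quoted statement refers to can be sketched as below. The model name, dataset, and hyperparameters are placeholder assumptions; the point is only the pattern of starting from a pretrained LM and updating all weights on a small labeled downstream task.

```python
# Minimal sketch (assumed model name and toy data) of fine-tuning a pretrained
# language model on a downstream classification task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["a great movie", "a boring movie"]   # stand-in labeled data
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

model.train()
for _ in range(3):                            # a few fine-tuning epochs
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```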
“…Models leveraging these techniques have also shown faster convergence and encouraging results in few-shot or limited-data settings [8]. Owing to these benefits, this family of techniques is an emerging research topic in the NLP community [10]. However, it has received little attention in KGQA research so far.…”
Section: Introduction
confidence: 99%