Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1477

Simple Recurrent Units for Highly Parallelizable Recurrence

Abstract: Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable a highly parallelized implementation, and come with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves a 5-9x speed-up over cuDNN LSTM…

Cited by 194 publications (141 citation statements) · References 34 publications
“…We apply the same encoder architecture throughout all experiments. We use a 4-layer recurrent neural network with SRU cells (Lei et al., 2018) and a hidden size of 128. We use pretrained GloVe embeddings (Pennington et al., 2014), which are fixed during training.…”
Section: Hyperparameters and Implementation Details (mentioning)
confidence: 99%
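The encoder described in the statement above can be sketched as follows. This is a minimal illustration only, assuming the open-source sru package exposes an SRU module with an nn.LSTM-like interface and that a GloVe embedding matrix has been loaded elsewhere; the names SRUEncoder and glove_vectors are illustrative, not taken from the cited work.

    import torch
    import torch.nn as nn
    from sru import SRU  # assumed: pip install sru (asappresearch/sru)

    class SRUEncoder(nn.Module):
        """4-layer SRU encoder over frozen pretrained GloVe embeddings,
        mirroring the configuration quoted above (hidden size 128)."""
        def __init__(self, glove_vectors: torch.Tensor,
                     hidden_size: int = 128, num_layers: int = 4):
            super().__init__()
            # Embeddings are fixed during training (freeze=True).
            self.embedding = nn.Embedding.from_pretrained(glove_vectors, freeze=True)
            self.rnn = SRU(input_size=glove_vectors.size(1),
                           hidden_size=hidden_size,
                           num_layers=num_layers)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (seq_len, batch); SRU, like nn.LSTM, is time-major by default.
            emb = self.embedding(token_ids)
            output, _state = self.rnn(emb)  # output: (seq_len, batch, hidden_size)
            return output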
“…While many of the models cited above implement their RNNs with an LSTM (Hochreiter and Schmidhuber, 1997), we instead use an SRU (Lei et al., 2018). SRU uses light recurrence, which makes it highly parallelizable, and Lei et al. (2018) showed that it trains 5-9x faster than cuDNN LSTM. SRU also exhibits a significant speedup in inference time compared to LSTM (by a factor of 4.1x in our experiments), which is particularly relevant in a production setting.…”
Section: Related Work (mentioning)
confidence: 99%
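The "light recurrence" this statement refers to is why SRU parallelizes well: every matrix multiplication depends only on the input x_t, so it can be computed for the whole sequence in one batched call, and the remaining sequential work is element-wise. The sketch below is a rough, unoptimized rendering of the update equations as reported in Lei et al. (2018); the released library fuses the element-wise loop into a single CUDA kernel, and the sketch assumes the input and hidden sizes match so the highway term can add x_t directly.

    import torch

    def sru_layer(x, W, W_f, W_r, v_f, v_r, b_f, b_r):
        # x: (seq_len, batch, d); W, W_f, W_r: (d, d); v_*, b_*: (d,)
        # Parallelizable part: all matrix multiplications depend only on x,
        # so they are computed for every time step at once.
        U = x @ W      # candidate values
        F = x @ W_f    # forget-gate pre-activations
        R = x @ W_r    # reset-gate pre-activations

        seq_len, batch, d = x.shape
        c = x.new_zeros(batch, d)
        outputs = []
        # Sequential part: only cheap element-wise operations remain.
        for t in range(seq_len):
            c_prev = c
            f = torch.sigmoid(F[t] + v_f * c_prev + b_f)   # forget gate
            c = f * c_prev + (1 - f) * U[t]                # internal state
            r = torch.sigmoid(R[t] + v_r * c_prev + b_r)   # reset gate
            h = r * c + (1 - r) * x[t]                     # highway connection
            outputs.append(h)
        return torch.stack(outputs), c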
“…SRU also exhibits a significant speedup in inference time compared to LSTM (by a factor of 4.1x in our experiments), which is particularly relevant in a production setting. Furthermore, Lei et al. (2018) showed that SRU matches or exceeds the performance of models using LSTMs or the Transformer architecture (Vaswani et al., 2017) on a number of NLP tasks, meaning significant speed gains can be achieved without a drop in performance.…”
Section: Related Work (mentioning)
confidence: 99%
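As a way to sanity-check such latency claims in one's own setting, the snippet below times a single forward pass of a 4-layer nn.LSTM against a 4-layer SRU. This is only a hedged sketch: it assumes the sru package is installed, the tensor sizes are arbitrary, and CPU timings will not reproduce the GPU speedups quoted above.

    import time
    import torch
    from sru import SRU  # assumed available (pip install sru)

    batch, seq_len, d = 32, 128, 256          # illustrative sizes only
    x = torch.randn(seq_len, batch, d)

    lstm = torch.nn.LSTM(d, d, num_layers=4)
    sru = SRU(d, d, num_layers=4)

    def mean_forward_time(module, x, n=20):
        with torch.no_grad():
            module(x)                          # warm-up pass
            start = time.perf_counter()
            for _ in range(n):
                module(x)
        return (time.perf_counter() - start) / n

    print("LSTM forward (s):", mean_forward_time(lstm, x))
    print("SRU  forward (s):", mean_forward_time(sru, x))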
“…In this experiment, we fine-tune the BERT model on two standard text classification benchmarks: TREC (Li and Roth, 2002) and the Sentiment Treebank (Socher et al., 2013). We then apply knowledge distillation to reduce the BERT model to a simple 4-layer, 256-unit SRU network (Lei et al., 2018). This is a typical multistage experiment with preprocessing, fine-tuning, and distillation stages.…”
Section: Case Study: BERT Distillation (mentioning)
confidence: 99%
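The distillation stage mentioned above typically trains the small SRU student to match the fine-tuned BERT teacher's output distribution. The loss below is the standard temperature-softened knowledge-distillation objective, offered as a generic sketch; the exact objective and weighting used in the cited work may differ, and the parameter names are illustrative.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soften both distributions with the same temperature.
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
        # KL term between student and teacher, rescaled by T^2 as is conventional.
        kd = F.kl_div(log_p_student, log_p_teacher,
                      reduction="batchmean", log_target=True) * temperature ** 2
        # Ordinary cross-entropy on the gold labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * kd + (1 - alpha) * ce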