Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1151

Massive Exploration of Neural Machine Translation Architectures

Abstract: Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on the standard WMT English to German translation task.

Cited by 375 publications (268 citation statements)
References 28 publications
“…While (Britz et al., 2017) use word-based sequences in their large-scale NMT experiments, in our case we use character-based ones. This simply involves changing the "delimiter" option in the configuration files.…”
Section: Model (mentioning)
confidence: 99%
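The word-based vs character-based distinction the quote draws comes down to how input lines are split into tokens. A minimal Python sketch of that difference (the function name and defaults are illustrative, not tf-seq2seq's actual API):

```python
# Minimal sketch of word-based vs character-based tokenization;
# tf-seq2seq reads pre-tokenized text split on a configurable delimiter.
def tokenize(line: str, delimiter: str = " ") -> list[str]:
    """Split a sentence into model tokens.

    delimiter=" "  -> word-based sequences (whitespace-separated tokens)
    delimiter=""   -> character-based sequences (one token per character)
    """
    if delimiter == "":
        return list(line)           # one token per character
    return line.split(delimiter)    # one token per word

print(tokenize("neural machine translation"))  # ['neural', 'machine', 'translation']
print(tokenize("nmt", delimiter=""))           # ['n', 'm', 't']
```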
“…Contrary to word-based sequences, we use character-based sequences for generating grammatically correct and natural utterances. (Britz et al., 2017) provides an overview of the framework. While many options are configurable (number of layers, unidirectional vs bidirectional encoder, additive vs multiplicative attention mechanism, GRU (Cho et al., 2014) vs LSTM cells (Hochreiter and Schmidhuber, 1997), etc.…”
Section: Model (mentioning)
confidence: 99%
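For readers unfamiliar with the two attention variants contrasted in the quote, a hedged sketch of the score functions follows; the toy dimensions, weight names, and NumPy setup are assumptions for illustration, not the framework's code. Additive (Bahdanau-style) attention scores with v^T tanh(W1 h + W2 s); multiplicative (Luong-style) attention scores with h^T W s.

```python
import numpy as np

# Toy comparison of additive vs multiplicative attention scores between
# one encoder state h and one decoder state s (sizes are illustrative).
d = 4
rng = np.random.default_rng(0)
h = rng.standard_normal(d)              # encoder hidden state
s = rng.standard_normal(d)              # decoder hidden state
W1, W2, W = (rng.standard_normal((d, d)) for _ in range(3))
v = rng.standard_normal(d)

additive_score = v @ np.tanh(W1 @ h + W2 @ s)   # Bahdanau-style
multiplicative_score = h @ W @ s                # Luong-style
print(additive_score, multiplicative_score)
```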
“…The implementation of the described Sequence to Sequence model has been done using the tf-seq2seq framework (Britz et al., 2017) for Tensorflow (Abadi et al., 2016). The experiment was conducted after splitting the training set and the test set in the ratio 7:3.…”
Section: Sequence to Sequence Model (mentioning)
confidence: 99%
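A 7:3 split like the one described can be sketched in a few lines; the toy sentence pairs and the fixed seed here are assumptions for illustration:

```python
import random

# Shuffle parallel sentence pairs, then take 70% for training, 30% for test.
pairs = [(f"src {i}", f"tgt {i}") for i in range(10)]
random.Random(42).shuffle(pairs)
cut = int(len(pairs) * 0.7)
train, test = pairs[:cut], pairs[cut:]
print(len(train), len(test))  # 7 3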
“…Recently, Britz et al. (2017) have released a paper exploring the hyper-parameters of NMT. This work is similar to our paper in that it seeks better hyper-parameters by running a large number of experiments and deriving empirical conclusions.…”
Section: Related Work (mentioning)
confidence: 99%
“…While this can lead to much faster convergence, the resulting models are shown to slightly underperform compared to annealing SGD. However, Adam's speed and reputation of generally being "good enough" have made it a popular choice for researchers and NMT toolkit authors (Arthur et al., 2016; Lee et al., 2016; Britz et al., 2017; Sennrich et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%
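The contrast the quote draws, adaptive Adam versus SGD with an annealed (decaying) learning rate, can be illustrated with a short PyTorch sketch; the tiny model, decay factor, and interval are placeholders, and the original systems used TensorFlow rather than this setup:

```python
import torch

# Two training regimes for the same toy model.
model = torch.nn.Linear(8, 8)

# Adam: adaptive per-parameter step sizes, typically with a fixed base LR.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

# Annealed SGD: plain SGD whose learning rate is decayed on a schedule,
# here halved every 5 epochs (factor and interval are illustrative).
sgd = torch.optim.SGD(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=5, gamma=0.5)

for epoch in range(15):
    # ... one training epoch with either optimizer would go here ...
    scheduler.step()  # anneal the SGD learning rate between epochs
    print(epoch, scheduler.get_last_lr())
```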