Proceedings of the 3rd Workshop on Neural Generation and Translation 2019
DOI: 10.18653/v1/d19-5617

Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

Abstract: We share a French-English parallel corpus of Foursquare restaurant reviews, and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of such user-generated content, and train good baseline models that build upon the latest techniques for MT robustness. We also perform an extensive evaluation (automatic and human) that shows significant improvements over exist…

Cited by 16 publications (17 citation statements); references 19 publications (38 reference statements).
“…The advancement of Neural Machine Translation (NMT) has brought great improvements in translation quality when translating clean input, such as text from the news domain (Luong et al., 2015; Vaswani et al., 2017), and it was recently claimed that NMT has even achieved human parity for certain language pairs (Hassan et al., 2018; Barrault et al., 2019). Despite these remarkable advances, the applicability of NMT to User-Generated Content (UGC), such as social media text, still remains limited (Michel and Neubig, 2018; Berard et al., 2019a). Since UGC is prevalent in real-life communication, it is undoubtedly one of the challenges we need to overcome to make MT systems invaluable for promoting cross-cultural communication.…”
Section: Introduction (mentioning)
confidence: 99%
“…Previous work has used translation quality measures such as BLEU on noisy input as an indicator of robustness. Absolute model performance on noisy input is important, and we believe this is an appropriate measure for noisy-domain evaluation (Michel and Neubig, 2018; Berard et al., 2019). However, it does not disentangle model quality from the relative degradation under added noise.…”
Section: Reference (mentioning)
confidence: 99%
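The distinction drawn in that statement, absolute BLEU on noisy input versus relative degradation under noise, can be made concrete with a minimal sketch. This is not code from the paper: sacrebleu is a real library, but the file names are hypothetical placeholders, and the hypotheses are assumed to have been produced by the same MT system on clean and noised versions of the same source sentences.

```python
# Minimal sketch: absolute BLEU vs. relative degradation under noise.
# File names are hypothetical placeholders.
import sacrebleu

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("test.ref.en")          # reference translations
hyp_clean = read_lines("hyp.clean.en")    # system output on clean input
hyp_noisy = read_lines("hyp.noisy.en")    # system output on noised input

bleu_clean = sacrebleu.corpus_bleu(hyp_clean, [refs]).score
bleu_noisy = sacrebleu.corpus_bleu(hyp_noisy, [refs]).score

print(f"BLEU on clean input: {bleu_clean:.1f}")   # model quality
print(f"BLEU on noisy input: {bleu_noisy:.1f}")   # absolute robustness measure
# Relative degradation separates model quality from sensitivity to noise.
print(f"Relative drop: {100 * (bleu_clean - bleu_noisy) / bleu_clean:.1f}%")
```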
“…EN↔DE and EN↔FI models are trained with pre-processed WMT18 news data and tested with the latest news test sets (newstest2019). Recently, two datasets were built from user-generated content: MTNT (Michel and Neubig, 2018) and 4SQ (Berard et al., 2019). The EN↔FR baseline models are trained with aggregated data from Europarl-v7 (Koehn, 2005), NewsCommentary-v14 (Bojar et al., 2018), OpenSubtitles-v2018 (Lison and Tiedemann, 2016), and ParaCrawl-v5, which simulates the UGC training corpus used in the 4SQ benchmarks, and they are tested with the latest WMT news test sets supporting EN↔FR (newstest2014).…”
Section: Tasks and Data (mentioning)
confidence: 99%
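As an illustration only (not the cited setup), aggregating the public corpora listed in that statement into a single training set could look like the sketch below; the local file names are hypothetical, and each corpus would first have to be downloaded and converted to plain parallel text.

```python
# Sketch: concatenate public parallel corpora into one training file per
# language side. File names are hypothetical placeholders.
corpora = [
    "europarl-v7.fr-en",
    "news-commentary-v14.fr-en",
    "opensubtitles-v2018.fr-en",
    "paracrawl-v5.fr-en",
]

for side in ("fr", "en"):
    with open(f"train.aggregated.{side}", "w", encoding="utf-8") as out:
        for corpus in corpora:
            with open(f"{corpus}.{side}", encoding="utf-8") as f:
                for line in f:
                    out.write(line)
```

The WMT news test sets mentioned above (e.g. newstest2014 for EN↔FR) can be fetched for evaluation with sacrebleu's command-line tool, e.g. `sacrebleu -t wmt14 -l en-fr --echo src`.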
“…Berard et al. (2019a) showed that a large monolingual corpus of UGT can be successfully back-translated with a system trained on P R1-R2 parallel data.…”
mentioning
confidence: 99%
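For context, back-translation in this setting generally means translating monolingual user-generated text with a reverse-direction model to create synthetic parallel data. The sketch below is a generic illustration, not the authors' pipeline; `reverse_model` and its `translate` method are hypothetical stand-ins for a trained target-to-source MT system.

```python
# Generic back-translation sketch (illustrative only, not the authors' pipeline).
# `reverse_model` is a hypothetical target->source MT system trained on the
# available parallel data; `mono_tgt` is monolingual user-generated text on
# the target side.
def back_translate(reverse_model, mono_tgt):
    """Return synthetic (source, target) pairs from monolingual target text."""
    synthetic_src = [reverse_model.translate(sentence) for sentence in mono_tgt]
    return list(zip(synthetic_src, mono_tgt))

# The synthetic pairs are typically mixed with genuine parallel data when
# training the forward (source->target) model, shifting it toward the
# user-generated domain.
```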