Revisit Automatic Error Detection for Wrong and Missing Translation – A Supervised Approach

Lei, Wenqiang; Xu, Weiwen; Aw, Ai Ti; Xiang, Yuanxin; Chua, Tat-Seng

doi:10.18653/v1/d19-1087

Cited by 9 publications

(4 citation statements)

References 35 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Robust Neural Machine Translation: Methods have been proposed to make NMT models resilient not only to adequacy errors (Lei et al, 2019) but also to both natural and synthetic noise. Incorporating monolingual data into NMT has the capacity to improve the robustness (Sennrich et al, 2016a;Edunov et al, 2018;Cheng et al, 2016).…”

Section: Related Workmentioning

confidence: 99%

Addressing the Vulnerability of NMT in Input Perturbations

Xu¹,

Aw²,

Ding³

et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Self Cite

View full text Add to dashboard Cite

Neural Machine Translation (NMT) has achieved significant breakthrough in performance but is known to suffer vulnerability to input perturbations. As real input noise is difficult to predict during training, robustness is a big issue for system deployment. In this paper, we improve the robustness of NMT models by reducing the effect of noisy words through a Context-Enhanced Reconstruction (CER) approach. CER trains the model to resist noise in two steps: (1) perturbation step that breaks the naturalness of input sequence with madeup words; (2) reconstruction step that defends the noise propagation by generating better and more robust contextual representation. Experimental results on Chinese-English (ZH-EN) and French-English (FR-EN) translation tasks demonstrate robustness improvement on both news and social media text. Further finetuning experiments on social media text show our approach can converge at a higher position and provide a better adaptation.

show abstract

Section: Related Workmentioning

confidence: 99%

Addressing the Vulnerability of NMT in Input Perturbations

Xu¹,

Aw²,

Ding³

et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Self Cite

View full text Add to dashboard Cite

show abstract

“…These principles also guide the computational consideration of our detection method. For NMT, recently, Lei et al (2019) are the first to focus on accurately detecting wrong and missing translation of certain source words. Different from their work which detects the unsatisfactorily translated source words themselves, our work focuses on detecting the cause of them, and serves as complementary to recent interpretability analysis of importance words .…”

Section: Related Literaturementioning

confidence: 99%

Detecting and Understanding Generalization Barriers for Neural Machine Translation

Li,

Liu,

Zhu

et al. 2020

Preprint

View full text Add to dashboard Cite

Generalization to unseen instances is our eternal pursuit for all data-driven models. However, for realistic task like machine translation, the traditional approach measuring generalization in an average sense provides poor understanding for the fine-grained generalization ability. As a remedy, this paper attempts to identify and understand generalization barrier words within an unseen input sentence that cause the degradation of fine-grained generalization. We propose a principled definition of generalization barrier words and a modified version which is tractable in computation.Based on the modified one, we propose three simple methods for barrier detection by the search-aware risk estimation through counterfactual generation. We then conduct extensive analyses on those detected generalization barrier words on both Zh⇔En NIST benchmarks from various perspectives. Potential usage of the detected barrier words is also discussed.

show abstract

“…Inspired by [22,23], we deploy the negative sampling strategy [26] for effective training. Let P 0 denote the negative sample set, and P 1 denote the positive sample set, which is the same as the original set.…”

Section: Feature Extraction Modulementioning

confidence: 99%

A general framework for learning prosodic-enhanced representation of rap lyrics

Liang

Wang

et al. 2019

World Wide Web

View full text Add to dashboard Cite

Learning and analyzing rap lyrics is a significant basis for many web applications, such as music recommendation, automatic music categorization, and music information retrieval, due to the abundant source of digital music in the World Wide Web. Although numerous studies have explored the topic, knowledge in this field is far from satisfactory, because critical issues, such as prosodic information and its effective representation, as well as appropriate integration of various features, are usually ignored. In this paper, we propose a hierarchical attention variational autoencoder framework (HAVAE), which simultaneously consider semantic and prosodic features for rap lyrics representation learning. Specifically, the representation of the prosodic features is encoded by phonetic transcriptions with a novel and effective strategy (i.e., rhyme2vec). Moreover, a feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation. A comprehensive empirical evaluation demonstrates that the proposed framework outperforms the state-of-the-art approaches under various metrics in different rap lyrics learning tasks.

show abstract

Revisit Automatic Error Detection for Wrong and Missing Translation – A Supervised Approach

Cited by 9 publications

References 35 publications

Addressing the Vulnerability of NMT in Input Perturbations

Addressing the Vulnerability of NMT in Input Perturbations

Detecting and Understanding Generalization Barriers for Neural Machine Translation

A general framework for learning prosodic-enhanced representation of rap lyrics

Contact Info

Product

Resources

About