Automated Testing for Machine Translation via Constituency Invariance

Ji, Pin; Feng, Yang; Liu, Jia; Zhao, Zhihong; Xu, Baowen

doi:10.1109/ase51524.2021.9678715

Cited by 7 publications (2 citation statements)

References 28 publications


“…To avoid this challenge, existing work attempts to judge the translation quality of such test cases without their reference translations, yet they can only diagnose translation errors related to specific types of edits that target certain capabilities. For example, ; Sun et al. (2020); Gupta et al. (2020); Ji et al. (2021) only diagnose translation errors on test cases created by editing a single noun or adjective, and ; ; Raunak et al. (2022) can only diagnose incorrect translations of noun phrases, quantities, or currency units that are related to the edits on them.…”

Section: not available (mentioning)

confidence: 99%

“…In other words, the recall of translation errors identified by these approaches is low. On the other hand, Sun et al., 2020; Gupta et al., 2020; Ji et al., 2021) first edit a single noun or adjective in x to create a series of similar sentences as test cases, then judge M as passing the behavioral testing if its translations of these sentences have similar syntactic structures, based on the assumption that the translations of similar sentences should be analogous. They modify only a single noun or adjective to avoid substantially shifting the structure of x, yet this still limits the types of capabilities they can test as well as the translation errors they can find.…”

Section: Related Work: Challenges and Existing Solutions (mentioning)

confidence: 99%
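The single-word-edit check described in the second citation statement can be illustrated with a minimal sketch: replace one noun or adjective to obtain similar sentences, then require that the constituency structures of their translations stay close to that of the original translation. Everything here is an assumption for illustration, not the method of Sun et al., Gupta et al., or Ji et al.: trees are nested tuples rather than real parser output, structural similarity is measured as the multiset symmetric difference of constituent productions with words abstracted away, and `mutate_single_word`, `structure_distance`, `passes`, and the `threshold` parameter are hypothetical names. Translation and parsing are left out of the sketch.

```python
from collections import Counter
from typing import List, Tuple, Union

# A constituency tree as nested tuples (label, child, ...); a leaf is a word
# string. This format is an assumption for illustration: in practice the trees
# would come from a constituency parser applied to the MT system's outputs.
Tree = Union[str, Tuple]


def mutate_single_word(tokens: List[str], index: int,
                       substitutes: List[str]) -> List[List[str]]:
    """Build similar test sentences by replacing only the word at `index`.

    `index` is assumed to point at a noun or adjective (e.g. found with a POS
    tagger); `substitutes` are hypothetical same-category replacement words.
    """
    return [tokens[:index] + [w] + tokens[index + 1:]
            for w in substitutes if w != tokens[index]]


def productions(tree: Tree) -> Counter:
    """Count label -> child-label rules, abstracting every word to '*'."""
    counts: Counter = Counter()
    if isinstance(tree, str):
        return counts
    label, *children = tree
    rhs = tuple("*" if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        counts += productions(child)
    return counts


def structure_distance(a: Tree, b: Tree) -> int:
    """Size of the multiset symmetric difference of the two trees' rules."""
    pa, pb = productions(a), productions(b)
    return sum(((pa - pb) + (pb - pa)).values())


def passes(original: Tree, variants: List[Tree], threshold: int = 0) -> bool:
    """Pass if every variant's translation stays structurally close to the
    original's; `threshold` is a hypothetical tolerance parameter."""
    return all(structure_distance(original, v) <= threshold for v in variants)


if __name__ == "__main__":
    print(mutate_single_word(["the", "cat", "slept"], 1, ["dog", "cat", "bird"]))
    # Toy Penn-Treebank-style parses of hypothetical German translations.
    orig = ("S", ("NP", ("DT", "die"), ("NN", "Katze")), ("VP", ("VBD", "schlief")))
    good = ("S", ("NP", ("DT", "der"), ("NN", "Hund")), ("VP", ("VBD", "schlief")))
    bad = ("FRAG", ("NP", ("DT", "der"), ("NN", "Hund")))
    print(structure_distance(orig, good), structure_distance(orig, bad))  # 0 4
    print(passes(orig, [good]), passes(orig, [good, bad]))  # True False
```

Because words are replaced by `*` before rules are counted, swapping one noun leaves the distance at zero unless the translation's bracketing changes, which mirrors why these approaches restrict edits to a single noun or adjective. A tree edit distance would be a finer-grained alternative to this production-count measure.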


Towards General Error Diagnosis via Behavioral Testing in Machine Translation

Wu, Liu, Yeung

2023

Findings of the Association for Computational Linguistics: EMNLP 2023


Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging as it generally requires human efforts to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing works in behavioral testing of MT systems circumvent this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as incorrect translation of single numeric or currency words. In order to diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudo-references. Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https://github.com/wujunjie1998/BTPGBT.
