Automatic assessment of spoken-language interpreting based on machine-translation evaluation metrics

Lu, Xiaolei; Han, Chao

doi:10.1075/intp.00076.lu

Cited by 5 publications (8 citation statements); references 26 publications


“…For RQ1 and RQ2, we find that ChatGPT excels under human evaluation and semantic-aware automatic evaluation (see Figure 1 and Table 2). For RQ3, we find that the two methods may exhibit divergences when applied to machine-translated texts, contrary to Lu and Han's (2023) findings that automated metrics can show moderate to strong correlations with human-assigned scores in assessing interpreting outputs, possibly owing to inherent differences between interpreting and translation. … ing (Brown et al., 2020; Chowdhery et al., 2022; Wei et al., 2022a, b; Wang et al., 2022), and several studies have explored the influence of prompting strategies on the translation performance of LLMs (Jiao et al., 2023; Hendy et al., 2023; Peng et al., 2023; Chen et al., 2023; He et al., 2023).…”

Section: Introduction (contrasting; confidence: 90%)

“…Analytic rubric scoring is another method widely adopted in TQA research. It is founded on the assumption that the overall concept of quality can be broken down into individual components, and typically comprises several sub-scales addressing separate dimensions of translation (Lu and Han, 2023). To complement the error typology-based evaluation, we propose six analytic rubrics to capture translation quality from different perspectives, encompassing dimensions of (1) coherence, (2) adherence to norms, (3) style, tone, and register appropriateness, (4) cultural sensitivity, (5) clarity, and (6) practicality.…”

Section: Human Evaluation Based On Analytic Rubric Scoring (mentioning; confidence: 99%)
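The analytic rubric scoring described above splits overall quality into separate sub-scales. As a minimal sketch, assuming (hypothetically) that each of the six dimensions is rated on one shared numeric scale and that the ratings are combined by a plain or weighted mean, the aggregation step might look like this. The quoted study does not state how, or whether, it combines the dimensions; the dimension keys and the `overall_score` helper are illustrative names only.

```python
from __future__ import annotations

# Hypothetical keys for the six rubric dimensions listed in the quotation.
DIMENSIONS = (
    "coherence",
    "adherence_to_norms",
    "style_tone_register",
    "cultural_sensitivity",
    "clarity",
    "practicality",
)


def overall_score(ratings: dict[str, float],
                  weights: dict[str, float] | None = None) -> float:
    """Combine per-dimension rubric ratings into a single overall score.

    Assumes all dimensions share one numeric scale. Without weights, each
    sub-scale counts equally (plain mean); with weights, a weighted mean.
    """
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    if weights is None:
        return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total_weight


# Hypothetical ratings on a 1-5 scale for one translation.
example = {"coherence": 4, "adherence_to_norms": 5, "style_tone_register": 3,
           "cultural_sensitivity": 4, "clarity": 5, "practicality": 3}
print(overall_score(example))  # prints 4.0
```

Keeping the per-dimension ratings separate until the final step preserves the diagnostic value of analytic scoring: the sub-scale profile can still be reported alongside the single combined number.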


Optimization Algorithms for Tibetan-Chinese Neural Machine Translation

Jiang

2020

IOP Conf. Ser.: Mater. Sci. Eng.


Tibetan-Chinese neural machine translation (NMT) faces a serious resource-scarcity problem. This paper compares the effectiveness of multiple neural network optimization algorithms under resource-scarce conditions and proposes an optimization method suited to low-resource languages, thereby improving the performance of Tibetan-Chinese NMT. Experimental results show that, with a suitable optimization algorithm, Tibetan-Chinese NMT can still outperform traditional statistical machine translation (SMT) and achieve better translation performance.


“…In recent empirical studies (Chung, 2020; Han and Lu, 2021; Lu and Han, 2022), a few researchers have investigated the utility of several metrics (i.e., BLEU, METEOR, NIST, and TER) in assessing translations or interpretations and have correlated the metric scores with human-assigned scores. Chung (2020) computes two metrics (i.e., BLEU and METEOR) to assess 120 German-to-Korean translations produced by ten student translators from 12 German texts covering a variety of topics.…”

Section: Computational Features For Fidelity Assessment (mentioning; confidence: 99%)
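BLEU, one of the metrics named above, scores a candidate by its clipped n-gram overlap with one or more references, scaled by a brevity penalty for candidates shorter than the reference. The sketch below is a simplified, unsmoothed sentence-level version on pre-tokenized input, assuming whitespace tokens and n-grams up to length 4. It is not the implementation used in the cited studies; established toolkits such as sacreBLEU compute corpus-level scores with standardized tokenization.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one tokenized candidate against one or more references."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            max_ref |= ngram_counts(ref, n)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if clipped == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "the cat sat on the mat".split()
hyp = "the cat sat on the".split()
print(round(sentence_bleu(hyp, [ref]), 4))  # prints 0.8187
```

In the example every n-gram precision is 1, so the score equals the brevity penalty, exp(1 - 6/5), for a 5-token candidate against a 6-token reference.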

“…Recently, in another study, Lu and Han (2022) evaluate 56 bidirectional consecutive English-Chinese interpretations produced by 28 student interpreters of varying abilities, using the same metrics plus one pre-trained model, Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019). They correlate the automated metric scores with scores assigned by different types of raters using different scoring methods (i.e., multiple assessment scenarios).…”

Section: Computational Features For Fidelity Assessment (mentioning; confidence: 99%)
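Correlating automated metric scores with human-assigned scores, as in the studies quoted above, means computing a correlation coefficient over paired scores, one pair per interpretation. A minimal sketch using Pearson's r follows. Choosing Pearson rather than, for example, Spearman's rank correlation is an assumption for illustration, and the `pearson_r` helper and the example scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/(n-1) factor, which cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


metric_scores = [0.21, 0.35, 0.18, 0.42, 0.30]  # hypothetical per-interpretation BLEU scores
human_scores = [62, 75, 58, 81, 70]             # hypothetical human-assigned scores
print(round(pearson_r(metric_scores, human_scores), 3))
```

Each assessment scenario (rater type and scoring method) would yield its own list of human scores, so the same function can be applied once per scenario.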

“…In Lu and Han's (2022) study, neural networks, computing systems inspired by biological neural networks, perform different tasks involving large amounts of data. Different algorithms are used to learn the relationships in a given data set and produce the best results from changing inputs.…”

Section: Computational Features For Fidelity Assessment (mentioning; confidence: 99%)


Machine-learning based automatic assessment of communication in interpreting

Liu

2023

Front. Commun.


Communication assessment in interpreting has developed into an area with new models and has received growing attention in recent years. The process refers to the assessment of messages composed of both “verbal” and “nonverbal” signals. A few relevant studies on automatic scoring have investigated the assessment of fluency based on objective temporal measures, as well as the correlation between machine translation metrics and human scores. However, no research has yet explored in depth machine-learning-based automatic scoring that integrates parameters of both delivery and information. What remains fundamentally challenging to demonstrate is which parameters, extracted through an automatic methodology, yield more reliable predictions. This original study proposes and tests a machine learning approach to automatically assess communication in English/Chinese interpreting. It builds predictive models using machine learning algorithms, extracting parameters for delivery and applying a translation quality estimation model for information assessment, and describes the final model. It employs the K-nearest neighbour (KNN) algorithm and a support vector machine (SVM) for further analysis. The best model, built with all features using the SVM, achieves an accuracy of 62.96%, outperforming the KNN model (55.56%). Pass-level assessment results can be accurately predicted, indicating that the machine learning models are able to screen for interpretations that pass the exam. The study is the first to build supervised machine learning models that integrate both delivery and fidelity features to predict interpreting quality. The models point to the great potential of automatic scoring with little human evaluation involved.
Automatic assessment of communication is expected to complete multiple tasks within a short period, taking both holistic and analytic approaches to assess accuracy, fidelity, and delivery. The proposed automatic scoring system might facilitate human-machine collaboration in the future: it can generate instant feedback for students by evaluating their renditions, or reduce educators' workload in interpreting education by screening performances for subsequent human scoring.
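The abstract above compares K-nearest neighbour and support vector machine classifiers for predicting pass-level results. To illustrate the KNN side only, here is a minimal from-scratch sketch that predicts a label by majority vote among the k closest training examples and reports accuracy. The two-feature vectors (a delivery score and an information score), the labels, and the helper names are all hypothetical; the study's actual features, preprocessing, and settings are not given here.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training examples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training examples")
    # Rank training examples by Euclidean distance to the query. Features on very
    # different scales should be standardized first, since distance is scale-sensitive.
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common orders equal counts by first occurrence, so ties go to the
    # label of the closer neighbour.
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    """Share of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical feature vectors: (delivery score, information score), both on 0-1.
train_x = [(0.2, 0.3), (0.3, 0.2), (0.25, 0.4), (0.8, 0.7), (0.7, 0.9), (0.9, 0.8)]
train_y = ["fail", "fail", "fail", "pass", "pass", "pass"]
test_x = [(0.3, 0.3), (0.85, 0.75)]
test_y = ["fail", "pass"]
preds = [knn_predict(train_x, train_y, q) for q in test_x]
print(preds, accuracy(preds, test_y))  # prints ['fail', 'pass'] 1.0
```

KNN stores the training data and defers all work to prediction time, which makes it a simple baseline; an SVM instead learns a decision boundary during training.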


A Comparative Study on Transformer Versus Sequence to Sequence in Machine Translation

Jiang

Zhao

et al. 2021

Modern Industrial IoT, Big Data and Supply Chain


Inspired by the increasing interest in leveraging large language models for translation, this paper evaluates the capabilities of large language models (LLMs), represented by ChatGPT, in comparison with mainstream neural machine translation (NMT) engines in translating Chinese diplomatic texts into English. Specifically, we examine the translation quality of ChatGPT and NMT engines as measured by four automated metrics and by human evaluation based on an error typology and six analytic rubrics. Our findings show that automated metrics yield similar results for ChatGPT under different prompts and for NMT systems, while human annotators tend to assign noticeably higher scores to ChatGPT when it is provided with an example or contextual information about the translation task. Pairwise correlations between automated metrics and dimensions of human evaluation are weak and non-significant, suggesting a divergence between the two methods of translation quality assessment. These findings provide valuable insights into the potential of ChatGPT as a capable machine translator and the influence of prompt engineering on its performance.
