2004
DOI: 10.1007/978-3-540-30194-3_25

Investigation of Intelligibility Judgments

Citations: cited by 2 publications (3 citation statements)
References: 4 publications
“…One previous investigation ( Östling and Tiedemann, 2017) used BLEU (Papineni et al, 2002) to determine that 70k sentences was sufficient to provide decent quality for a neural LRMTS. The assumption from LRMT developers is that including humans is expensive and time-consuming avoiding inclusion of more human-like measurements such as adequacy (Doherty, 2018), HTER (Snover et al, 2006), and fluency (Reeder, 2004). This article describes uses of those metrics along with the following others to help better overcome the evaluation challenge.…”
Section: Related Work (mentioning)
confidence: 99%
“…Range
BLEU (Papineni et al, 2002) ≈ 15-35
ChrF (Popović, 2015) ≈ 40-70
BERTscore (Zhang et al, 2020) ≈ 60-80
COMET (Rei et al, 2020b) ≈ 15-60
BLEURT (Sellam et al, 2020) ≈ 25-50
METEOR (Denkowski and Lavie, 2011) ≈ 20-50
Fluency (Reeder, 2004) ≈ 1.0-3.0
The metrics and accompanying scores in Table 1 are meant to serve as a guide for what a company could expect from a LRMTS given the current systems that have been deployed in the wild. Most LRMTS are not good enough to use in the eyes of the low-resource community (Mager et al, 2023) but deployment can be considered for some cases like crises or others (O'Brien and Cadwell, 2017) as long as the proper care is taken to set appropriate expectations (especially for non-critical situations).…”
Section: Metric (mentioning)
confidence: 99%
“…et al (2000) classified accuracy into several categories including simple string accuracy, generation string accuracy, and two corresponding tree-based accuracy. Reeder (2004) found the correlation between fluency and the number of words it takes to distinguish between human translation and MT output.…”
Section: Fluency Adequacy and Comprehension (mentioning)
confidence: 99%