2021
DOI: 10.48550/arxiv.2109.14250
Preprint

BLEU, METEOR, BERTScore: Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-oriented Text

Hadeel Saadany,
Constantin Orasan

Abstract: Social media companies as well as authorities make extensive use of artificial intelligence (AI) tools to monitor postings of hate speech, celebrations of violence or profanity. Since AI software requires massive volumes of data to train computers, Machine Translation (MT) of the online content is commonly used to process posts written in several languages and hence augment the data needed for training. However, MT mistakes are a regular occurrence when translating sentiment-oriented user-generated content (UGC)…


Cited by 2 publications (2 citation statements)
References 12 publications

“…In theory, this metric would take into account the context of each word, which would capture the semantics of each word. However, it has been found that BERTScore may still be insufficient in cases where individual tokens like negations significantly change the meaning of the sentence, even if it is marginally better than lexical methods like BLEU, ROUGE, and METEOR (Saadany and Orasan, 2021).…”
Section: Sample Sentence
confidence: 99%
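
To make the failure mode described above concrete, here is a minimal sketch comparing a lexical metric with BERTScore on a hypothesis that drops a single negation. It assumes the sacrebleu and bert_score packages; the sentence pair is invented for illustration and is not taken from the cited papers.

```python
# Minimal sketch: score a hypothesis that drops a single negation.
# Assumes the sacrebleu and bert_score packages; the sentences are
# invented for illustration, not drawn from the cited papers.
import sacrebleu
from bert_score import score

reference = "The translation quality is not acceptable."
hypothesis = "The translation quality is acceptable."  # "not" dropped

# Lexical overlap: BLEU counts shared n-grams, so losing one token
# removes only a few n-gram matches.
bleu = sacrebleu.sentence_bleu(hypothesis, [reference]).score

# Contextual embeddings: BERTScore aligns token embeddings; the
# remaining tokens still align well, so F1 typically stays high
# even though the sentiment of the sentence is inverted.
_, _, f1 = score([hypothesis], [reference], lang="en")

print(f"sentence BLEU: {bleu:.1f}")      # stays moderately high
print(f"BERTScore F1:  {f1.item():.3f}")  # also high, per the cited finding
```

Both scores remain high despite the meaning flip, which is exactly the insufficiency the citing paper attributes to lexical metrics and, marginally less so, to BERTScore.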
“…Although previous semantics-based evaluation metrics like BERTScore exist, we do not find them to be appropriate for our use case. Previous semantics-based evaluation metrics do not work well for cases where a single token can dramatically change the semantics of a statement, such as negations like "not" (Saadany and Orasan, 2021). Thus, we introduce a set of 3 evaluation metrics based on entailment for more accurate semantic evaluation.…”
Section: Semantic-based Evaluation Metrics
confidence: 99%
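
The entailment-based alternative mentioned in the statement above can be sketched with an off-the-shelf natural language inference (NLI) model. This is only an illustration of the general idea, assuming the transformers package and the public roberta-large-mnli checkpoint; it is not a reimplementation of the citing paper's three metrics.

```python
# Minimal sketch of an entailment check between a reference and a
# hypothesis translation. Assumes the transformers package and the
# public roberta-large-mnli checkpoint; illustrates the general idea
# only, not the citing paper's specific metrics.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

reference = "The translation quality is not acceptable."
hypothesis = "The translation quality is acceptable."  # "not" dropped

# Premise/hypothesis pair: a dropped negation is flagged as a
# contradiction, which n-gram and embedding overlap both miss.
result = nli({"text": reference, "text_pair": hypothesis})
print(result)  # expected label: CONTRADICTION
```

Because the NLI model classifies the relation between the two sentences rather than measuring surface overlap, a single negation flips the prediction to contradiction, giving the sharper semantic signal the citing paper is after.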