Findings of the Association for Computational Linguistics: EMNLP 2022
DOI: 10.18653/v1/2022.findings-emnlp.15

NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures

Abstract: Being able to rank the similarity of short text segments is an interesting bonus feature of neural machine translation. Translation-based similarity measures include direct and pivot translation probability, as well as translation cross-likelihood, which has not been studied so far. We analyze these measures in the common framework of multilingual NMT, releasing the NMTScore library. Compared to baselines such as sentence embeddings, translation-based measures prove competitive in paraphrase identification and…
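The three measures named in the abstract correspond to scoring methods in the released NMTScore library. The following is a minimal sketch based on our reading of the library's README; the constructor defaults, model identifiers, and parameter names (a_lang, b_lang, pivot_lang, tgt_lang) are assumptions rather than a verified API reference:

```python
# pip install nmtscore
from nmtscore import NMTScorer

# Loads the library's default multilingual NMT model
# (assumption: the Prism model, per the README).
scorer = NMTScorer()

a = "The weather is nice today."
b = "Today, the weather is pleasant."

# Direct translation probability: probability of b as a translation of a,
# length-normalized and symmetrized by default (per the paper).
print(scorer.score_direct(a, b, a_lang="en", b_lang="en"))

# Pivot translation probability: score the pair via an intermediate
# pivot language instead of translating directly.
print(scorer.score_pivot(a, b, a_lang="en", b_lang="en", pivot_lang="en"))

# Translation cross-likelihood: translate one segment into a common target
# language, then score that translation's likelihood given the other segment.
print(scorer.score_cross_likelihood(a, b, tgt_lang="en"))
```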

Cited by 3 publications (3 citation statements)
References 15 publications
“…They also perform extensive bitext filtering, using several different language ID tools and a filtering method that uses perplexities of a GPT-2 model (Radford et al., 2019), LASER embeddings (Chaudhary et al., 2019), NMTScore (Vamvas and Sennrich, 2022) using Prism (Thompson and Post, 2020a,b), and WAScore (Steingrímsson et al., 2021), as well as Bicleaner AI (Zaragoza-Bernabeu et al., 2022). Hoang et al. (2023) focus on using a phrase-based dictionary to distill high-quality sentences and on building a pipeline to re-rank the top-K cosine-similarity candidates.…”
Section: Steingrímsson (citation type: mentioning)
confidence: 99%
“…For each labeled span in the source sentence, we rank all the projection candidates that share the same category as the source span using their translation probabilities (also known as translation equivalence), which have been obtained by applying the pretrained M2M100 (Fan et al., 2021) or NLLB200 (Costa-jussà et al., 2022) MT models and the NMTScore library (Vamvas and Sennrich, 2022). Thus, given the source span A and the candidate B, the translation probability is computed as follows (Vamvas and Sennrich, 2022):…”
Section: Candidate Selection (citation type: mentioning)
confidence: 99%
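The formula quoted above is truncated in this snapshot. As a stand-in, here is a minimal sketch of how a span-to-span translation probability could be computed with the NMTScore library; the model identifier "m2m100_418M", the language codes, and the score_direct signature are assumptions based on the library's README, and the spans are invented for illustration:

```python
from nmtscore import NMTScorer

# Assumed model identifier; the citing paper uses M2M100 or NLLB200.
scorer = NMTScorer("m2m100_418M")

source_span = "European Union"   # labeled span A in the source sentence
candidate = "Union européenne"   # projection candidate B in the target sentence

# Symmetrized, length-normalized direct translation probability of the
# candidate given the source span (and vice versa).
prob = scorer.score_direct(source_span, candidate, a_lang="en", b_lang="fr")
print(prob)
```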
“…Second, we rank the candidates based on the probability of being generated as a translation of the source spans. We use the state-of-the-art M2M100 (Fan et al., 2021) and NLLB200 (Costa-jussà et al., 2022) MT models to compute the translation probabilities (Vamvas and Sennrich, 2022).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
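And a sketch of the ranking step this statement describes, under the same API assumptions as the previous sketch; the span and candidates are again invented:

```python
from nmtscore import NMTScorer

scorer = NMTScorer("m2m100_418M")  # model identifier assumed from the README

source_span = "the United Nations"
candidates = ["les Nations Unies", "les nations", "l'ONU"]

# Score each candidate as a translation of the source span, one pair at a time.
scores = [
    scorer.score_direct(source_span, c, a_lang="en", b_lang="fr")
    for c in candidates
]

# Rank candidates by descending translation probability.
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for cand, s in ranked:
    print(f"{s:.4f}\t{cand}")
```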