2021
DOI: 10.3390/informatics8010007

Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation

Abstract: We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall tra…

Cited by 11 publications (28 citation statements)
References 53 publications
“…In this study, we focus on a simple approach to TM-NMT integration, neural fuzzy repair (NFR), that relies on source sentence augmentation through the concatenation of translations of similar source sentences retrieved from a TM [3]. This method has been shown to work well with the Transformer architecture [29], with the FM retrieval being based on the cosine similarity of sentence embeddings [4,5]. In this paper, we do not focus on comparing different TM-MT integration methods, but rather on evaluating one NFR configuration that was shown to perform well in a previous study, using BLEU as evaluation metric [4].…”
Section: TM-MT Integration (mentioning)
confidence: 99%
“…This method has been shown to work well with the Transformer architecture [29], with the FM retrieval being based on the cosine similarity of sentence embeddings [4,5]. In this paper, we do not focus on comparing different TM-MT integration methods, but rather on evaluating one NFR configuration that was shown to perform well in a previous study, using BLEU as evaluation metric [4]. The NFR system evaluated in this study is presented in more detail in Section 4.2.…”
Section: TM-MT Integration (mentioning)
confidence: 99%
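The statements above describe NFR as augmenting a source sentence with the translation of a similar source sentence retrieved from a TM, with retrieval based on the cosine similarity of sentence embeddings. A minimal sketch of that idea follows. It is not the cited authors' implementation: the `embed` function is a bag-of-words stand-in for a learned sentence embedding, and the `<sep>` separator, the `threshold` value, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def embed(sentence):
    # Stand-in "embedding": a bag-of-words count vector (assumption).
    # A real system would use learned sentence embeddings instead.
    return Counter(sentence.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment_source(source, tm, sep="<sep>", threshold=0.5):
    # tm is a list of (tm_source, tm_target) pairs.
    # sep and threshold are hypothetical values, not taken from the paper.
    query = embed(source)
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm:
        score = cosine(query, embed(tm_source))
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is None or best_score < threshold:
        return source  # no sufficiently similar match: leave input unchanged
    # Concatenate the retrieved translation onto the source sentence.
    return f"{source} {sep} {best_target}"


tm = [
    ("the cat sat on the mat", "le chat est assis sur le tapis"),
    ("good morning", "bonjour"),
]
augmented = augment_source("the cat sat on a mat", tm)
unchanged = augment_source("completely unrelated words", tm)
```

In this sketch the augmented string would be passed to the NMT system in place of the plain source sentence, while inputs without a close match are left as they are.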
“…The percentage of these matches is usually calculated using an algorithm based on edit distance or Levenshtein distance (Levenshtein, 1966). In addition Tezcan, Bulté, & Vanroy (2021) reported that fuzzy matching techniques use different approaches to estimate the degree of similarity between two sentences by calculating: the percentage of tokens (or characters) that appear in both segments potentially allowing for synonyms and paraphrase, the length of the longest matching sequence of tokens, or n-gram matching, the edit distance between segments, the most commonly used metric in CAT tools, automated MT evaluation metrics such as translation edit rate (TER), the amount of overlap in syntactic parse trees, or a more recently proposed method, the distance between continuous sentence representations.…”
Section: Introduction (mentioning)
confidence: 99%
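The statement above names edit distance as the metric most commonly used in CAT tools to estimate the match percentage. As a rough illustration, the sketch below computes a token-level Levenshtein distance and turns it into a score between 0 and 1. The normalisation by the length of the longer segment is one common convention and is an assumption here, as are the function names; commercial CAT tools may compute their percentages differently.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance over two sequences
    # (strings or token lists), keeping only the previous row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(segment_a, segment_b):
    # Token-level score: 1 - edit_distance / length of the longer segment.
    # This normalisation is an assumption made for the sketch.
    tokens_a, tokens_b = segment_a.split(), segment_b.split()
    longest = max(len(tokens_a), len(tokens_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(tokens_a, tokens_b) / longest


score = fuzzy_match_score("the cat sat on the mat", "the dog sat on the mat")
```

Here one token out of six differs, so the score corresponds to one substitution over a six-token segment.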
“…In their recent research, Tezcan et al (2021) have proposed developing a 'neural fuzzy repair' method by using sub-word-level segmentation in fuzzy match combinations to maximise the coverage of source words. This method employs vector-based sentence similarity metrics for retrieving TM matches in combination with alignment-based features on overall translation quality.…”
Section: TM Integration With State-of-the-art NMT (mentioning)
confidence: 99%