Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.348
Integrating Domain Terminology into Neural Machine Translation

Abstract: This paper extends existing work on terminology integration into Neural Machine Translation, a common industrial practice to dynamically adapt translation to a specific domain. Our method, based on the use of placeholders complemented with morphosyntactic annotation, efficiently taps into the ability of the neural network to deal with symbolic knowledge to surpass the surface generalization shown by alternative techniques. We compare our approach to state-of-the-art systems and benchmark them through a well-de…

Cited by 12 publications (11 citation statements) · References 23 publications
“…Previous work has used sentence tags to convey domain (Kobus et al., 2017; Britz et al., 2017), speaker gender (Vanmassenhove et al., 2018), or target language formality (Sennrich et al., 2016a; Feely et al., 2019). Multiple tags can be used throughout source and target sentences, for example indicating linguistic features (Sennrich & Haddow, 2016; García-Martínez et al., 2016; Aharoni & Goldberg, 2017; Saunders et al., 2018) or custom terminology use (Dinu et al., 2019; Michon et al., 2020). Tags are usually incorporated throughout training, implicitly requiring the availability of reliable tags for large training datasets, although new tags can also be introduced during model adaptation.…”
Section: Tags (mentioning)
confidence: 99%
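The tag-based control described in the excerpt above (e.g. Kobus et al., 2017) can be sketched minimally: a pseudo-token marking the domain is prepended to each source sentence before it is fed to the NMT system, which learns to condition its output on the tag. The function name and tag format below are illustrative assumptions, not taken from any cited system.

```python
def tag_source(sentences, domain):
    """Prepend a domain pseudo-token (e.g. <dom:medical>) to each
    source sentence so an NMT model can condition on it.

    Illustrative sketch only; real systems differ in tag vocabulary,
    placement, and whether tags also appear on the target side.
    """
    tag = f"<dom:{domain}>"
    return [f"{tag} {sent}" for sent in sentences]

# The tagged sentences are then tokenized and translated as usual.
tagged = tag_source(["the patient received the dose"], "medical")
print(tagged[0])  # <dom:medical> the patient received the dose
```

The same mechanism extends to speaker gender or formality tags; only the tag inventory changes.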
“…In the development of neural machine translation (NMT), significant studies have proposed methods to integrate external specialized domain terminology. These can be roughly divided into three categories [20]: 1) placeholders.…”
Section: Domain Terminology Injection (mentioning)
confidence: 99%
“…Thus, it can be argued that in-training approaches are inferior to constrained decoding methods in terms of straightforward terminology integration; indeed, Dinu et al. (2019) report a terminology usage rate 6-9% lower than that of the constrained decoding method. To ensure the appearance of terms in the output, Michon et al. (2020) use placeholders with the help of morphosyntactic annotations. Even though the approach is effective for choosing a correctly inflected form, it depends on the availability and performance of morphological analysers in both source and target languages.…”
Section: Related Work (mentioning)
confidence: 99%
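The placeholder mechanism the excerpt attributes to Michon et al. (2020) can be sketched at its simplest, leaving out the morphosyntactic annotation step: known source terms are masked with numbered placeholder tokens before translation, and the corresponding target-side terms are substituted back into the MT output afterwards. All function and token names below are illustrative assumptions.

```python
def apply_placeholders(sentence, term_dict):
    """Mask each known source term with a numbered placeholder token.

    Returns the masked sentence and a map from placeholder to target term.
    Sketch only: real systems match terms on tokenized or lemmatized text
    and attach morphosyntactic annotation to each placeholder.
    """
    mapping = {}
    for i, (src_term, tgt_term) in enumerate(term_dict.items()):
        placeholder = f"<term{i}>"
        if src_term in sentence:
            sentence = sentence.replace(src_term, placeholder)
            mapping[placeholder] = tgt_term
    return sentence, mapping

def restore_placeholders(translation, mapping):
    """Substitute the target-side terms back into the MT output."""
    for placeholder, tgt_term in mapping.items():
        translation = translation.replace(placeholder, tgt_term)
    return translation

masked, mapping = apply_placeholders("die Schraube fehlt", {"Schraube": "screw"})
# masked == "die <term0> fehlt"; an NMT system would translate the masked
# sentence, e.g. to "the <term0> is missing", after which:
print(restore_placeholders("the <term0> is missing", mapping))  # the screw is missing
```

Because the placeholder survives translation verbatim, the target term is guaranteed to appear in the output; the trade-off noted in the excerpt is that choosing a correctly inflected target form requires morphological analysis on both sides.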