2017
DOI: 10.1515/pralin-2017-0027

Neural Networks Classifier for Data Selection in Statistical Machine Translation

Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of the domain adaptation field, aimed at extracting those sentences from an out-of-domain corpus that are the most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical re…
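The idea of casting data selection as classification can be illustrated with a minimal sketch: train a classifier to separate in-domain from out-of-domain sentences, then rank the out-of-domain pool by the predicted in-domain probability. This is an illustrative toy (a one-layer logistic classifier over bag-of-words counts), not the paper's actual architecture; all corpora and names below are hypothetical.

```python
# Hypothetical sketch of selection-as-classification (NOT the paper's model):
# a logistic classifier over bag-of-words features scores out-of-domain
# sentences by P(in-domain | sentence); the top-scoring ones are selected.
import numpy as np

def featurize(sentences, vocab):
    """Bag-of-words count matrix over a fixed vocabulary."""
    X = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w in s.split():
            if w in vocab:
                X[i, vocab[w]] += 1.0
    return X

def train(X, y, lr=0.5, epochs=200):
    """Logistic regression by batch gradient descent on log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
        g = p - y                               # gradient of log-loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def select(pool, w, b, vocab, k):
    """Return the k pool sentences with highest in-domain probability."""
    X = featurize(pool, vocab)
    scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return [pool[i] for i in np.argsort(-scores)[:k]]

# Toy in-domain (medical) vs out-of-domain (finance) corpora, for illustration.
in_dom  = ["the patient received treatment", "dose of the drug was increased"]
out_dom = ["the stock market fell sharply", "shares of the company dropped"]
vocab = {w: i for i, w in enumerate(
    sorted({w for s in in_dom + out_dom for w in s.split()}))}
X = featurize(in_dom + out_dom, vocab)
y = np.array([1.0] * len(in_dom) + [0.0] * len(out_dom))
w, b = train(X, y)
pool = ["the drug dose was reduced", "market shares fell"]
print(select(pool, w, b, vocab, k=1))  # the medical-like sentence ranks first
```

In this toy, words seen only in the in-domain corpus get positive weights, so the medical-like pool sentence outscores the finance-like one; the paper replaces this bag-of-words classifier with neural sentence representations and extends it to bilingual corpora.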

Cited by 9 publications (6 citation statements) | References 12 publications
“…Other related works on domain adaptation include Dou et al (2019a), which adapts multi-domain NMT models with domain-aware feature embeddings learned via an auxiliary language modeling task. Peris et al (2017) proposed neural network-based classifiers for data selection in SMT. For more related work on data selection and domain adaptation in the context of MT, see the surveys by Eetemadi et al (2015) for SMT and, more recently, Chu and Wang (2018) for NMT.…”
Section: Related Work
confidence: 99%
“…The main distinction is that they used neural language models for selection rather than n-gram models. Peris et al (2017) selected based on convolutional and bidirectional long short-term memory neural networks.…”
Section: Related Work
confidence: 99%
“…They also use n-gram (n=4) based language models for representing the sentences. Peris et al [25] used neural network-based classifiers to select data for the machine translation task. Recently, Gururangan et al [14] used unsupervised data selection to increase training data in a low-resource scenario.…”
Section: Related Work
confidence: 99%