Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016
DOI: 10.18653/v1/n16-1110
Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification

Abstract: This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense feature vector representations combined with a supervised machine learning approach. The results show state-of-the-art pe…
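The framework the abstract describes — delexicalised sentence-level dense feature vectors fed to a supervised classifier — can be illustrated with a minimal feature-extraction sketch. The specific features below (sentence length, average token length, function-word ratio, type-token ratio) are illustrative assumptions, not the paper's actual feature set:

```python
def sentence_features(tokens,
                      function_words=frozenset(
                          {"the", "a", "of", "to", "in", "and", "is"})):
    """Delexicalised dense features for one sentence: no lexical
    identities survive, only surface/density statistics.
    (Illustrative feature set, not the paper's exact one.)"""
    n = len(tokens)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    avg_len = sum(len(t) for t in tokens) / n          # mean token length
    func_ratio = sum(t.lower() in function_words
                     for t in tokens) / n              # function-word density
    ttr = len({t.lower() for t in tokens}) / n         # type-token ratio
    return [float(n), avg_len, func_ratio, ttr]


# Each sentence becomes a fixed-length dense vector, suitable as
# input to any off-the-shelf supervised classifier.
vec = sentence_features("the cat sat on the mat".split())
```

Because the representation is delexicalised, the classifier cannot memorise topic vocabulary and must rely on density-style signals, which is what makes such features plausible translationese indicators.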

Cited by 22 publications (28 citation statements); References 28 publications.
“…Their main aim is to show that the usage of translation corpora in machine translation should be treated with caution, as human translations do not necessarily correspond to the quality standards that non-translated texts have. Rubino et al. (2016) use features derived from machine translation quality estimation to classify translations and non-translations, motivating their work by the fact that automatic distinction between originals and machine translations was shown to correlate with the quality of the machine translated texts (Aharoni et al., 2014). However, their data does not contain human quality evaluation.…”
Section: Translation Features and Quality Estimation
confidence: 99%
“…Translationese seems to affect the semantic as well as the structural level of text, but much of its effects can be seen in syntax and grammar (Santos, 1995; Puurtinen, 2003). An interesting aspect of translationese is that, while it is somewhat difficult to detect for the human eye (Tirkkonen-Condit, 2002), it can be machine learned with high accuracy (Baroni and Bernardini, 2006; Rubino et al., 2016). Many ways to automatically detect translationese have been devised, both with respect to textual translations and simultaneous interpreting (Baroni and Bernardini, 2006; Ilisei et al., 2010; Popescu, 2011).…”
Section: Related Work
confidence: 99%
“…Such corpora are also helpful as a source of data for teaching material design. There are but few attempts to approach learner translator texts as a manifestation of some variety of translational language or third code (Frawley, 1984; Rubino et al., 2016). If we can pinpoint quantitative tendencies in learner translators' output that run counter to professional translations and non-translations in the target language and understand why they occur, we can target them in the curriculum and raise learners' awareness of real-life issues with target text quality.…”
Section: Related Work: Translationese and Learner Translator Corpora
confidence: 99%
“…Granger and Rayson (1998) suggested that "one way of characterizing a language variety is by drawing up a word category" (p. 121). Rayson et al. (2008) developed an implementation of a corpus comparison technique known as keyword analysis, where keywords were understood as words statistically characteristic of one corpus as compared to the other. Explaining the method, the authors stated that it "can be used to discover key words in the corpora, which differentiate one corpus from another; for example, to determine significant patterns of over- or under-use" (p. 2).…”
Section: Frequency Distribution Of Word Classes
confidence: 99%
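The keyword analysis described in the last citation statement — scoring words that are statistically over- or under-used in one corpus relative to another — is commonly implemented with Dunning's log-likelihood statistic. A minimal sketch follows; the function names and the 3.84 significance threshold (chi-squared critical value for p < 0.05, 1 d.f.) are illustrative choices, not necessarily those of Rayson et al. (2008):

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood keyness for a word occurring
    a times in corpus 1 (size c) and b times in corpus 2 (size d)."""
    e1 = c * (a + b) / (c + d)   # expected count in corpus 1
    e2 = d * (a + b) / (c + d)   # expected count in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(corpus1, corpus2, threshold=3.84):
    """Return (word, LL, direction) for words whose frequency differs
    significantly between the two token lists (LL > 3.84 ~ p < 0.05).
    'over' means over-used in corpus1 relative to corpus2."""
    f1, f2 = Counter(corpus1), Counter(corpus2)
    n1, n2 = len(corpus1), len(corpus2)
    results = []
    for w in set(f1) | set(f2):
        ll = log_likelihood(f1[w], f2[w], n1, n2)
        if ll > threshold:
            direction = "over" if f1[w] / n1 > f2[w] / n2 else "under"
            results.append((w, round(ll, 2), direction))
    return sorted(results, key=lambda t: -t[1])
```

Applied to a translation corpus versus comparable originals, this surfaces the over- and under-used word classes that the cited work treats as translationese markers.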