2012 IEEE Sixth International Conference on Semantic Computing, 2012
DOI: 10.1109/icsc.2012.46

Translate Once, Translate Twice, Translate Thrice and Attribute: Identifying Authors and Machine Translation Tools in Translated Text

Abstract: In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. W…

Cited by 25 publications
(24 citation statements)
References 9 publications
“…(Or more than one author: see Caliskan and Greenstadt [2012] where machine translation services are characterized by their own "styles") b) Authorship Computational approaches to these problems are called by Juola (2006) "non-traditional" (after Rudman 2005), in that they inherit from the "classical" practice of stylistic analysis only the general idea. Juola (2006) notes that, all too often, a thorough analysis of the meaning and use of stylometric features and algorithms is overlooked simply because the gathering of data and the generation of results is so easy.…”
Section: Applications | mentioning
confidence: 99%
“…Rao and Rohatgi are among the first to address authorship anonymity by proposing using round-trip machine translation, e.g., English → Spanish → English, to obfuscate authors [24]. Other researchers apply round-trip translation, with a maximum of two intermediate languages and show that it does not provide noticeable anonymizing effect [15,13]. In contrast, we explore effects (on privacy) of increasing and/or randomizing the number of intermediate languages.…”
Section: Related Work | mentioning
confidence: 99%
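The round-trip translation described in the statement above (for example English → Spanish → English, optionally through more intermediate languages) can be sketched as a simple chain of hops. This is a minimal illustration only: the `Translator` signature, the `round_trip` helper, and the `tracing_translator` stand-in are all hypothetical names introduced here, and no real machine translation service is called.

```python
from typing import Callable, Sequence

# Hypothetical translator signature (an assumption, not a real API):
# (text, source_language, target_language) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, intermediates: Sequence[str],
               translate: Translator, home: str = "en") -> str:
    """Send `text` from `home` through each intermediate language and back."""
    path = [home, *intermediates, home]
    # Each consecutive pair of languages in the path is one translation hop.
    for src, dst in zip(path, path[1:]):
        text = translate(text, src, dst)
    return text


def tracing_translator(text: str, src: str, dst: str) -> str:
    """Stand-in for a real MT service: records the hop instead of translating."""
    return f"{text} [{src}->{dst}]"


one_hop = round_trip("hello", ["es"], tracing_translator)
two_hops = round_trip("hello", ["de", "ja"], tracing_translator)
```

Passing the translator in as a parameter keeps the chain independent of any particular service, so the number and order of intermediate languages can be varied without touching the loop.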
“…Using a translator to anonymize writing style has been attempted in prior work [13,15]. However, prior studies did not go beyond three levels of translation and did not show significant decreases in linkability.…”
Section: Translation Experiments | mentioning
confidence: 99%
“…Though many of classification algorithms have been applied on text classification, we discuss 2 algorithms that we apply to our datasets: Naive Bayes and Random forest. Although SVM is selected for classification algorithm in previous studies [5,14], we omit it for our experiments due to its computational expensiveness.…”
Section: Classification Algorithms | mentioning
confidence: 99%
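As a rough illustration of the Naive Bayes classifier named in the statement above, the sketch below implements multinomial Naive Bayes with Laplace smoothing using only the Python standard library. It is not the citing authors' implementation: the bag-of-words features, the `NaiveBayes` class, the `tokenize` helper, and the toy training texts are all assumptions made for this example, and the feature sets used in the cited studies are not reproduced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for illustration only.
texts = ["the cat sat on the mat", "the cat ate the fish",
         "stocks fell on the market", "the market rallied on stocks"]
labels = ["A", "A", "B", "B"]
model = NaiveBayes().fit(texts, labels)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, which matters once documents are longer than a few tokens.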
“…In this section, we will compare the classification results among the translation set features from Aylin and Rachel [14], and topic model features from Suresh et al [5].…”
Section: Comparison | mentioning
confidence: 99%