Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1406

Style Transfer for Texts: Retrain, Report Errors, Compare with Rewrites

Abstract: This paper shows that the standard assessment methodology for style transfer has several significant problems. First, the standard metrics for style accuracy and semantic preservation vary significantly across re-runs; therefore one has to report error margins for the obtained results. Second, starting with certain values of bilingual evaluation understudy (BLEU) between input and output and accuracy of the sentiment transfer, the optimization of these two standard metrics diverges from the intuitive goal of …
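A minimal sketch of the reporting practice the abstract argues for: retrain the same style-transfer system several times and report each automatic metric as mean ± error margin rather than a single number. The metric values below are hypothetical placeholders, not results from the paper.

# Report mean ± standard deviation over independent re-runs of one system.
from statistics import mean, stdev

def report_with_error_margin(runs):
    """runs: list of metric values from independent re-runs of one system."""
    m, s = mean(runs), stdev(runs)
    return f"{m:.3f} ± {s:.3f} (n={len(runs)})"

accuracy_runs = [0.81, 0.77, 0.84, 0.79, 0.82]   # hypothetical sentiment-transfer accuracy per re-run
self_bleu_runs = [0.32, 0.28, 0.35, 0.30, 0.33]  # hypothetical BLEU(input, output) per re-run

print("transfer accuracy:", report_with_error_margin(accuracy_runs))
print("self-BLEU:        ", report_with_error_margin(self_bleu_runs))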


Cited by 31 publications (33 citation statements) · References 25 publications
“…Providing a meaningful comparison of our approach to existing style transfer systems is difficult because of (1) poorly-defined automatic and human methods for measuring style transfer quality (Pang, 2019; Mir et al., 2019; Tikhonov et al., 2019), and (2) misleading (or absent) methods of aggregating three individual metrics (transfer accuracy, semantic similarity and fluency) into a single number. In this section, we describe the flaws in existing metrics and their aggregation (the latter illustrated through a naïve baseline), and we propose a new evaluation methodology to fix these issues.…”
Section: Evaluating Style Transfer (mentioning)
confidence: 99%
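The aggregation problem described in this citation can be illustrated with a toy example (a hedged sketch, not the cited papers' proposed metric): averaging each metric over the corpus before combining them can reward a system in which no single output is simultaneously accurate, meaning-preserving and fluent, whereas combining the three scores per sentence exposes the failure.

# Two ways of aggregating per-sentence scores for transfer accuracy (acc),
# semantic similarity (sim) and fluency (flu), each assumed to lie in [0, 1].
from statistics import mean

def corpus_level_aggregate(acc, sim, flu):
    """Naive aggregation: average each metric first, then combine.
    Can hide systems that trade one metric off against another per sentence."""
    return (mean(acc) * mean(sim) * mean(flu)) ** (1 / 3)

def sentence_level_aggregate(acc, sim, flu):
    """Joint aggregation: a sentence only counts if it does well on all
    three metrics at once; the joint scores are then averaged."""
    return mean(a * s * f for a, s, f in zip(acc, sim, flu))

# Toy data: half the outputs transfer style but lose meaning, half the opposite.
acc = [1.0, 1.0, 0.0, 0.0]
sim = [0.0, 0.0, 1.0, 1.0]
flu = [1.0, 1.0, 1.0, 1.0]
print(corpus_level_aggregate(acc, sim, flu))    # ~0.63: looks respectable
print(sentence_level_aggregate(acc, sim, flu))  # 0.0: no single output is actually good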
“…In contrast, the second category looks to explicitly separate attributes from the content. This constraint is enforced using either adversarial training (Fu et al., 2017; Hu et al., 2017; Zhang et al., 2018; Yamshchikov et al., 2019) or MI minimisation using vCLUB-S (Cheng et al., 2020b). Traditional adversarial training is based on an encoder that aims to fool the adversary discriminator by removing attribute information from the content embedding (Elazar and Goldberg, 2018).…”
Section: Main Definitions and Related Work (mentioning)
confidence: 99%
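One common realization of the adversarial setup mentioned above (a minimal sketch under assumed names and dimensions, not the exact architecture of any cited work) uses a gradient-reversal layer: a style classifier is trained on the content embedding, and the reversed gradient pushes the encoder to strip style information so the classifier is fooled.

# Minimal gradient-reversal illustration of adversarial attribute removal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialStyleRemover(nn.Module):
    """Encoder produces a content embedding; the style classifier is trained on it,
    but reversed gradients push the encoder to remove style information."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256, n_styles=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.style_clf = nn.Linear(hidden, n_styles)

    def forward(self, tokens, lambd=1.0):
        _, h = self.encoder(self.embed(tokens))   # h: (1, batch, hidden)
        content = h.squeeze(0)                    # content embedding
        style_logits = self.style_clf(GradReverse.apply(content, lambd))
        return content, style_logits

# Usage: add the classifier's cross-entropy on style_logits to the main
# training loss; the reversed gradient makes the encoder adversarial.
model = AdversarialStyleRemover()
tokens = torch.randint(0, 10000, (4, 20))   # toy batch of token ids
labels = torch.randint(0, 2, (4,))          # toy style labels
_, logits = model(tokens)
adv_loss = nn.functional.cross_entropy(logits, labels)
adv_loss.backward()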
“…To the best of our knowledge, an in-depth study of the relationship between disentangled representations based either solely on adversarial losses or on vCLUB-S and the quality of the generated sentences remains overlooked. Most previous studies have focused on either trade-offs between metrics computed on the generated sentences (Tikhonov et al., 2019) or performance evaluation of the disentanglement as part of (or convoluted with) more complex modules. This reinforces the need to provide a fair evaluation of disentanglement methods by isolating their individual contributions (Yamshchikov et al., 2019; Cheng et al., 2020b).…”
Section: Introduction (mentioning)
confidence: 99%
“…Most work in this growing field has focused primarily on style transfer within English, while other languages have received disproportionately little attention. Concretely, out of 35 ST papers we reviewed, all report results for ST within English text, while there is just a single work covering each of the following languages: Chinese, Russian, Latvian, Estonian, and French (Shang et al., 2019; Tikhonov et al., 2019; Korotkova et al., 2019; Niu et al., 2018). Notably, even though some efforts have been made towards multilingual ST, researchers are limited to providing system outputs as a means of evaluation, and progress is hampered by the scarcity of resources for most languages.…”
Section: Introduction (mentioning)
confidence: 99%