Automatic Post-Editing (APE) aims to correct errors in the output of a given machine translation (MT) system. Although data-driven approaches have become prevalent also in the APE task as in many other NLP tasks, there has been a lack of qualified training data due to the high cost of manual construction. eSCAPE, a synthetic APE corpus, has been widely used to alleviate the data scarcity, but it might not address genuine APE corpora's characteristic that the post-edited sentence should be a minimally edited revision of the given MT output. Therefore, we propose two new methods of synthesizing additional MT outputs by adapting back-translation to the APE task, obtaining robust enlargements of the existing synthetic APE training dataset 1 . Experimental results on the WMT English-German APE benchmarks demonstrate that our enlarged datasets are effective in improving APE performance.
Screenplay summarization is the task of extracting informative scenes from a screenplay. The screenplay contains turning point (TP) events that change the story direction and thus define the story structure decisively. Accordingly, this task can be defined as the TP identification task. We suggest using dialogue information, one attribute of screenplays, motivated by previous work that discovered that TPs have a relation with dialogues appearing in screenplays. To teach a model this characteristic, we add a dialogue feature to the input embedding. Moreover, in an attempt to improve the model architecture of previous studies, we replace LSTM with Transformer. We observed that the model can better identify TPs in a screenplay by using dialogue information and that a model adopting Transformer outperforms LSTM-based models.
Synthetic training data has been extensively used to train Automatic Post-Editing (APE) models in many recent studies because the quantity of human-created data has been considered insufficient. However, the most widely used synthetic APE dataset, eSCAPE, overlooks respecting the minimal editing property of genuine data, and this defect may have been a limiting factor for the performance of APE models. This article suggests adapting back-translation to APE to constrain edit distance, while using stochastic sampling in decoding to maintain diversity of outputs, to create a new synthetic APE dataset, RESHAPE. Our experiments show that (1) RESHAPE contains more samples resembling genuine APE data than eSCAPE does, and (2) using RESHAPE as new training data improves APE models' performance substantially over using eSCAPE.
Automatic post-editing (APE) is the study of correcting translation errors in the output of an unknown machine translation (MT) system and has been considered as a method of improving translation quality without any modification to conventional MT systems. Recently, several variants of Transformer that take both the MT output and its corresponding source sentence as inputs have been proposed for APE; and models introducing an additional attention layer into the encoder to jointly encode the MT output with its source sentence recorded a high-rank in the WMT19 APE shared task. We examine the effectiveness of such joint-encoding strategy in a controlled environment and compare four types of decoder multi-source attention strategies that have been introduced into previous APE models. The experimental results indicate that the joint-encoding strategy is effective and that taking the final encoded representation of the source sentence is the more proper strategy than taking such representation within the same encoder stack. Furthermore, among the multi-source attention strategies combined with the joint-encoding, the strategy that applies attention to the concatenated input representation and the strategy that adds up the individual attention to each input improve the quality of APE results over the strategy using the joint-encoding only.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.