Proceedings of the First Workshop on Multilingual Surface Realisation 2018
DOI: 10.18653/v1/w18-3606
Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

Abstract: This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 S…

Cited by 17 publications (30 citation statements); references 20 publications.
“…Konstas et al (2017) achieve strong results on the AMR-to-text task by using data expansion and anonymising data entities, while Cao and Clark (2019) additionally leverage syntactic information to improve performance. On the deep SR data, Elder and Hokamp (2018) use data expansion and a factored S2S model. Graph-to-sequence models have also been proposed, using various graph encoders and testing on different datasets (Marcheggiani and Perez-Beltrachini, 2018). Our approach is closest to the S2S model used by Elder and Hokamp (2018) in that it uses a factored S2S model to create rich node embeddings capturing the structure of the graph.…”
Section: Related Work
confidence: 99%
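The factored input representation mentioned above can be illustrated with a minimal sketch: each node of the input tree carries several categorical factors, each factor is looked up in its own embedding table, and the per-factor vectors are concatenated into a single node embedding. The factor names, vocabularies, and dimensions below are illustrative assumptions, not any cited system's actual configuration.

```python
import random

# Hypothetical factor vocabularies for nodes in a dependency tree
# (assumed for this demo, not taken from any cited system).
FACTORS = {
    "lemma": ["john", "see", "mary"],
    "pos": ["PROPN", "VERB"],
    "deprel": ["nsubj", "root", "obj"],
}
DIMS = {"lemma": 8, "pos": 4, "deprel": 4}

rng = random.Random(0)
# One embedding table per factor: vocabulary index -> random vector.
TABLES = {
    f: [[rng.gauss(0, 1) for _ in range(DIMS[f])] for _ in vocab]
    for f, vocab in FACTORS.items()
}

def embed_node(lemma, pos, deprel):
    """Concatenate one embedding per factor into a single node vector."""
    values = {"lemma": lemma, "pos": pos, "deprel": deprel}
    vec = []
    for factor, vocab in FACTORS.items():
        vec.extend(TABLES[factor][vocab.index(values[factor])])
    return vec

print(len(embed_node("see", "VERB", "root")))  # 16 = 8 + 4 + 4
```

In a real factored S2S model the concatenated vector would feed an encoder; the point here is only that structural features (POS, dependency relation) enter the input alongside the lemma.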
“…DDM is a general principle of tree ordering based on Head Proximity (Rijkhoff, 1986), Early Immediate Constituents (Hawkins, 1994), Dependency Locality Theory (Gibson, 2000), and Minimize Domains (Hawkins, 2004), among others. Submissions to SR'18, the first multilingual shared task, are generally based on sequence-to-sequence machine translation (Elder and Hokamp, 2018; Sobrevilla Cabezudo and Pardo, 2018), binary classification (Castro Ferreira et al, 2018; Puzikov and Gurevych, 2018; King and White, 2018; Madsack et al, 2018), or probabilistic n-gram language models (Singh et al, 2018).…”
Section: Linearizing
confidence: 99%
“…The two best-performing approaches in the task of generating sentences from dependency trees have been feature-based incremental text generation (Bohnet et al, 2010; Liu et al, 2015; Puduppully et al, 2016; King and White, 2018) and techniques performing more global input-output mapping (Castro Ferreira et al, 2018; Elder and Hokamp, 2018). The former approaches traverse the input tree, encode nodes using sparse manually defined feature sets as input representations, and generate a sentence by extending a candidate hypothesis with the input word that has the highest score among the input words that have not yet been processed.…”
Section: Related Work
confidence: 99%
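The incremental strategy described above — repeatedly appending the highest-scoring unused input word to the current hypothesis — can be sketched in a few lines. The scoring function here is a stand-in placeholder for the learned feature model, assumed only for the demo.

```python
def linearize(words, score):
    """Greedy incremental linearization: at each step, append the
    remaining input word with the highest score for the hypothesis."""
    hypothesis = []
    remaining = list(words)
    while remaining:
        best = max(remaining, key=lambda w: score(hypothesis, w))
        hypothesis.append(best)
        remaining.remove(best)
    return hypothesis

# Stand-in scorer that simply prefers a fixed target order; a real
# system would score with features of the hypothesis and the tree.
target = ["john", "sees", "mary"]
score = lambda hyp, w: -target.index(w)

print(linearize(["mary", "sees", "john"], score))  # ['john', 'sees', 'mary']
```

A real feature-based system would also use beam search rather than this purely greedy loop, but the hypothesis-extension structure is the same.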
“…Another prominent approach is using graph-to-text neural networks (Song et al, 2018; Trisedya et al, 2018). These methods have shown good results across various tasks, but in the context of surface realization they produced somewhat mixed results: the former were successfully used only when trained on large amounts of data (Elder and Hokamp, 2018), while the latter have only been evaluated on the SR'11 Deep Track data and, while performing better than RNN-type encoders, fell short of feature-based methods (Marcheggiani and Perez-Beltrachini, 2018).…”
Section: Related Work
confidence: 99%