Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1137
Bootstrapping Generators from Noisy Data

Abstract: A core step in statistical data-to-text generation concerns learning correspondences between structured data representations (e.g., facts in a database) and associated texts. In this paper we aim to bootstrap generators from large-scale datasets where the data (e.g., DBPedia facts) and related texts (e.g., Wikipedia abstracts) are loosely aligned. We tackle this challenging task by introducing a special-purpose content selection mechanism. We use multi-instance learning to automatically discover corresponden…
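The content-selection setting the abstract describes can be illustrated with a toy sketch: each structured fact is scored against every sentence of a loosely aligned text, and a fact is kept if its best-matching sentence scores highly enough. This is a minimal, illustrative stand-in only — the lexical-overlap scorer, the threshold, and all names below are assumptions, not the paper's learned multi-instance model.

```python
import re

def tokens(text):
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))

def overlap_score(fact, sentence):
    """Fraction of the fact's words that also appear in the sentence."""
    fact_words = tokens(fact)
    return len(fact_words & tokens(sentence)) / max(len(fact_words), 1)

def select_facts(facts, sentences, threshold=0.5):
    """Keep a fact if its best-matching sentence clears the threshold
    (multi-instance-style max aggregation over the bag of sentences)."""
    selected = []
    for fact in facts:
        best = max(overlap_score(fact, s) for s in sentences)
        if best >= threshold:
            selected.append(fact)
    return selected

facts = ["born 1952 Paris", "occupation painter", "spouse unknown person"]
sentences = ["She was born in Paris in 1952.", "She worked as a painter."]
print(select_facts(facts, sentences))  # → ['born 1952 Paris', 'occupation painter']
```

The unsupported fact ("spouse unknown person") is dropped because no sentence mentions it — the same intuition, at toy scale, as discarding database facts that the noisy reference text never verbalizes.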

Cited by 29 publications (23 citation statements)
References 32 publications
“…Other approaches focusing on micro planning (Puduppully et al., 2019a; Moryossef et al., 2019) might be better tailored for generating shorter texts. There has been a surge of datasets recently focusing on single-paragraph outputs and the task of content selection, such as E2E (Novikova et al., 2017), WebNLG (Gardent et al., 2017), and WikiBio (Lebret et al., 2016; Perez-Beltrachini and Lapata, 2018). We note that in our model content selection takes place during macro planning and text generation.…”
Section: Human-based Evaluation
confidence: 99%
“…Several models have been proposed in the last few years for data-to-text generation (Mei et al., 2016; Lebret et al., 2016; Wiseman et al., 2017, inter alia) based on the very successful encoder-decoder architecture (Bahdanau et al., 2015). Various attempts have also been made to improve these models, e.g., by adding content selection (Perez-Beltrachini and Lapata, 2018) and content planning (Puduppully et al., 2019) mechanisms. However, we are not aware of any prior work in this area which explicitly handles entities and their generation in discourse context.…”
Section: Related Work
confidence: 99%
“…Modern approaches to data-to-text generation have shown great promise (Lebret et al., 2016; Mei et al., 2016; Perez-Beltrachini and Lapata, 2018; Puduppully et al., 2019; Wiseman et al., 2017) thanks to the use of large-scale datasets and neural network models which are trained end-to-end based on the very successful encoder-decoder architecture (Bahdanau et al., 2015). In contrast to traditional methods, which typically implement pipeline-style architectures (Reiter and Dale, 2000) with modules devoted to individual generation components (e.g., content selection or lexical choice), neural models have no special-purpose mechanisms for ensuring how to best generate a text.…”
Section: Introduction
confidence: 99%
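The attention-based encoder-decoder cited above (Bahdanau et al., 2015) lets each decoding step weight the encoded input records before emitting a word. The sketch below shows that attention step in plain Python, using dot-product scoring as a simplification (Bahdanau attention is additive); the vectors and dimensions are illustrative assumptions, not taken from any of the cited models.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state (dot product),
    normalize to attention weights, and return the weighted context vector."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Three encoded input records; the decoder state is most similar to the
# first and third, so they receive equal, higher attention weight.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])  # → [0.422, 0.155, 0.422]
```

In a full data-to-text model the context vector is concatenated with the decoder state to predict the next word, so the attention weights act as a soft, step-by-step form of the content selection discussed above.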
“…Unfortunately, these techniques are not suited to more realistic and noisier datasets, such as WikiBio [26] or RotoWire [60]. On these benchmarks, several techniques have been proposed, such as reconstruction loss terms [60, 59] or Reinforcement Learning (RL) based methods [41, 30, 45]. These approaches, however, suffer from different issues: (1) the reconstruction loss relies on a hypothesis of one-to-one alignment between source and target, which does not fit with content selection in DTG; (2) RL-trained models are based on instance-level rewards (e.g.…”
Section: Introduction
confidence: 99%