Proceedings of the 11th International Conference on Natural Language Generation 2018
DOI: 10.18653/v1/w18-6521

Enriching the WebNLG corpus

Abstract: This paper describes the enrichment of the WebNLG corpus (Gardent et al., 2017a,b), with the aim of further extending its usefulness as a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation. We also produce a silver-standard German translation of the corpus to enable the exploitation of NLG approaches in languages other than English. The enriched corpus is publicly available.
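To make these intermediate tasks concrete, the sketch below walks a toy triple set through Discourse Ordering, Lexicalization and Referring Expression Generation. It is a minimal illustration only: the entity names, the ENT-n tag format and the ordering strategy are assumptions, not the corpus's actual annotation scheme.

```python
# Minimal, hypothetical sketch of the three intermediate NLG tasks.
# Entity names, the ENT-n tag format and the toy ordering strategy
# are assumptions, not the enriched corpus's actual annotation scheme.

# Input: a set of RDF-style (subject, predicate, object) triples.
triples = [
    ("John_Doe", "occupation", "Test_pilot"),
    ("John_Doe", "birthPlace", "Springfield"),
]

# 1. Discourse Ordering: choose the order in which the triples are
#    expressed (toy strategy: alphabetical by predicate; the corpus
#    instead annotates the order used in the human-written text).
ordered = sorted(triples, key=lambda t: t[1])

# 2. Lexicalization: realize the ordered triples as a delexicalized
#    template whose entities appear as generic tags.
template = "ENT-1 was born in ENT-2 and worked as a ENT-3 ."

# 3. Referring Expression Generation: decide how each tag is realized
#    in context (full name on first mention, a pronoun later, etc.).
referring = {"ENT-1": "John Doe", "ENT-2": "Springfield", "ENT-3": "test pilot"}

text = template
for tag, expression in referring.items():
    text = text.replace(tag, expression)

print(text)  # John Doe was born in Springfield and worked as a test pilot .
```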

Cited by 28 publications (37 citation statements)

References 19 publications
“…In contrast, while relation extraction datasets link text to a knowledge graph, the text is made up of disjoint sentences that do not provide sufficient context to train a powerful language model. Our goals are much more aligned to the data-to-text task (Ahn et al., 2016; Lebret et al., 2016; Wiseman et al., 2017; Yang et al., 2017; Gardent et al., 2017; Ferreira et al., 2018), where a small table-sized KB is provided to generate a short piece of text; we are interested in language models that dynamically decide the facts to incorporate from the knowledge graph, guided by the discourse. For these reasons we introduce the Linked WikiText-2 dataset, consisting of (approximately) the same articles appearing in the WikiText-2 language modeling corpus, but linked to the Wikidata (Vrandečić and Krötzsch, 2014) knowledge graph.…”
Section: Parameterizing the Distributions
mentioning, confidence: 99%
“…We used version 1.5 of the augmented WebNLG corpus (Castro Ferreira et al., 2018b) to evaluate the steps of our pipeline approach. Based on its intermediate representations, we extracted gold standards to train and evaluate the different steps.…”
Section: Data
mentioning, confidence: 99%
“…We evaluated our proposal based on an enriched version (Castro Ferreira et al., 2018b) of the WebNLG corpus (Gardent et al., 2017a). The original resource is a parallel corpus with sets of RDF (Resource Description Framework) triples and their corresponding verbalizations.…”
Section: Data
mentioning, confidence: 99%
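As a rough picture of what one such parallel entry pairs together (the field names and example content here are illustrative assumptions, not the corpus schema):

```python
from dataclasses import dataclass

@dataclass
class CorpusEntry:
    """One parallel entry: a set of RDF triples plus a human-written
    verbalization. Field names and content are illustrative only."""
    triples: list[tuple[str, str, str]]  # (subject, predicate, object)
    text: str

entry = CorpusEntry(
    triples=[("John_Doe", "birthPlace", "Springfield")],
    text="John Doe was born in Springfield.",
)
```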
“…We used an enriched version of the WebNLG corpus obtained by a delexicalization process (i.e., mapping each entity to a generic tag and later replacing their corresponding referring expressions in discourse with these tags), which was created by Castro Ferreira et al. (2018b). Table 1 shows an example of a set of 4 triples and the corresponding text, together with the intermediate representations obtained in the delexicalization process, such as general tags, Wikipedia IDs (entity/constant), referring expressions and the delexicalized template.…”
Section: Data
mentioning, confidence: 99%
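The delexicalization step described in this excerpt can be sketched as follows; the ENTITY-n tag format, function name and inputs are assumptions made for illustration, not the authors' implementation:

```python
def delexicalize(text, entity_mentions):
    """Map each entity to a generic tag and replace its referring
    expressions in the text with that tag.

    text: one verbalization, e.g. "John Doe was born in Springfield."
    entity_mentions: entity ID -> referring expressions realizing it
        in this text, e.g. {"John_Doe": ["John Doe"], ...}.
    Returns (delexicalized template, tag -> entity mapping).
    """
    template = text
    tag_map = {}
    for i, (entity, mentions) in enumerate(sorted(entity_mentions.items()), 1):
        tag = f"ENTITY-{i}"  # tag format is an assumption
        tag_map[tag] = entity
        # Replace longer mentions first so "John Doe" beats "John".
        for mention in sorted(mentions, key=len, reverse=True):
            template = template.replace(mention, tag)
    return template, tag_map


template, tag_map = delexicalize(
    "John Doe was born in Springfield.",
    {"John_Doe": ["John Doe"], "Springfield": ["Springfield"]},
)
print(template)  # ENTITY-1 was born in ENTITY-2.
print(tag_map)   # {'ENTITY-1': 'John_Doe', 'ENTITY-2': 'Springfield'}
```

Relexicalization simply inverts the tag mapping, which is what makes these templates usable as supervision for the referring-expression step of a pipeline.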