Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.577
An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing

Abstract: Knowledge graphs (KGs) can vary greatly from one domain to another. Therefore, supervised approaches to both graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data; at the same time, adapting a model trained on a different domain is often impossible due to little or no overlap in entities and relations. This situation calls for an approach that (1) does not need large amounts of annotated data and thus (2) …

Cited by 29 publications (35 citation statements); references 42 publications.
“…Dataset Overview For our two versions of datasets, GenWiki FINE and GenWiki FULL, we summarize the overall statistics of our GenWiki dataset in Table 3. We compare it with WebNLG, which also meets all three basic requirements and has been used for previous unsupervised models (Guo et al., 2020; Schmitt et al., 2020). We can see from Table 3 that our dataset has significantly more data than the human-annotated WebNLG, and can be a better dataset for unsupervised learning. Our GenWiki FINE contains 757K examples with about 20M tokens, and GenWiki FULL contains 1.3M samples with about 30M tokens.…”
Section: Discussion
Citation type: mentioning (confidence: 99%)
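The excerpt above reports corpus sizes as counts of examples and (whitespace) tokens. As a minimal sketch of how such statistics can be computed, assuming a hypothetical JSON-lines layout in which each record holds a "text" string and a "graph" list of triples (the actual GenWiki release format may differ):

```python
# Hypothetical sketch: computing corpus-level statistics like those in Table 3.
# Assumes a JSON-lines file where each record has "text" (a string) and
# "graph" (a list of [subject, relation, object] triples); the real GenWiki
# release may use a different schema.
import json

def corpus_stats(path):
    n_examples = 0
    n_tokens = 0
    n_triples = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            n_examples += 1
            n_tokens += len(record["text"].split())  # simple whitespace tokens
            n_triples += len(record["graph"])
    return {"examples": n_examples, "tokens": n_tokens, "triples": n_triples}

print(corpus_stats("genwiki_fine.jsonl"))  # hypothetical file name
```

Per the excerpt, a run over GenWiki FINE should report on the order of 757K examples and roughly 20M tokens, and GenWiki FULL about 1.3M examples and 30M tokens.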
“…Some more specific constraints require, for example, the text corpus to have entity annotations. The reason is that recent unsupervised learning models (Guo et al., 2020; Schmitt et al., 2020) use cycle training of two tasks: graph-to-text and text-to-graph, where the latter is simplified to relation extraction given entities. This simplification requires the unsupervised text corpus to have entity annotations in text.…”
Section: Desiderata
Citation type: mentioning (confidence: 99%)
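The cycle training described in this excerpt alternates two directions over unpaired data: text is parsed into a silver graph used to train the generator, and graphs are verbalized into silver text used to train the parser. Below is a minimal sketch of one such step; the g2t/t2g objects and their generate/loss interfaces are assumptions for illustration, not the actual APIs of Guo et al. (2020) or Schmitt et al. (2020):

```python
# Hypothetical sketch of one step of unsupervised cycle training between a
# graph-to-text model (g2t) and a text-to-graph model (t2g). The model
# interfaces (generate, loss) are assumed, not taken from the cited systems.

def cycle_training_step(g2t, t2g, graphs, texts):
    """One round of back-translation-style training on unpaired graphs/texts."""
    # Text -> graph -> text: parse raw text into a silver graph, then train
    # the generator to reconstruct the original text from that graph.
    silver_graphs = [t2g.generate(t) for t in texts]
    g2t_loss = g2t.loss(inputs=silver_graphs, targets=texts)

    # Graph -> text -> graph: verbalize a raw graph into silver text, then
    # train the parser to recover the original triples from that text.
    # With entity annotations available, the parsing direction reduces to
    # relation extraction over the given entities, as the excerpt notes.
    silver_texts = [g2t.generate(g) for g in graphs]
    t2g_loss = t2g.loss(inputs=silver_texts, targets=graphs)

    return g2t_loss + t2g_loss
```

This is why the excerpt lists entity-annotated text as a dataset desideratum: without marked entities, the text-to-graph half of the cycle becomes full semantic parsing rather than the simpler relation-extraction task.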