Proceedings of the 12th International Conference on Natural Language Generation 2019
DOI: 10.18653/v1/w19-8670
Neural Generation for Czech: Data and Baselines

Abstract: We present the first dataset targeted at end-to-end NLG in Czech in the restaurant domain, along with several strong baseline models using the sequence-to-sequence approach. While non-English NLG is under-explored in general, Czech, as a morphologically rich language, makes the task even harder: since Czech requires inflecting named entities, delexicalization or copy mechanisms do not work out-of-the-box, and lexicalizing the generated outputs is non-trivial. In our experiments, we present two different approaches…

Cited by 13 publications (8 citation statements)
References 65 publications
“…[Table excerpt — columns: Communicative Goal / Language(s) / Size / Input Type] CommonGEN: Produce a likely sentence which mentions all of the source concepts. (en, 67k, Concept Set). Czech Restaurant (Dušek and Jurčíček, 2019): Produce a text expressing the given intent and covering the specified attributes. (NLU) tasks.…”
Section: Dataset
Mentioning confidence: 99%
“…Training data for NLG in languages other than English is still very limited: there are small datasets in Korean (Chen et al, 2010), Spanish (García-Méndez et al, 2019), and Czech (Dušek and Jurčíček, 2019). There are also structured data-to-text datasets for German and French (Nema et al, 2018) and image-to-description datasets in Chinese (Li et al, 2016c) and Dutch (van Miltenburg et al, 2017), as well as cross-lingual English-German data (Elliott et al, 2016).…”
Section: Available Datasets
Mentioning confidence: 99%
“…Most recently, Gehrmann et al [22] proposed GEM, a benchmark specifically for tasks requiring Natural Language Generation. GEM encompasses 4 tasks through 11 datasets: Summarization [72,49,19], Structure To Text [20,40,17,57,16,54], Dialogue [63], and Simplification [25,85,1]. As noted in the introduction, evaluation of generated language relies not only on tasks, but also on automatic metrics (or humans).…”
Section: Benchmarks
Mentioning confidence: 99%