2021
DOI: 10.48550/arxiv.2102.01672
Preprint

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Cited by 43 publications (46 citation statements)
References 83 publications
“…Despite the advancements in deep neural networks and the demonstration of SoTA performances in NLG tasks [28,10] by a model such as DialoGPT, challenges still exist [30]. Hence, future work may involve understanding the mathematics of languages/linguistics and their relatedness.…”
Section: Discussion (mentioning; confidence: 99%)
“…Advances in deep neural networks, such as the transformer-based architectures, have brought improvements to the field [6,20,12]. These models have demonstrated SoTA performances in natural language understanding (NLU) and natural language generation (NLG) tasks [28,10].…”
Section: Introduction (mentioning; confidence: 99%)
“…Similarly, [Liu et al. 2020d] proposed the General Language Generation Evaluation (GLGE), a new multi-task benchmark for natural language generation. DecaNLP [McCann et al. 2018] and GEM [Gehrmann et al. 2021] are also well-known benchmarks for NLG tasks.…”
Section: Informativeness (mentioning; confidence: 99%)
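
Since the statements above cite GEM as a benchmark suite for NLG tasks, a minimal usage sketch may help orient readers. It assumes GEM's data is fetched through the Hugging Face datasets library, which the GEM project uses for distribution; the configuration name "common_gen" is just one example task, and the example's field names are inspected rather than assumed.

# Minimal sketch (not from the quoted papers): load one GEM task via the
# Hugging Face `datasets` library. Requires `pip install datasets` and
# network access; "common_gen" is one of GEM's task configurations.
from datasets import load_dataset

gem_task = load_dataset("gem", "common_gen")

# Each split contains task inputs plus human-written references.
# Print the schema of one validation example rather than assuming fields.
example = gem_task["validation"][0]
print(sorted(example.keys()))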
“…Large, sufficiently diverse/representative, public benchmarks have spurred significant progress in tabular AutoML [14,15,20,59] and NLP [18,34,39,54]. However, we are not aware of any analogous benchmarks for evaluating multimodal text/tabular ML.…”
Section: Related Work (mentioning; confidence: 99%)