Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.531
How Do Seq2Seq Models Perform on End-to-End Data-to-Text Generation?

Abstract: With the rapid development of deep learning, the Seq2Seq paradigm has become prevalent for end-to-end data-to-text generation, and BLEU scores have been increasing in recent years. However, it is widely recognized that there is still a gap between the quality of texts generated by models and texts written by humans. In order to better understand the ability of Seq2Seq models, evaluate their performance, and analyze the results, we choose to use the Multidimensional Quality Metric (MQM) to evaluate several rep…

Cited by 10 publications (9 citation statements)
References 31 publications
“…When generating text for fictional data, the most common error was instead information missing from the generated text. Yin and Wan (2022) showed that the large pretrained T5 and BART models performed practically no addition or duplication errors on any of the KG datasets they were evaluated on, but that the rates of hallucinations (intrinsic and extrinsic inaccuracy) rose with the amount of pretraining; our results on ChatGPT do not follow this trend, as we see both types of errors on our factual dataset.…”
Section: Discussion
confidence: 51%
“…For each of the seven triples in the prompt, annotators also had to check one of three exclusive options: the text states this fact (henceforth present), the text does not say anything about this (absent), or the text states something else that actively goes against this fact (hallucinated). Absent corresponds to omission in Yin and Wan (2022), with hallucinated corresponding to inaccuracy intrinsic, inaccuracy extrinsic, and positive-negative aspect (Yin and Wan, 2022).…”
Section: Results For Triple Coverage
confidence: 99%
“…This finding is intuitive as these metrics were not designed to evaluate the correctness of reasoning. Nevertheless, they are used to assess outputs in recent data-to-text approaches (e.g., Mehta et al., 2022; Yin and Wan, 2022; Anders et al., 2022), although most point out the limitations of such automatic assessments.…”
confidence: 99%