Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.37

DART: Open-Domain Structured Data Record to Text Generation

Abstract: We present DART, an open domain structured DAta-Record-to-Text generation dataset with over 82k instances (DARTs). Data-to-text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectiv…

Cited by 49 publications (69 citation statements)
References 36 publications
“…Next, we study private fine-tuning for text generation problems using the GPT-2 series of models on the End-to-End (E2E) NLG challenge (Novikova et al., 2017) and DART (Nan et al., 2021), two primary benchmarks used in recent works on non-private fine-tuning (Hu et al., 2021). We use GPT-2-Small (117M parameters), GPT-2-Medium (345M parameters), GPT-2-Large (774M parameters), and GPT-2-XL (1.5B parameters).…”
Section: Fine-tuning for Language Understanding Tasks
Citation type: mentioning, confidence: 99%
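As an unofficial sketch of what loading that GPT-2 series looks like in practice, the snippet below counts parameters for each size with the Hugging Face transformers library. The checkpoint identifiers (gpt2, gpt2-medium, gpt2-large, gpt2-xl) are my assumption rather than something stated in the excerpt, and the counts can differ slightly from the quoted figures depending on how embeddings are tallied.

# Sketch: instantiate each GPT-2 size from its config only (no weight download)
# and count parameters, to sanity-check the sizes quoted in the excerpt above.
from transformers import AutoConfig, AutoModelForCausalLM

CHECKPOINTS = ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"]  # Small, Medium, Large, XL

for name in CHECKPOINTS:
    config = AutoConfig.from_pretrained(name)          # fetches only the config JSON
    model = AutoModelForCausalLM.from_config(config)   # randomly initialized, same architecture
    n_params = sum(p.numel() for p in model.parameters())
    # Counts may differ slightly from the quoted 117M/345M/774M/1.5B figures
    # depending on whether tied embeddings are included.
    print(f"{name}: {n_params / 1e6:.0f}M parameters")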
“…DART: DART was introduced as an open-domain data-to-text dataset by Nan et al. (2021). The dataset consists of 62K training samples, 6.9K validation samples, and 12K test samples.…”
Section: Fine-tuning for Language Understanding Tasks
Citation type: mentioning, confidence: 99%
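For readers who want to inspect those splits, here is a minimal, unofficial sketch using the Hugging Face datasets library; the dataset identifier and field names are assumptions based on the public Hub card, not something stated in the excerpt.

# Sketch: load DART and check the split sizes quoted above (62K / 6.9K / 12K),
# assuming the "dart" dataset card on the Hugging Face Hub mirrors the official release.
from datasets import load_dataset

dart = load_dataset("dart")  # splits: train / validation / test
for split_name, split in dart.items():
    print(split_name, len(split))

# Field names below follow the Hub schema and may differ from the raw GitHub release.
example = dart["train"][0]
print(example["tripleset"])            # list of [subject, relation, object] triples
print(example["annotations"]["text"])  # human-written reference sentence(s)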
“…Fine-tuning for Graph-to-text Generation. While previous approaches (Song et al., 2018; Ribeiro et al., 2019; Cai and Lam, 2020; Schmitt et al., 2021; Zhang et al., 2020b) have shown that explicitly encoding the graph structure is beneficial, fine-tuning PLMs on linearized structured data has established a new level of performance in data-to-text generation (Nan et al., 2021; Kale, 2020). Our work can be seen as integrating the advantages of both graph structure encoding and PLMs, using a novel adapter module.…”
Section: Related Work
Citation type: mentioning, confidence: 98%
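The excerpt does not show what "linearized structured data" looks like concretely, so the sketch below illustrates one common triple-set linearization; the marker tokens and the example triples are my own choices, not taken from the cited works.

# Sketch: one common way to linearize a triple set into a flat string for a
# sequence-to-sequence PLM. The <H>/<R>/<T> markers are illustrative; the cited
# works may use different special tokens or orderings.
from typing import List

def linearize(tripleset: List[List[str]]) -> str:
    """Flatten [head, relation, tail] triples into a single input string."""
    return " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in tripleset)

tripleset = [
    ["Mars Hill College", "JOINED", "1973"],
    ["Mars Hill College", "LOCATION", "Mars Hill, North Carolina"],
]
print(linearize(tripleset))
# <H> Mars Hill College <R> JOINED <T> 1973 <H> Mars Hill College <R> LOCATION <T> Mars Hill, North Carolina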
“…Data-to-Text. As shown in Figure 5, we fine-tune T5 (Raffel et al., 2019) on DART (Nan et al., 2021) to obtain a Data-to-Text model as the second module of the pipeline to perform surface realization of table cells (denotations in our case). We first convert the denotation prediction into the triple-set format with the following scheme: for each table cell in the highlighted region, we generate the triple [[TABLECONTEXT], column header, cell value], where column header is the cell's corresponding column name.…”
Section: Weakly Supervised Table Semantic Parsing
Citation type: mentioning, confidence: 99%
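A minimal sketch of that conversion scheme follows; the helper and variable names are hypothetical, since the cited paper does not publish this exact code.

# Sketch of the conversion scheme described above: every highlighted (predicted) cell
# becomes a [[TABLECONTEXT], column header, cell value] triple. All names are illustrative.
from typing import Dict, List, Tuple

def denotations_to_tripleset(
    highlighted_cells: List[Tuple[int, str]],  # (row index, column header) of predicted cells
    table: List[Dict[str, str]],               # table as a list of rows keyed by column header
) -> List[List[str]]:
    tripleset = []
    for row_idx, column_header in highlighted_cells:
        cell_value = table[row_idx][column_header]
        tripleset.append(["[TABLECONTEXT]", column_header, cell_value])
    return tripleset

table = [
    {"Player": "Walter Ray Williams Jr.", "Earnings": "4,256,000"},
    {"Player": "Pete Weber", "Earnings": "3,721,000"},
]
print(denotations_to_tripleset([(1, "Earnings")], table))
# [['[TABLECONTEXT]', 'Earnings', '3,721,000']]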
“…We use a checkpoint of TAPAS-base that is fine-tuned on WikiTableQuestions (Pasupat and Liang, 2015) to perform table semantic parsing implicitly in order to produce a set of denotations, which is then converted to a triple-set as described in Section 3.1. We then employ a T5-large model (Raffel et al., 2019) that goes through two fine-tuning stages: in the first stage it is fine-tuned on the downstream Data-to-Text task with DART (Nan et al., 2021); in the second stage it is further fine-tuned on ToTTo instances to adapt to the triple-set formulation we proposed. We denote this setting as Pipeline-zeroshot in Table 4.…”
Section: Experiments Setup
Citation type: mentioning, confidence: 99%
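Below is a rough, unofficial sketch of how such a two-stage pipeline could be wired together with the transformers library. The TAPAS checkpoint google/tapas-base-finetuned-wtq is a public model; the T5 weights used here are a plain t5-large placeholder, since the DART- and ToTTo-fine-tuned checkpoints from the cited work are not assumed to be available.

# Sketch of the two-module pipeline described above: TAPAS predicts cell denotations,
# which are converted to [TABLECONTEXT] triples and verbalized by a T5 model.
from transformers import pipeline, T5ForConditionalGeneration, T5Tokenizer
import pandas as pd

table = pd.DataFrame({"Player": ["Walter Ray Williams Jr.", "Pete Weber"],
                      "Earnings": ["4,256,000", "3,721,000"]})

# Stage 1: implicit table semantic parsing with TAPAS fine-tuned on WikiTableQuestions.
tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
result = tqa(table=table, query="What are Pete Weber's earnings?")

# Convert predicted cell coordinates into [TABLECONTEXT] triples and linearize them.
tripleset = [["[TABLECONTEXT]", table.columns[col], table.iat[row, col]]
             for row, col in result["coordinates"]]
linearized = " ".join(f"<H> {h} <R> {r} <T> {t}" for h, r, t in tripleset)

# Stage 2: surface realization with T5 (swap in your own DART-fine-tuned weights here).
tok = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")
inputs = tok(linearized, return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))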