Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.270

TABBIE: Pretrained Representations of Tabular Data

Abstract: Existing work on tabular representation learning jointly models tables and associated text using self-supervised objective functions derived from pretrained language models such as BERT. While this joint pretraining improves tasks involving paired tables and text (e.g., answering questions about tables), we show that it underperforms on tasks that operate over tables without any associated text (e.g., populating missing cells). We devise a simple pretraining objective (corrupt cell detection) that learns exclusively from tabular data and reaches state-of-the-art results on a suite of table-based prediction tasks.
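
The corrupt-cell-detection objective described in the abstract can be illustrated with a small self-contained sketch. The corruption routine, the hash-bucket cell embedding, and all hyperparameters below are assumptions made for the sake of a runnable example; they are not TABBIE's actual architecture or training setup.

```python
# Minimal sketch of a corrupt-cell-detection pretraining step.
# The encoder here is a toy stand-in (hash-bucket embedding per cell);
# the real model uses transformer-based row/column encoders.
import random
import torch
import torch.nn as nn

def corrupt_table(cells, vocab, rate=0.15):
    """Randomly replace a fraction of cells with values sampled from a
    vocabulary of other cell strings; return the corrupted table and
    per-cell binary labels (1.0 = corrupted)."""
    corrupted, labels = [], []
    for row in cells:
        new_row, lab_row = [], []
        for cell in row:
            if random.random() < rate:
                new_row.append(random.choice(vocab))
                lab_row.append(1.0)
            else:
                new_row.append(cell)
                lab_row.append(0.0)
        corrupted.append(new_row)
        labels.append(lab_row)
    return corrupted, torch.tensor(labels)

class CorruptCellDetector(nn.Module):
    """Toy table encoder: embed each cell and predict whether it was corrupted."""
    def __init__(self, num_buckets=10000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_buckets, dim)
        self.classifier = nn.Linear(dim, 1)
        self.num_buckets = num_buckets

    def forward(self, cells):
        ids = torch.tensor([[hash(c) % self.num_buckets for c in row] for row in cells])
        h = self.embed(ids)                    # (rows, cols, dim)
        return self.classifier(h).squeeze(-1)  # (rows, cols) corruption logits

# Usage: one training step on a tiny table.
table = [["city", "population"], ["Paris", "2.1M"], ["Tokyo", "13.9M"]]
vocab = ["London", "8.9M", "Berlin", "3.6M"]
corrupted, labels = corrupt_table(table, vocab, rate=0.3)
model = CorruptCellDetector()
logits = model(corrupted)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```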

Cited by 72 publications (48 citation statements)
References 29 publications
“…Such approaches can be applied to other datasets such as WikiTableQA (Pasupat and Liang, 2015), TabFact (Chen et al., 2019), HybridQA (Chen et al., 2020b; Zayats et al., 2021; Oguz et al., 2020), OpenTableQA (Chen et al., 2021), ToTTo (Parikh et al., 2020) and Turing Tables (Yoran et al., 2021), i.e., table-to-text generation tasks, LogicTable (Chen et al., 2020a), and recently proposed tabular reasoning models such as TAPAS (Müller et al., 2021; Herzig et al., 2020), TaBERT (Yin et al., 2020), TABBIE (Iida et al., 2021), TabGCN (Pramanick and Bhattacharya, 2021), and RCI (Glass et al., 2021).…”
Section: Discussion and Related Work
confidence: 99%
“…On the other hand, methods built on dynamic contextual embeddings have also been explored. Many of them are tokenized using WordPiece, encoded via the token vocabulary, and initialized from BERT [30], such as TaBERT [101], TaPas [51], MATE [39], StruG [29], TableFormer [6], TUTA [95], ForTaP [23], and TABBIE [55]. TURL [28] was initialized from TinyBERT [56] and additionally used an entity vocabulary.…”
Section: Input Featurization and Embedding
confidence: 99%
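
As a concrete illustration of the WordPiece-based cell featurization the cited survey describes, the snippet below tokenizes individual cell strings with a BERT tokenizer from the Hugging Face transformers library. The choice of checkpoint and the per-cell handling are assumptions for illustration, not the exact preprocessing of any specific model listed above.

```python
# Hedged sketch: WordPiece-tokenize table cells with a BERT vocabulary.
# "bert-base-uncased" is an assumed checkpoint; downloading it requires
# network access on first use.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

table = [["city", "population"], ["Paris", "2.1M"], ["Tokyo", "13.9M"]]

for row in table:
    for cell in row:
        pieces = tokenizer.tokenize(cell)              # WordPiece subtokens
        ids = tokenizer.convert_tokens_to_ids(pieces)  # indices into BERT's vocabulary
        print(f"{cell!r} -> {pieces} -> {ids}")
```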
“…Most works, such as TaPas [51], MATE [39], TableFormer [6], TUTA [95], and TURL [28], perform serialization in this way. TABBIE [55] linearizes tables by rows and by columns separately. TableGPT [46] instead adopts a template-based table serialization for relatively simple tables.…”
Section: Tabular Sequence Serialization
confidence: 99%
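
To make the row-wise versus column-wise serialization distinction concrete, here is a minimal sketch of the two linearizations. The separator token and cell formatting are assumptions for illustration, not the exact serialization used by the models cited above.

```python
# Hedged sketch: linearize a table by rows and, separately, by columns.
from typing import List

def linearize_rows(table: List[List[str]], sep: str = " [SEP] ") -> List[str]:
    """One string per row: cells joined left to right."""
    return [sep.join(row) for row in table]

def linearize_columns(table: List[List[str]], sep: str = " [SEP] ") -> List[str]:
    """One string per column: cells joined top to bottom."""
    return [sep.join(col) for col in zip(*table)]

table = [["city", "population"], ["Paris", "2.1M"], ["Tokyo", "13.9M"]]
print(linearize_rows(table))     # ['city [SEP] population', 'Paris [SEP] 2.1M', ...]
print(linearize_columns(table))  # ['city [SEP] Paris [SEP] Tokyo', ...]
```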
“…The most prevalent approaches are variants of pre-trained language models such as BERT and RoBERTa (Liu et al., 2019b). More recently, self-supervised pre-training has also shown promising results on modalities other than plain text, such as tables (Herzig et al., 2020; Deng et al., 2020; Iida et al., 2021), knowledge bases (Zhang et al., 2019; Peters et al., 2019), and image-text (Su et al., 2020). Meanwhile, there has also been work that uses pre-training to accommodate the specific needs of downstream NLP tasks, such as open-domain retrieval (Guu et al., 2020), representing and predicting spans of text (Joshi et al., 2020), and semantic parsing (Deng et al., 2021).…”
Section: Related Work
confidence: 99%