2022
DOI: 10.48550/arxiv.2211.00142
Preprint

TaTa: A Multilingual Table-to-Text Dataset for African Languages

Abstract: Existing data-to-text generation datasets are mostly limited to English. To address this lack of data, we create Table-to-Text in African languages (TATA), the first large multilingual table-to-text dataset with a focus on African languages. We created TATA by transcribing figures and accompanying text in bilingual reports by the Demographic and Health Surveys Program, followed by professional translation to make the dataset fully parallel. TATA includes 8,700 examples in nine languages including four African …

citations
Cited by 1 publication
(1 citation statement)
references
References 14 publications
“…In parallel with efforts to include more low-resource languages in NLP research (Costa-jussà et al, 2022; Ruder, 2020), demand for NLP that targets African languages, which represent more than 30% of the world's spoken languages (Ogueji et al, 2021) is growing. This has resulted in the creation of publicly available multilingual datasets targeting African languages for a variety of NLP tasks such as sentiment analysis (Muhammad et al, 2023; Shode et al, 2022), language identification (Adebara et al, 2022), data-to-text generation (Gehrmann et al, 2022), topic classification (Adelani et al, 2023; Hedderich et al, 2020), machine translation (Adelani et al, 2022a; Nekoto et al, 2020), and NER (Eiselen, 2016; Adelani et al, 2021, 2022b).…”
Section: Related Work (mentioning)
confidence: 99%