2022
DOI: 10.48550/arxiv.2202.01764
Preprint

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

Abstract: Question Answering (QA) is a task in which a machine understands a given document and a question to find an answer. Despite impressive progress in the NLP area, QA is still a challenging problem, especially for non-English languages due to the lack of annotated datasets. In this paper, we present the Japanese Question Answering Dataset, JaQuAD, which is annotated by humans. JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles. We finetuned a baseline model which achieves 78…
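The abstract describes JaQuAD as an extractive QA dataset, meaning each answer is a literal span of the source passage located by character offset, as in SQuAD-style datasets. Below is a minimal sketch of what such a record looks like; the example text and field names follow the common SQuAD convention and are illustrative, not taken from JaQuAD itself.

```python
# Minimal sketch of a SQuAD-style extractive QA record: the gold answer
# is a substring of the context, addressed by a character offset.
# The Japanese sentence here is an invented illustration, not a JaQuAD entry.

example = {
    "context": "富士山は日本で一番高い山である。",  # "Mt. Fuji is the highest mountain in Japan."
    "question": "日本で一番高い山は何ですか。",      # "What is the highest mountain in Japan?"
    "answers": {"text": ["富士山"], "answer_start": [0]},
}

def extract_answer(record):
    """Recover the gold answer span from the context via its offset."""
    start = record["answers"]["answer_start"][0]
    text = record["answers"]["text"][0]
    span = record["context"][start:start + len(text)]
    assert span == text, "answer_start must index the literal answer span"
    return span

print(extract_answer(example))  # 富士山
```

Because answers are constrained to be spans, a baseline model only needs to predict a start and end position over the context rather than generate free-form text.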

Cited by 4 publications (3 citation statements)
References 15 publications
“…Based on our guidelines for building QA systems with the Retriever-Reader-Selector mechanism in this paper, QA systems for other languages (especially low-resource languages) can be easily adapted and re-implemented as baseline QA systems. This proposed system can be extended to different datasets on other languages such as KorQuAD (for Korean) [12], SberQuAD (for Russian) [9], JaQuAD (for Japanese) [21], and FQuAD (for French) [8] in the near future.…”
Section: Results Analysis
confidence: 99%
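The citation above refers to a Retriever-Reader-Selector pipeline. As a rough illustration of the first two stages, the toy sketch below ranks candidate passages for a question and then extracts an answer from the top passage; scoring by token overlap and matching candidates verbatim are deliberate simplifications, not the cited system's actual method.

```python
# Toy Retriever-Reader sketch: a retriever ranks passages by token
# overlap with the question, and a reader picks an answer string that
# appears verbatim in the best passage. Illustrative only.

def retrieve(question_tokens, passages):
    """Return the passage sharing the most tokens with the question."""
    q = set(question_tokens)
    return max(passages, key=lambda p: len(q & set(p.split())))

def read(passage, candidates):
    """Pick the first candidate answer found verbatim in the passage."""
    for c in candidates:
        if c in passage:
            return c
    return None

passages = [
    "JaQuAD is a Japanese question answering dataset",
    "FQuAD is a French question answering dataset",
]
best = retrieve(["Which", "dataset", "is", "Japanese"], passages)
print(read(best, ["Japanese", "French"]))  # Japanese
```

A production system would replace both stages with learned models (e.g. a dense retriever and a span-prediction reader), with the selector re-ranking the candidate answers.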
“…like DuReader [11] for Chinese, JaQuAD [28] for Japanese, KorQuAD [16] for Korean, and ViQuAD [13,19] for Vietnamese.…”
Section: Existing Datasets and Methods For Visual Question Answering
confidence: 99%
“…In order to test the efficacy of VT, we consider two generation tasks, question answering (QA) and question generation (QG), and two classification tasks, sentiment analysis and natural language inference (NLI). As the datasets for QA, we use SQuAD (Rajpurkar et al, 2016) (English), Spanish SQuAD (Casimiro Pio et al, 2019) (Spanish), FQuAD (d'Hoffschmidt et al, 2020) (French), Italian SQuAD (Croce et al, 2018) (Italian), JaQuAD (So et al, 2022) (Japanese), KorQuAD (Lim et al, 2019) (Korean), and SberQuAD (Efimov et al, 2020) (Russian). For QG, we use the same datasets adapted for QG via QG-Bench (Ushio et al, 2022).…”
Section: Experimental Setting
confidence: 99%