Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.233

A Vietnamese Dataset for Evaluating Machine Reading Comprehension

Abstract: Over 97 million people worldwide speak Vietnamese as their native language. However, there are few research studies for Vietnamese on machine reading comprehension (MRC), the task of understanding a text and answering questions about it. To address the lack of benchmark datasets for Vietnamese, we present the Vietnamese Question Answering Dataset (UIT-ViQuAD), a new dataset for evaluating MRC models in Vietnamese, a low-resource language. This dataset comprises over 23,000 human-generated question-answer…
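Extractive MRC benchmarks in the SQuAD family are commonly scored with exact match (EM) and token-level F1 between a predicted answer span and the gold answers. The excerpt above does not state which metrics UIT-ViQuAD uses, so the sketch below assumes these two. The normalization (lowercasing, dropping ASCII punctuation, collapsing whitespace) is a simplified assumption. It omits SQuAD's English article removal, which does not apply to Vietnamese.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, collapse whitespace (simplified, assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall, using multiset overlap."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, golds):
    """Average EM and F1 as percentages; each gold entry is a list of acceptable answers."""
    em = sum(max(exact_match(p, g) for g in gs) for p, gs in zip(predictions, golds))
    f1 = sum(max(f1_score(p, g) for g in gs) for p, gs in zip(predictions, golds))
    n = len(predictions)
    return 100 * em / n, 100 * f1 / n
```

For example, `score(["Hà Nội"], [["thủ đô Hà Nội"]])` gives an EM of 0 and an F1 of about 66.7: the prediction shares 2 of the gold answer's 4 tokens. Taking the maximum over several gold answers rewards a prediction that matches any one acceptable phrasing.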

Cited by 138 publications (174 citation statements). References: 22 publications.

Citation statements:
“…For the machine reading comprehension model, the Document Reader (DrQA) introduced by Chen et al [1] is a powerful model on various of machine reading comprehension corpora such as: SQuAD [11], TextWorldsQA [8], and UIT-ViSQuAD [10]. The DrQA model consists of two modules: Document Retriever and Document Reader.…”
Section: Methodologies (mentioning; confidence: 99%)
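The two-module design described in the statement above can be sketched as a retrieve-then-read pipeline. This is a toy illustration under stated assumptions, not DrQA itself. The retriever ranks documents by cosine similarity of TF-IDF unigram vectors. DrQA's Document Retriever is TF-IDF based but also uses hashed bigrams, which this sketch leaves out. `toy_reader` is a hypothetical word-overlap stand-in for DrQA's neural Document Reader: it returns the best-matching sentence rather than predicting an answer span.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; Python's str regex classes are Unicode-aware, so Vietnamese letters are kept.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((n + 1) / (c + 1)) + 1 for t, c in df.items()}  # smoothed IDF (assumption)
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, docs, k=1):
    """Stage 1 (retriever): indices of the k documents most similar to the question."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(question)).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)
    return ranked[:k]


def toy_reader(question, doc):
    """Stage 2 (hypothetical reader stand-in): the sentence sharing the most words with the question."""
    q = set(tokenize(question))
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & set(tokenize(s))))


def answer(question, docs, k=1):
    return [toy_reader(question, docs[i]) for i in retrieve(question, docs, k)]
```

With `docs = ["Hanoi is the capital of Vietnam. It lies on the Red River.", "Ho Chi Minh City is the largest city in Vietnam. It was formerly called Saigon."]`, the question "Which river does Hanoi lie on?" retrieves the first document, and the toy reader returns "It lies on the Red River". Keeping the cheap retriever separate from the reader is what lets the expensive reading step run only on a few candidate documents.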
“…Many of MRC corpora are constructed on specific domains and open domains in English such as SQuAD [11] (extractive MRC) on Wikipedia articles, RACE [4] (multiple choices MRC) on High school students English Exams domain, and NarrativeQA [7] (abstractive MRC) on books and stories domain. For the Vietnamese language, the UIT-ViQuAD [10] (Wikipedia domain) and ViNewQA [15] (Health news domain) are two extractive MRC corpora for machine reading comprehension. Besides, the ViMMRC [9] is the multiple-choice reading comprehension corpus on the Vietnamese students' textbook for primary schools domain.…”
Section: Related Work (mentioning; confidence: 99%)
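The statement above distinguishes extractive, multiple-choice, and abstractive MRC by the form the answer takes. The sketch below shows how the three answer types differ as data. `answer_start` follows the character-offset convention of SQuAD-style data. The class names and the other field names (`options`, `label`, `reference_answer`) are hypothetical and are not taken from the published schemas of RACE, NarrativeQA, UIT-ViQuAD, ViNewQA, or ViMMRC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractiveExample:
    """Answer is a span copied from the context, located by a character offset."""
    context: str
    question: str
    answer_text: str
    answer_start: int

    def is_valid(self) -> bool:
        # The stored offset must point at exactly the answer text inside the context.
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end] == self.answer_text


@dataclass
class MultipleChoiceExample:
    """Answer is the index of one option from a fixed list."""
    context: str
    question: str
    options: List[str]
    label: int

    def answer(self) -> str:
        return self.options[self.label]


@dataclass
class AbstractiveExample:
    """Answer is free-form text that need not appear verbatim in the context."""
    context: str
    question: str
    reference_answer: str
```

Only the extractive form can be checked mechanically against its context, as `is_valid` does. A multiple-choice answer is checked against its option list, and an abstractive answer can only be compared with a reference text.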
“…Three more languages have their versions of SQuAD [210]: French [66,126], Vietnamese [187], and Korean [150],…”
Section: Monolingual Resources (mentioning; confidence: 99%)
“…The output of Pyserini is then reranked by a T5 language model [10], which is fine-tuned on MS MARCO, a large machine reading comprehension dataset [18]. Similarly, SLEDGE [19] uses a similar approach, but using SciBERT [13] to rerank documents.…”
Section: Background and Significance (mentioning; confidence: 99%)
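The pipeline in the statement above retrieves candidates with Pyserini and then reranks them with a fine-tuned T5 model (or with SciBERT, in SLEDGE). The sketch below shows only the two-stage control flow, under stated assumptions. The first stage is a small pure-Python Okapi BM25 scorer, not Pyserini's API. The values k1 = 0.9 and b = 0.4 are an assumed choice. `rerank_score` is a hypothetical callable standing in for the neural reranker.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 0.9, b: float = 0.4) -> List[float]:
    """Okapi BM25 score of each document for the query (k1 and b are assumed values)."""
    toks = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def retrieve_then_rerank(query: str, docs: List[str],
                         rerank_score: Callable[[str, str], float], k: int = 10) -> List[int]:
    """Stage 1: keep the top-k BM25 hits. Stage 2: reorder only those by rerank_score(query, doc)."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:k]
    return sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
```

As a usage example, take a toy reranker `lambda q, d: float("vietnamese" in d)` over `["machine reading comprehension dataset", "reading is fun", "a long document about machine reading comprehension for vietnamese"]` with `k=2`. BM25 keeps documents 0 and 2, and the reranker reorders them to `[2, 0]`. The expensive scorer runs only on the k first-stage candidates, which is why this split suits large neural rerankers.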