“…In order to test the efficacy of VT, we consider two generation tasks, question answering (QA) and question generation (QG), and two classification tasks, sentiment analysis and natural language inference (NLI). As QA datasets, we use SQuAD (Rajpurkar et al., 2016) (English), Spanish SQuAD (Casimiro Pio et al., 2019) (Spanish), FQuAD (d'Hoffschmidt et al., 2020) (French), Italian SQuAD (Croce et al., 2018) (Italian), JAQuAD (So et al., 2022) (Japanese), KorQuAD (Lim et al., 2019) (Korean), and SberQuAD (Efimov et al., 2020) (Russian). For QG, we use the same datasets adapted for QG via QG-Bench (Ushio et al., 2022).…”