Proceedings of the 3rd Workshop on Machine Reading for Question Answering 2021
DOI: 10.18653/v1/2021.mrqa-1.1

MFAQ: a Multilingual FAQ Dataset

Abstract: In this paper, we present the first publicly available multilingual FAQ dataset. We collected around 6M FAQ pairs from the web, in 21 different languages. Although this is significantly larger than existing FAQ retrieval datasets, it comes with its own challenges: duplication of content and uneven distribution of topics. We adopt a setup similar to Dense Passage Retrieval (DPR) (Karpukhin et al., 2020) and test various bi-encoders on this dataset. Our experiments reveal that a multilingual model based on XLM-R…
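
The bi-encoder setup mentioned in the abstract can be illustrated with a minimal sketch: the question and each candidate answer are encoded independently, and candidates are ranked by the dot product of their vectors. The `encode` function below is a hypothetical hashed bag-of-words stand-in, not the XLM-R-based model from the paper, and `DIM` and `rank_answers` are illustrative names that do not come from the source.

```python
import math
import zlib

DIM = 64  # hypothetical embedding size for the toy encoder


def encode(text: str) -> list[float]:
    """Toy stand-in for a neural encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def rank_answers(question: str, answers: list[str]) -> list[tuple[float, str]]:
    """Score each answer by dot product with the question and sort descending."""
    q = encode(question)
    scored = [
        (sum(a * b for a, b in zip(q, encode(ans))), ans)
        for ans in answers
    ]
    # sorted() is stable, so tied candidates keep their input order
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    faq_answers = [
        "reset your password from the login page",
        "shipping takes three to five days",
    ]
    for score, answer in rank_answers("how do I reset my password", faq_answers):
        print(f"{score:.3f}  {answer}")
```

Because the two sides are encoded separately, answer vectors can be precomputed once and reused for every incoming question, which is the main practical appeal of a bi-encoder over a model that scores each question-answer pair jointly.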

Cited by 9 publications (6 citation statements)
References 22 publications
“…For Korean, we select a question-answering corpus from AI Hub, which includes civil complaints and replies from public organizations. For Turkish, we identify FAQ (i.e., Frequently Asked Questions) and CQA (i.e., Community Question Answering) datasets from MFAQ (De Bruyn et al, 2021). All of these corpora are used in accordance with their licenses.…”
Section: Distant Supervision For New Languages
Citation type: mentioning
confidence: 99%
“…Notably, other studies [41][42][43] have explored the possibility of knowledge selection from a small set of knowledge, without employing a retrieval step or search engine, as was performed in this study. It is worth mentioning that the utilization of search engines for machine translation tasks has been shown to produce effective results [44].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In other words, these models are trained based on the data collected at the time of their creation, and the knowledge acquired is static and not adaptive to changes. Other studies [39][40][41] have explored the possibility of knowledge selection from a small set of knowledge, without employing a retrieval step or search engine, as was done in this study. Notably, using search engines for machine translation tasks has been shown to produce effective results [42].…”
Section: Introduction
Citation type: mentioning
confidence: 99%