2020
DOI: 10.1007/978-3-030-62466-8_7
RuBQ: A Russian Dataset for Question Answering over Wikidata

Abstract: The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, autom…
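The abstract lists the components of each RuBQ entry: a Russian question, its English machine translation, a SPARQL query over Wikidata, and reference answers. A minimal Python sketch of what such a record might look like — the field names and the specific Wikidata identifiers here are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical RuBQ-style record. Field names are assumptions for
# illustration; they do not reproduce the dataset's real JSON schema.
record = {
    "question_ru": "Кто написал роман «Война и мир»?",
    "question_en": "Who wrote the novel 'War and Peace'?",  # machine translation
    # SPARQL over Wikidata: P50 is the "author" property; the item ID
    # for the novel is given here only as an example.
    "sparql": "SELECT ?author WHERE { wd:Q161531 wdt:P50 ?author . }",
    "answers": ["Q7243"],  # reference answer as a Wikidata entity ID (Leo Tolstoy)
}

def answer_entities(rec):
    """Return the reference-answer entity IDs for a record."""
    return rec["answers"]

print(answer_entities(record))  # ['Q7243']
```

Evaluation of a KBQA system against such a record would compare the entity IDs returned by executing the system's query with the reference `answers` list.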

Cited by 13 publications (7 citation statements)
References 35 publications
“…The annotation was performed via Yandex.Toloka, a Russian crowd-sourcing platform with a high share of Russian-speaking workers. Yandex.Toloka is widely used in studies on Russian-language content, such as for annotation of semantic change (Rodina & Kutuzov, 2020), question answering (Korablinov & Braslavski, 2020), and toxic comments (Smetanin, 2020b). A depiction of the Yandex.Toloka user interface can be found in Fig.…”
Section: Sentiment Dataset (mentioning)
Confidence: 99%
“…Other KGQA datasets are Free917 (Cai and Yates, 2013), WebQuestions (Berant et al, 2013), ComplexQuestions (Bao et al, 2016), SimpleQuestions (Bordes et al, 2015), GraphQuestions (Su et al, 2016), WebQuestionsSP (Yih et al, 2016), 30MFactoidQA (Serban et al, 2016), ComplexWebQuestions (Talmor and Berant, 2018), PathQuestion (Zhou et al, 2018), MetaQA (Zhang et al, 2018), TempQuestions (Jia et al, 2018), TimeQuestions (Jia et al, 2021), CronQuestions (Saxena et al, 2021), FreebaseQA (Jiang et al, 2019), Compositional Freebase Questions (CFQ) (Keysers et al, 2019), Compositional Wikidata Questions (CWQ) (Cui et al, 2021), RuBQ (Korablinov and Braslavski, 2020; Rybin et al, 2021), GrailQA (Gu et al, 2021), Event-QA (Souza Costa et al, 2020), SimpleDBpediaQA (Azmy et al, 2018), CLC-QuAD (Zou et al, 2021), KQA Pro (Shi et al, 2020), SimpleQuestionsWikidata (Diefenbach et al, 2017), DBNQA (Yin et al, 2019), etc. These datasets do not fulfill our current criteria and thus are not part of the initial version of the KGQA leaderboard.…”
Section: KGQA Datasets (mentioning)
Confidence: 99%
“…A wide range of KGQA benchmarks and datasets as well as analyses thereof have been created to evaluate KGQA systems for simple and complex questions over different publicly available Knowledge Graphs (KGs). This includes datasets such as WebQuestions [4] and GrailQA [14] for Freebase, LC-QuAD [32] and QALD-9 [34] for DBpedia, RuBQ [21] and CronQuestions [25] for Wikidata. However, none of them are geared towards generalization in KGQA.…”
Section: Related Work (mentioning)
Confidence: 99%