2021
DOI: 10.48550/arxiv.2107.00156
Preprint

A Study of the Quality of Wikidata

Abstract: Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: 1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not adde…

Cited by 1 publication (1 citation statement)
References 18 publications (26 reference statements)
“…Namely, when an editor introduces a new Qnode, it is useful to have metrics which can detect very similar existing entities and ask the editor to confirm that the new entity is different from the most similar existing ones [1]. This procedure would help to avoid introducing duplicates in Wikidata, which is a key challenge today, considering that millions of redirects have been introduced in Wikidata since its inception [9]. At the same time, similarity methods could be run over the current set of entities in Wikidata to detect potentially existing duplicates, which can be validated by an editor before their merging.…”
Section: Similarity in Downstream Tasks
confidence: 99%
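The citation statement above describes a duplicate-detection procedure: before a new Qnode is created, similarity metrics surface the most similar existing entities so an editor can confirm the new entity is genuinely distinct. A minimal sketch of that idea is shown below, assuming a simple label/alias string-similarity metric and an illustrative 0.85 threshold; the entity records, field names, and threshold are hypothetical and do not reflect the cited paper's actual method or Wikidata's editing pipeline.

```python
# Hypothetical sketch: flag existing entities similar to a proposed Qnode
# so an editor can confirm the new entity is not a duplicate.
# Entity records, field names, and the 0.85 threshold are illustrative assumptions.
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def entity_similarity(new_entity: dict, existing_entity: dict) -> float:
    """Best label/alias match between a proposed entity and an existing one."""
    new_names = [new_entity["label"]] + new_entity.get("aliases", [])
    old_names = [existing_entity["label"]] + existing_entity.get("aliases", [])
    return max(string_similarity(n, o) for n in new_names for o in old_names)


def likely_duplicates(new_entity: dict, existing_entities: list, threshold: float = 0.85):
    """Return existing entities similar enough to warrant editor confirmation,
    sorted from most to least similar."""
    scored = [(e, entity_similarity(new_entity, e)) for e in existing_entities]
    return sorted(
        [(e, s) for e, s in scored if s >= threshold],
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    existing = [
        {"qid": "Q1", "label": "Douglas Adams", "aliases": ["Douglas Noel Adams"]},
        {"qid": "Q2", "label": "Ada Lovelace", "aliases": []},
    ]
    proposed = {"label": "Douglas N. Adams", "aliases": []}
    for entity, score in likely_duplicates(proposed, existing):
        print(f"Possible duplicate of {entity['qid']} ({entity['label']}): {score:.2f}")
```

The same scoring function could, in principle, be run pairwise over already-existing entities to surface merge candidates for editor review, as the citation statement suggests; string similarity is only one of several possible metrics for that task.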