The OpenCitations Data Model

Daquino, Marilena; Peroni, Silvio; Shotton, David M.; Colavizza, Giovanni; Ghavimi, Behnam; Lauscher, Anne; Mayr, Philipp; Romanello, Matteo; Zumstein, Philipp

doi:10.48550/arxiv.2005.11981

Cited by 1 publication

(2 citation statements)

References 20 publications


“…This section introduces the benchmark datasets OC-782K and AMiner-534K which are created for evaluating the LAND framework. OC-782K is a subset of the Scientometrics KG [20] which is built in compliance with the OpenCitations Data Model (OCDM) [7]. On the other hand, AMiner-534K is a KG generated from a well-established benchmark dataset for AND made available by AMiner in [31].…”

Section: Creation of the Scholarly KGs (mentioning)

confidence: 99%

“…This data model contains three types of entities: fabio:Expression, which represents articles, books, conference papers, and other academic works, fabio:Journal for representing journal venues (if the related fabio:Expression is a journal article), and authors which are described as foaf:Agent. The data model is an abstraction of the OCDM [7] and is created for two reasons: i) for collecting triples only related to the entities of interest (e.g. bibliographic resources, venues, and authors), ii) create an abstract representation of Scientometrics-OC in order to perform representation learning more efficiently.…”

Section: The OC-782K Knowledge Graph (mentioning)

confidence: 99%


A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals

Santini, Gesese, Peroni et al. 2022

Preprint

Self Cite


Scholarly data is growing continuously, containing information about articles from a plethora of venues including conferences, journals, etc. Many initiatives have been taken to make scholarly data available in the form of Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges such as exploration of scholarly articles, ambiguous authors, etc. This study more specifically targets the problem of Author Name Disambiguation (AND) on Scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: 1) Multimodal KGEs, 2) A blocking procedure, and finally, 3) Hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from the Scientometrics journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8-14% in terms of F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through GitHub (https://github.com/sntcristian/and-kge) and Zenodo (https://zenodo.org/record/5675787#.YcCJzL3MJTY) respectively.
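The last two components named in the abstract (a blocking procedure followed by Hierarchical Agglomerative Clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the blocking key (last name plus first initial), the average-linkage rule, the cosine distance, the 0.3 threshold, the toy names, and the 2-dimensional vectors standing in for learned KG embeddings are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math


def block_key(name):
    """Hypothetical blocking key: lowercase last name plus first initial."""
    parts = name.lower().split()
    return f"{parts[-1]}_{parts[0][0]}"


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(ids, vectors, threshold):
    """Average-linkage clustering: repeatedly merge the closest pair of
    clusters until the closest pair is farther apart than `threshold`."""
    clusters = [[i] for i in ids]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def disambiguate(mentions, vectors, threshold=0.3):
    """Group author mentions into blocks, then cluster within each block."""
    blocks = defaultdict(list)
    for mention_id, name in mentions.items():
        blocks[block_key(name)].append(mention_id)
    result = []
    for ids in blocks.values():
        result.extend(agglomerative(ids, vectors, threshold))
    return result


# Toy data (invented): four author mentions and stand-in embeddings.
mentions = {"a1": "Ann Lee", "a2": "A. Lee", "a3": "Adam Lee", "a4": "Bo Chen"}
vectors = {"a1": [1.0, 0.0], "a2": [0.9, 0.1], "a3": [0.0, 1.0], "a4": [1.0, 0.0]}
clusters = disambiguate(mentions, vectors)
print(clusters)
```

In this toy run, blocking keeps "Bo Chen" apart from the three "Lee" mentions even though its vector matches that of "Ann Lee", and clustering then merges only the two "Lee" mentions whose vectors are close. Blocking also limits the pairwise comparisons to mentions within the same block.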

