Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data

Xu, Shuqi; Lü, Linyuan; Medo, Matúš

doi:10.1016/j.joi.2019.101005

Cited by 29 publications

(35 citation statements)

References 57 publications

“…If a ground truth set is available, a comparative assessment on real data is possible. This can be made more robust by considering multiple real datasets and multiple ground truth sets as done recently in (Xu et al, 2020) to compare ranking metrics for citation data. If a ground truth set is not available but a credible model for a given system exists, an assessment using synthetic data (as we have used here) is a practical alternative.…”

Section: Discussion (mentioning)

confidence: 99%

Limits of PageRank-based ranking methods in sports data

Zhou, Wang, Zhang et al. 2020

Preprint

Self Cite

While PageRank has been extensively used to rank sports tournament participants (teams or individuals), its superiority over simpler ranking methods has never been clearly demonstrated. We use sports results from 18 major leagues to calibrate a state-of-the-art model for synthetic sports results. Model data are then used to assess the ranking performance of PageRank in a controlled setting. We find that PageRank outperforms the benchmark ranking by the number of wins only when a small fraction of all games have been played. Increased randomness in the data, such as intrinsic randomness of outcomes or the advantage of home teams, further reduces the range of PageRank's superiority. We propose a new PageRank variant which outperforms PageRank in all evaluated settings, yet shares its sensitivity to increased randomness in the data. Our main findings are confirmed by evaluating the ranking algorithms on real data. Our work demonstrates the danger of using novel metrics and algorithms without considering their limits of applicability.
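The abstract contrasts PageRank with the benchmark ranking by number of wins. A minimal sketch of both scores follows, assuming that each game is recorded as a hypothetical `(winner, loser)` pair, that every loss adds a link from the loser to the winner, and that teams without losses spread their score uniformly. The damping factor `alpha=0.85` is a conventional choice and not necessarily the paper's setting; the PageRank variant proposed in the paper is not reproduced here.

```python
from collections import Counter


def win_counts(games):
    """Benchmark score: number of wins per team, from (winner, loser) pairs."""
    teams = {team for game in games for team in game}
    wins = Counter(winner for winner, _ in games)
    return {team: wins[team] for team in teams}


def pagerank(games, alpha=0.85, iters=200):
    """PageRank on a loser -> winner network, computed by power iteration."""
    teams = sorted({team for game in games for team in game})
    n = len(teams)
    out = {team: [] for team in teams}
    for winner, loser in games:
        out[loser].append(winner)  # each loss "endorses" the winner
    score = {team: 1.0 / n for team in teams}
    for _ in range(iters):
        # Teams without losses have no out-links; spread their score uniformly
        dangling = sum(score[t] for t in teams if not out[t])
        new = {t: (1 - alpha) / n + alpha * dangling / n for t in teams}
        for t in teams:
            for winner in out[t]:
                new[winner] += alpha * score[t] / len(out[t])
        score = new
    return score


games = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
print(win_counts(games))
print(sorted(pagerank(games).items(), key=lambda kv: -kv[1]))
```

In this toy season A and B tie with two wins each, yet PageRank ranks A above B because A's win over B passes part of B's score on to A.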

“…LeaderRank is further applied to identify the influential nodes in complex products and systems (Li et al, 2019); in power grids (Zhou et al, 2019); in manufacturing services (Wu et al, 2019). Notably, in the field of "Library & Information Science", Xu et al (2020) found that LeaderRank had the best performance in ranking science and technology citation data, compared with 17 other network-based metrics. However, it is noteworthy that all the previous variants of PageRank, including LeaderRank, only consider the topological features of nodes while ignoring other non-topological features, especially the spatial features that are very important to the academic performance and impact.…”

Section: PageRank and LeaderRank (mentioning)

confidence: 99%
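LeaderRank, mentioned in the statement above, modifies PageRank by adding a ground node linked in both directions to every node and running a parameter-free random walk; at the end, the ground node's score is shared evenly among the real nodes. The sketch below follows that construction as an illustration only, assuming an unweighted directed network whose edges point from citing to cited node. The function name `leaderrank` and its tolerance settings are hypothetical choices, and the spatial weighting of SpatialLeaderRank is not implemented.

```python
def leaderrank(edges, nodes, tol=1e-12, max_iter=10_000):
    """LeaderRank scores; `nodes` must contain every endpoint in `edges`."""
    ground = object()  # sentinel standing in for the extra ground node
    out = {v: [] for v in nodes}
    for u, v in edges:  # u -> v, e.g. citing paper -> cited paper
        out[u].append(v)
    for v in nodes:  # link the ground node to every node in both directions
        out[v].append(ground)
    out[ground] = list(nodes)

    # Every real node starts with one unit of score, the ground node with none
    s = {v: 1.0 for v in nodes}
    s[ground] = 0.0
    for _ in range(max_iter):
        new = {v: 0.0 for v in out}
        for u, targets in out.items():
            share = s[u] / len(targets)  # never zero: every node links to ground
            for v in targets:
                new[v] += share
        converged = max(abs(new[v] - s[v]) for v in out) < tol
        s = new
        if converged:
            break
    # Share the ground node's steady-state score evenly among the real nodes
    return {v: s[v] + s[ground] / len(nodes) for v in nodes}


nodes = ["p1", "p2", "p3", "p4"]
edges = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p3", "p2"), ("p4", "p3")]
print(leaderrank(edges, nodes))
```

Because every node links to the ground node, there are no dangling nodes and no teleportation parameter is needed, and the scores always sum to the number of nodes. If the real nodes have no links among themselves, the walk alternates between the ground node and the rest and does not settle, so the loop stops at `max_iter`.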

Characterizing research leadership on geographically weighted collaboration network

Zhang

2021

Scientometrics

Research collaborations, especially long-distance and international collaborations, have become increasingly prevalent worldwide. Recent studies highlighted the significant role of research leadership in collaborations. However, existing measures of research leadership do not take into account the intensity of leadership in the co-authorship network. More importantly, the spatial features, which influence the collaboration patterns and research outcomes, have not been incorporated in measuring research leadership. To fill the gap, we construct an institution-level weighted co-authorship network that integrates two types of weight on the edges: the intensity of collaborations and the spatial score (the geographical distance adjusted by the cross-linguistic-border nature). Based on this network, we propose a novel metric, namely the spatial research leadership rank (SpatialLeaderRank), to identify the leading institutions while considering both the collaboration intensity and the spatial features. The leadership of an institution is measured by the following three criteria: (a) the institution frequently plays the corresponding role in papers with other institutions; (b) the institution frequently plays the corresponding role in longer-distance and even cross-linguistic-border collaborations; (c) the participating institutions led by the institution have high leadership status themselves. Harnessing a dataset of 323,146 journal publications in pharmaceutical sciences from 2010 to 2018, we perform a comprehensive analysis of the geographical distribution and dynamic patterns of research leadership flows at the institution level. The results demonstrate that SpatialLeaderRank outperforms baseline metrics in predicting the scholarly impact of institutions. This result remains robust in the field of Information Science and Library Science.

“…This suggests that PageRank's temporal bias can be removed by rescaling the scores with a transformation that ensures that the average score of the nodes and its standard deviation are independent of node age [40]. When such a transformation is applied, the resulting 'rescaled' score can detect important nodes much earlier, with useful implications for the early detection of milestone papers [40,41], patents [41,42], and movies [43]. The benefit from this procedure is exemplified, again, by the paper that reported the first direct observation of gravitational waves [38]: the paper is ranked 16th by rescaled PageRank at the end of 2016, which constitutes a substantial improvement compared to the 12 482nd position by the original PageRank, and suggests that the paper deserves a place among the most significant ones in the APS corpus.…”

Section: Bias (mentioning)

confidence: 99%
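The rescaling described in the statement above can be sketched as a moving z-score: each node's score is compared with the mean and standard deviation of the scores of the `window` nodes closest to it in age. This is a minimal illustration assuming the scores are already sorted from oldest to newest; the window size, the handling of the two ends of the list, and the function name `rescale_by_age` are assumptions rather than the exact procedure of [40].

```python
import statistics


def rescale_by_age(scores, window):
    """Z-score each node against the `window` nodes closest to it in age.

    `scores` must be ordered by node age (oldest first).
    """
    n = len(scores)
    half = window // 2
    rescaled = []
    for i, p in enumerate(scores):
        # Centre the window on node i, shifting it inward near either end
        lo = min(max(0, i - half), max(0, n - window))
        peers = scores[lo:lo + window]
        mu = statistics.fmean(peers)
        sigma = statistics.pstdev(peers)
        rescaled.append((p - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


raw = [10.0, 9.0, 8.0, 1.0, 0.2, 0.1, 0.9]  # hypothetical raw scores, oldest first
print(rescale_by_age(raw, window=3))
```

With these toy scores the newest node, whose raw score is far below that of the third-oldest node, obtains the largest rescaled score, because it stands out among its contemporaries.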

“…A key challenge is that the performance of an algorithm in one of these problems does not predict its performance in another one (cross-problem variability). For example, by comparing 17 network-based ranking algorithms, a recent study [41] found that time-rescaled versions of PageRank and its variant LeaderRank [11] are the best-performing algorithms in the identification of expert-selected seminal papers and patents. PageRank is also effective in identifying influential researchers [53].…”

Section: Performance Variability (mentioning)

confidence: 99%

“…In the problem of identifying expert-selected important nodes (papers or patents) in science and technology, the age distribution of the expert-selected nodes can significantly affect the algorithms' performance: if the expert-selected nodes are old ones, performance evaluation metrics that ignore this bias will favor ranking algorithms that are biased in favor of old nodes [40,41,55]. 'Corrected' performance evaluation metrics that penalize biased metrics are not affected by this confounding effect [41]. However, there is not yet a unique and universally-agreed way to evaluate ranking algorithms for scientific and technological impact.…”

Section: Performance Variability (mentioning)

confidence: 99%
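The confounding effect described in the statement above can be illustrated with the identification rate, taken here as the fraction of ground-truth nodes that appear in the top fraction `z` of a ranking. On hypothetical toy data in which all expert-selected nodes are old, a ranking that simply lists nodes from oldest to newest scores perfectly, while the opposite ordering scores zero, although neither uses any information beyond age. The function name and data are illustrative assumptions; only the uncorrected metric is shown, and the corrected evaluation of [41] is not reproduced.

```python
def identification_rate(ranking, ground_truth, z):
    """Fraction of ground-truth nodes found in the top z-fraction of `ranking`."""
    k = max(1, round(z * len(ranking)))
    top = set(ranking[:k])
    return len(top & set(ground_truth)) / len(ground_truth)


# Hypothetical toy data: nodes 0..99, numbered from oldest to newest
nodes = list(range(100))
# Hypothetical expert-selected nodes, all among the oldest 20%
ground_truth = [1, 4, 7, 12, 18]
by_age = nodes              # pure old-node bias: oldest first
reverse_age = nodes[::-1]   # opposite bias: newest first

print(identification_rate(by_age, ground_truth, 0.2))
print(identification_rate(reverse_age, ground_truth, 0.2))
```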

Network-based ranking in social systems: three challenges

Lü

2020

J. Phys. Complex.

Self Cite

Ranking algorithms are pervasive in our increasingly digitized societies, with important real-world applications including recommender systems, search engines, and influencer marketing practices. From a network science perspective, network-based ranking algorithms solve fundamental problems related to the identification of vital nodes for the stability and dynamics of a complex system. Despite the ubiquitous and successful applications of these algorithms, we argue that our understanding of their performance and their applications to real-world problems face three fundamental challenges: (1) rankings might be biased by various factors; (2) their effectiveness might be limited to specific problems; and (3) agents' decisions driven by rankings might result in potentially vicious feedback mechanisms and unhealthy systemic consequences. Methods rooted in network science and agent-based modeling can help us to understand and overcome these challenges.
