The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this article, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework on a massive dataset of soccer logs consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players' evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by PlayeRank and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. Finally, we explore some applications of PlayeRank (i.e., player search and player versatility), showing its flexibility and efficiency, which make it suitable for the design of a scalable platform for soccer analytics.
59:2 L. Pappalardo et al.
INTRODUCTION
Rankings of soccer players and data-driven evaluations of their performance are becoming increasingly central in the soccer industry [4,14,15,20,32,34,40,46]. On the one hand, many sports companies, websites, and television broadcasters, such as Opta, WhoScored.com, and Sky, as well as the plethora of online platforms for fantasy football and e-sports, widely use soccer statistics to compare the performance of professional players with the purpose of increasing fan engagement via critical analyses, insights, and scoring patterns. On the other hand, coaches and team managers are interested in analytic tools to support tactical analysis and to monitor the quality of their players during individual matches or entire seasons [11,27]. Not least, soccer scouts and performance analysts are continuously looking for data-driven tools to improve the retrieval of talented players with desired characteristics, based on evaluation criteria that take into account the complexity and the multi-dimensional nature of soccer performance. While selecting talents from the entire space of soccer players is infeasible (if not impossible) for humans, data-driven performance scores help select a small subset of the best players who meet specific constraints (e.g., age, performance features and trends, roles). This allows scouts and performance analysts to analyze a smaller set of players, thus saving considerable time and economic resources while broadening scouting operations and the career opportunities of talented players.

The problem of data-driven evaluation of player performance is gaining interest in the scientific community, too, thanks to t...
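The constraint-based shortlisting mentioned in the introduction (keeping only players who satisfy constraints such as age or role, then retaining the best-scoring ones) can be illustrated with a minimal sketch. This is not PlayeRank's method: the `Player` record, its `score` field (standing in for any aggregate data-driven performance score), the `shortlist` function, and the example players are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, List


@dataclass
class Player:
    name: str
    age: int
    role: str
    score: float  # hypothetical aggregate performance score (higher is better)


def shortlist(players: Iterable[Player], k: int,
              max_age: Optional[int] = None,
              roles: Optional[Set[str]] = None) -> List[Player]:
    """Return up to k players meeting the constraints, best score first.

    A constraint set to None is not applied.
    """
    eligible = [
        p for p in players
        if (max_age is None or p.age <= max_age)
        and (roles is None or p.role in roles)
    ]
    # Sort the eligible players by descending score and keep the top k.
    return sorted(eligible, key=lambda p: p.score, reverse=True)[:k]


# Hypothetical example data, not taken from any real dataset.
example_players = [
    Player("A", 21, "forward", 0.82),
    Player("B", 29, "forward", 0.91),
    Player("C", 22, "midfielder", 0.77),
    Player("D", 20, "forward", 0.65),
]

if __name__ == "__main__":
    young_forwards = shortlist(example_players, k=2, max_age=23, roles={"forward"})
    print([p.name for p in young_forwards])
```

Here the highest-scoring player overall ("B") is excluded by the age constraint, which is the point of filtering before ranking: the scout inspects only the few candidates that match the desired profile.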