A computational literature review of football performance analysis through probabilistic topic modeling

Abstract
This research aims to illustrate the potential use of concepts, techniques, and mining process tools to improve the systematic review process. To that end, we searched two online databases (Scopus and ISI Web of Science) for studies published from 2012 to 2019. We identified 9,649 studies, which were analyzed with probabilistic topic modeling procedures in a machine learning approach. The Latent Dirichlet Allocation (LDA) method chosen for modeling required two stages: (1) data cleansing and (2) data modeling into topics for coherence and perplexity analysis. All research was conducted according to the standards of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) in a fully computerized way. The computational literature review (CLR) is an integral part of a broader literature review process. The results presented met three criteria: (1) literature review for a research area, (2) analysis and classification of journals, and (3) analysis and classification of academic and individual research teams. A contribution of the article is to demonstrate how the publication network formed in this particular field of research, and the content of the abstracts, can be automatically analyzed to provide a set of research topics for quick understanding and application in future projects.

Keywords
Football; Performance Analysis; Literature review; Computational literature review; Topic models; LDA
Introduction
Over time, methods for conducting systematic reviews have become more rigorous, further prolonging the completion of reviews (Pham et al. 2018) given finite resources of time and effort (Jennex 2015). In this context, a researcher or a doctoral student who wants to better understand a research area needs to quickly get an overview of the literature: which journals have the most significant impact, and which topics are the most recent and frequent (Mortenson and Vidgen 2016). In this way, researchers contribute to knowledge generation based on searches and promote education. Text analysis is beneficial for this purpose, given the significant increase in the number of electronic research materials in this new era (Lee et al. 2014). This increase brings scientists new challenges and opportunities due to the volume, variety, and speed of data creation (Chen, Zhong, and Yuan 2016).

The systematic literature review (SLR) provides reliable means and established methods for carrying out a comprehensive and robust literature review (Felizardo et al. 2011). However, conducting such reviews becomes quite costly because the number of studies grows by 8 to 9% each year, as reported by Bornmann and Mutz (2015). Besides being larger than they used to be, bibliometric datasets are becoming more complex (McLevey and McIlroy-Young 2017).

Accessing these vast bibliometric data requires computational skills. Several programming languages used to make access more acces...