Sports big data has emerged as a research area in recent years. The purpose of this study was to identify the most frequent research topics, application areas, data sources, and data usage characteristics in the existing literature, in order to understand the development of data-driven baseball research and its multidisciplinary participation in the big data era. A scoping review was conducted, focusing on the diverse uses of publicly available Major League Baseball data. After searching and screening the Web of Science, ScienceDirect, and SPORTDiscus databases, 48 relevant papers that clearly indicated their data sources and the data fields used were selected and fully reviewed for further analysis. Bibliometric co-occurrence analysis was then used to present a knowledge map of the reviewed literature. Finally, we propose a comprehensive baseball data research domain framework that visualizes the ecosystem of publicly available sports data applications, mapped to the four application domains of the big data maturity model. The most prominent research hotspots for sports data are, in order, economics and finance, sports injury, and sports performance evaluation. Subjects studied included pitchers, position players, catchers, umpires, batters, free agents, and attendees. The most popular data sources are PITCHf/x, the Lahman Baseball Database, and baseball-reference.com. This review can serve as a valuable starting point for researchers to plan research strategies, discover opportunities for cross-disciplinary innovation, and position their work within the current state of research.