The big data revolution has had an impact on sports analytics as well. Many
large corporations have begun to see the financial benefits of integrating
sports analytics with big data. When we rely on central processing systems
to aggregate and analyze large amounts of sport data from many sources, we
compromise the accuracy and timeliness of the data. As a response to these
issues, distributed systems come to the rescue, and the MapReduce paradigm
holds promise for large scale data analytics. We describe a big data
architecture based on Docker containers with Apache Spark in this paper. We
evaluate the architecture on four data-intensive case studies in sport
analytics including structured analysis, streaming, machine learning
approaches, and graph-based analysis.