This paper aims to develop an interpretable machine learning model that predicts plays (pass versus rush) in the National Football League and is useful to players and coaches in real time. Using data from the 2013-2014 to 2016-2017 NFL regular seasons, which included 1,034 games and 130,344 pass/rush plays, we first develop and compare several machine learning models to determine the maximum possible prediction accuracy. The best-performing model, a neural network, achieves a prediction accuracy of 75.3%, which is competitive with state-of-the-art methods applied to other datasets. We then search over a family of simple decision tree models to identify one that captures 86% of the prediction accuracy of the neural network yet can be easily memorized and applied in an actual game. We extend the analysis by building decision tree models tailored to each of the 32 NFL teams, obtaining accuracies ranging from 64.7% to 82.5%. Overall, our decision tree models can be a useful tool for coaches and players to improve their chances of stopping an offensive play.
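To make the idea of a memorizable decision tree concrete, the following is a minimal sketch of what such a model could look like when written as plain conditionals. Everything in it is an assumption made for illustration: the features (`down`, `yards_to_go`), the thresholds, the `Play` record, and the toy plays are hypothetical and are not taken from the paper. The `fraction_captured` helper assumes that "captures 86% of the prediction accuracy" means the ratio of the two accuracies, which is one plausible reading.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """Hypothetical play record; the field names are illustrative only."""
    down: int          # 1 through 4
    yards_to_go: int   # yards needed for a first down
    was_pass: bool     # observed label: True for pass, False for rush


def predict_pass(down: int, yards_to_go: int) -> bool:
    """Depth-2 decision tree with made-up thresholds (not fitted to data)."""
    if down >= 3:
        # Late downs: guess pass unless only a short gain is needed.
        return yards_to_go >= 3
    # Early downs: guess pass only when the offense is behind schedule.
    return yards_to_go > 10


def accuracy(plays: list[Play]) -> float:
    """Share of plays on which the tree's guess matches the observed label."""
    correct = sum(predict_pass(p.down, p.yards_to_go) == p.was_pass for p in plays)
    return correct / len(plays)


def fraction_captured(simple_acc: float, reference_acc: float) -> float:
    """Ratio of a simple model's accuracy to a reference model's accuracy."""
    return simple_acc / reference_acc


# Toy plays invented for this example; they are not NFL data.
toy_plays = [
    Play(down=1, yards_to_go=10, was_pass=False),
    Play(down=2, yards_to_go=15, was_pass=True),
    Play(down=3, yards_to_go=8, was_pass=True),
    Play(down=3, yards_to_go=1, was_pass=False),
    Play(down=1, yards_to_go=10, was_pass=True),
]
print(f"toy accuracy: {accuracy(toy_plays):.2f}")
```

A tree of this shape has only three leaves, which is the kind of rule a player could plausibly recall between snaps; a tree fitted to real data would choose its own features and split points.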
Value functions are used in sports to determine the optimal action players should employ. However, most literature implicitly assumes that players can perform the prescribed action with a known and fixed probability of success. The effect on optimal strategy design of varying this probability or, equivalently, of “execution error” in implementing an action (e.g., hitting a tennis ball to a specific location on the court) has received limited attention. In this paper, we develop a novel modeling framework based on Markov reward processes and Markov decision processes to investigate how execution error impacts a player’s value function and strategy in tennis. We power our models with hundreds of millions of simulated tennis shots with 3D ball and 2D player tracking data. We find that optimal shot selection strategies in tennis become more conservative as execution error grows, and that having perfect execution with the empirical shot selection strategy is roughly equivalent to choosing one or two optimal shots with average execution error. We also find that execution error on backhand shots is more costly than on forehand shots, and that optimal shot selection on a serve return is more valuable than on any other shot, over all values of execution error.
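The claim that optimal shot selection becomes more conservative as execution error grows can be illustrated with a one-shot toy calculation. This is not the paper's Markov model: it is a single-decision sketch under assumed numbers. The court is reduced to one dimension with sidelines at -1 and 1, the landing point is assumed to be normally distributed around the aimed target with standard deviation `sigma`, a ball landing out loses the point, and the assumed chance of winning with a ball that lands in rises linearly from 0.5 at the center to 0.9 at the sideline. All function names and constants are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_in(target: float, sigma: float) -> float:
    """P(-1 <= landing <= 1) when landing ~ Normal(target, sigma^2)."""
    return normal_cdf((1.0 - target) / sigma) - normal_cdf((-1.0 - target) / sigma)


def win_prob_if_in(target: float) -> float:
    """Assumed payoff: wider targets are harder for the opponent to reach."""
    return 0.5 + 0.4 * abs(target)


def shot_value(target: float, sigma: float) -> float:
    """Probability of winning the point; a ball that lands out counts as a loss."""
    return prob_in(target, sigma) * win_prob_if_in(target)


def best_target(sigma: float, steps: int = 100) -> float:
    """Grid search over aim points from the center (0) to the sideline (1)."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: shot_value(t, sigma))


# Larger sigma means larger execution error.
for sigma in (0.05, 0.1, 0.3):
    t = best_target(sigma)
    print(f"sigma={sigma:.2f}  best target={t:.2f}  value={shot_value(t, sigma):.3f}")
```

In this toy setting the best aim point moves away from the sideline as `sigma` increases, because the extra payoff from a wider target is outweighed by the growing chance of missing the court.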
In this paper, we describe a course project in which teams of undergraduate students propose and execute an end-to-end analytics project to solve a real-world problem. The project challenges students to implement machine learning, optimization, simulation, or a combination of these three techniques on real-world data that they collect. A designated project advisor helps each team refine its project and assesses the quality of the resulting work. In our analysis of 58 past projects, we show that students developed solutions for a wide range of topics by employing various methodologies. However, most teams encountered similar challenges, which project advisors helped them overcome with tailored feedback. Feedback from 106 previous students indicates that the project experience was largely positive and helped them prepare for their future careers. We believe that this type of hands-on project is conducive to the development of important data analytics skills.