Machine learning model could better measure baseball players' performance
In the movie “Moneyball,” a young economics graduate and a cash-strapped Major League Baseball general manager introduce a new way to evaluate baseball players’ value. Their innovative idea to analyze players’ statistical data alongside their salaries enabled the Oakland A’s to recruit quality talent overlooked by other teams, revitalizing the team without exceeding its budget.
New research at the Penn State College of Information Sciences and Technology could make a similar impact on the sport. The team has developed a machine learning model that could measure baseball players’ and teams’ short- and long-term performance better than existing statistical analysis methods for the sport. Drawing on recent advances in natural language processing and computer vision, their approach would completely change, and could enhance, the way the state of a game and a player’s impact on the game are measured.
According to Connor Heaton, doctoral candidate in the College of IST, the existing family of methods, known as sabermetrics, relies upon the number of times a player or team achieves a discrete event, such as hitting a double or home run. However, it does not consider the surrounding context of each action.
“Think about a scenario in which a player recorded a single in his last plate appearance,” said Heaton. “He could have hit a dribbler down the third base line, advancing a runner from first to second and beat the throw to first, or hit a ball to deep left field and reached first base comfortably but didn’t have the speed to push for a double. Describing both situations as resulting in ‘a single’ is accurate but does not tell the whole story.”
Heaton’s model instead views the game as a sequence of events. It learns the meaning of in-game events from the impact they have on the game and the context in which they occur, then outputs numerical representations of how players impact the game.
“We often talk about baseball in terms of ‘this player had two singles and a double yesterday,’ or ‘he went one for four,’” said Heaton. “A lot of the ways in which we talk about the game just summarize the events with one summary statistic. Our work is trying to take a more holistic picture of the game and to get a more nuanced, computational description of how players impact the game.”
In his novel method, Heaton leverages sequential modeling techniques that are used in natural language processing to help computers learn the role or meaning of different words. He applies that approach to teach his model the role or meaning of different events in a baseball game, such as a batter hitting a single. Then, he models the game as a sequence of events to offer new insight on existing statistics.