America has been glued to its TV screens since the MLB playoffs began on October 1. With the field whittled down to just four teams, oddsmakers are eager to figure out which one has the edge.
Researchers at DataRobot thought it would be a fun exercise to pull all of the MLB data from the last few decades and have their AI figure out who will win the 2019 World Series.
At the start of the playoffs, the AI predicted the Los Angeles Dodgers were most likely to win the pennant, followed closely by the Houston Astros. In the American League, DataRobot’s AI gave the Houston Astros a 40% probability of winning the league title, followed by the New York Yankees at 25% and the Minnesota Twins at 18%.
For the National League, DataRobot gave the Los Angeles Dodgers a 47% chance of winning the title, far above the Atlanta Braves at 23% and the 13% chance given to both the St. Louis Cardinals and the Washington Nationals.
Most analysts made similar predictions, but, as always, few could account for the unpredictable nature of playoff baseball. The Nationals shockingly knocked the Los Angeles Dodgers out of the playoffs, and the Cardinals soundly beat the Atlanta Braves, propelling the Nationals and Cardinals into the National League Championship Series.
DataRobot had better luck with the American League, where they correctly predicted that the Astros and Yankees would face off in the championship series.
The brains behind the AI prediction, DataRobot general manager for sports and gaming Andrew Engel, told TechRepublic the prediction was a natural extension of the work they already do with major league sports teams.
“Data has proven to be very influential and valued throughout the sports world as seen with the US Open, Wimbledon, and March Madness. But, who’s to really know how the playoffs will go?” Engel said. “The models we’ve built through DataRobot give us a better idea of how teams will fare this season, but there’s still plenty of room for surprises. This is what makes the world of sports analytics so exciting.”
For this project, Engel ran more than 20 years of MLB data through DataRobot to see which team would be most likely to win the World Series.
The company’s AI calculated each playoff team’s Elo rating and built models to predict which team would win each game. Elo ratings measure the relative skill of competitors within a system. The rating system was originally created for chess and has since been adapted to baseball and other sports.
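For readers unfamiliar with Elo, the following is a minimal sketch of the standard rating formula, not DataRobot’s implementation. The ratings and the K-factor of 4 are hypothetical values chosen only for illustration; the article does not say which parameters the company used.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 4.0):
    """Return both teams' new ratings after one game.

    k is the K-factor (how far one result moves a rating); 4.0 is an
    assumed value for illustration only.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Elo is zero-sum: whatever one team gains, the other loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical ratings, not figures from DataRobot.
    favorite, underdog = 1600.0, 1500.0
    print(f"Favorite's win probability: {expected_score(favorite, underdog):.3f}")
    print("Ratings after an upset:", update_ratings(favorite, underdog, a_won=False))
```

An upset costs the favorite more points than an expected loss would, which is how the ratings gradually converge on each team’s strength.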
In addition to Elo, Engel drew on widely used baseball statistics such as OPS+, WAR (by position), and RAA.
“DataRobot is an automated machine learning platform. So I can feed all of those historical games into DataRobot and churn through 70 different models and try to find the one that does the best job of predicting who can win a given game,” he said. “When you have that, you can create a model.”
Once Engel and his team had a model they believed produced accurate results, they ran the simulation 100,000 times and tabulated how often each team won the World Series.
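The repeated-simulation step can be sketched roughly as follows. This is an illustrative Monte Carlo sketch, not DataRobot’s code: it assumes each game’s win probability comes from an Elo-style formula, that every series is best-of-seven, and that the team ratings are made-up numbers. The function names are hypothetical as well.

```python
import random
from collections import Counter


def win_probability(rating_a: float, rating_b: float) -> float:
    """Elo-style chance that team A wins a single game (an assumed model)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play_series(team_a, team_b, ratings, rng, wins_needed=4):
    """Simulate one best-of-seven series and return the winner's name."""
    p_a = win_probability(ratings[team_a], ratings[team_b])
    wins_a = wins_b = 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p_a:
            wins_a += 1
        else:
            wins_b += 1
    return team_a if wins_a == wins_needed else team_b


def simulate_title_odds(ratings, al_teams, nl_teams, runs=100_000, seed=0):
    """Play out the final four `runs` times; return each team's share of titles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    titles = Counter()
    for _ in range(runs):
        al_champ = play_series(*al_teams, ratings, rng)
        nl_champ = play_series(*nl_teams, ratings, rng)
        titles[play_series(al_champ, nl_champ, ratings, rng)] += 1
    return {team: titles[team] / runs for team in ratings}


if __name__ == "__main__":
    # Hypothetical ratings for the four remaining teams, for illustration only.
    ratings = {"Astros": 1600, "Yankees": 1580, "Nationals": 1550, "Cardinals": 1530}
    odds = simulate_title_odds(ratings, ("Astros", "Yankees"), ("Nationals", "Cardinals"))
    for team, share in sorted(odds.items(), key=lambda item: -item[1]):
        print(f"{team}: {share:.1%}")
```

Tallying outcomes over many simulated brackets turns per-game win probabilities into championship odds, and the estimates settle down as the number of runs grows.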
DataRobot already does a lot of work with individual teams and sports leagues like the MLB, NBA, and NHL, so this was a natural extension of what its AI usually tackles. The company helps teams in a variety of ways, ranging from predictions on young players to deeper understandings of fan behavior.
“DataRobot is a company that enables other companies to harness the power of AI by providing an enterprise AI software platform and services as a strategic advisor to drive AI adoption across business,” Engel said. “Part of why I made this is because I’m a sports fan and huge baseball fan. Historically, baseball is incredibly analytically driven. The use of data in the sport has been heavy since the beginning, so it was only fitting to use it now that we have tools to ingest lots of data.”
Like most predictions, this one is not perfect. Engel said the prediction can’t take into account factors like home-field advantage, pitching lineups, injuries, and star performances. After each round, he plans to rerun the simulation to see who comes out on top.
And he will definitely need to redo his prediction considering the upsets that occurred this week.