Most Predictable Sport
Let me get this off my chest: I know nothing about sports. So if you are reading for insight into quantifying the value of a running back compared with a linebacker, this is not the article for you. Instead, I will develop a predictive model that can be applied to almost any sport and use it to show that, among the NFL, NBA, and MLB, the NFL is the most predictable. Then I will run 50 simulations of the 2017 NFL playoffs to estimate Super Bowl probabilities for each team, which serves as a back test of the model and of the NFL’s general predictability. Finally, I will provide predictions and probabilities for the teams competing in the 2018 NFL playoffs.
To begin, let’s define what predictable means here: a sport is predictable if you can get reasonably accurate predictions of a game’s winner using only prior game scores, with no knowledge of the inner workings of the sport itself. With that said, under a different framework a different sport could be labeled most predictable (for example, intimate knowledge of baseball or basketball may lead you to conclude that the MLB or NBA is most predictable).
The model to predict game outcomes is relatively simple. Imagine that, like me, you possess very little knowledge about sports. Given two competing teams, how could you predict the winner? If they have played each other previously, one answer is to select the previous game’s winner. But what if they have not played each other? A natural extension is to consider mutual teams. A mutual team is a team that has already played both of the teams you are considering. For example, if you want to predict the outcome of the Celtics versus the Raptors, compare the Celtics’ performance against the Knicks with the Raptors’ performance against the Knicks. If both teams beat the Knicks, which won by the larger margin? If both lost, which lost by the smaller margin? In either case the answer suggests which team to predict as the winner. In this example we would refer to the Knicks as a mutual team.
We could also consider two mutual teams, where another intermediate team sits between the Knicks and the Raptors. Conceptually we can do this because, if a team’s performance is treated as roughly constant, its results in two separate games are comparable, so margins can be chained from one game to the next. The reason we may want to do this is that every additional mutual team may provide additional information about the home and away teams’ relative performance (we will see shortly whether this is true). Continuing this process, we could use an even larger number of mutual teams. This process of ‘comparing’ teams really compares the score differences from the specific games in which these teams competed. A more formal expression of this statistic can be found in the appendix [1].
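The single-mutual-team comparison above can be sketched in a few lines of Python. This is only one plausible reading of the idea, not the formal statistic from the appendix: the scores are made up, and the function names and the choice to average over all shared opponents are assumptions for illustration.

```python
from collections import defaultdict

def margins_by_opponent(games):
    """Map team -> opponent -> list of margins (team score minus opponent score)."""
    margins = defaultdict(lambda: defaultdict(list))
    for team_a, score_a, team_b, score_b in games:
        margins[team_a][team_b].append(score_a - score_b)
        margins[team_b][team_a].append(score_b - score_a)
    return margins

def mutual_team_stat(games, home, away):
    """Average, over shared opponents, of (home margin - away margin).

    Returns None when the two teams have no opponent in common.
    Positive values favor the home team, negative values the away team.
    """
    margins = margins_by_opponent(games)
    shared = (set(margins[home]) & set(margins[away])) - {home, away}
    if not shared:
        return None
    diffs = []
    for opp in shared:
        home_avg = sum(margins[home][opp]) / len(margins[home][opp])
        away_avg = sum(margins[away][opp]) / len(margins[away][opp])
        diffs.append(home_avg - away_avg)
    return sum(diffs) / len(diffs)

# Made-up scores, for illustration only.
games = [
    ("Celtics", 110, "Knicks", 100),  # Celtics beat the Knicks by 10
    ("Raptors", 104, "Knicks", 100),  # Raptors beat the Knicks by 4
]
stat = mutual_team_stat(games, "Celtics", "Raptors")  # 10 - 4 = 6.0, favoring the Celtics
```

With two mutual teams, the same idea would chain one more margin through an intermediate opponent before comparing.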
This process of finding a chain of games that connects the home and away teams yields a simple summary statistic for every game, which is used to predict whether the home team will win or lose [2]. Now I must provide a caveat: numerous important variables, such as whether a key player is injured, are not included in my model (recall that our objective is to determine the comparative predictability of different sports given minimal information). Below are the predictive results on unseen data (the last 76 games for each sport) when this statistic is applied to the NFL, MLB, and NBA with a varying number of mutual teams [3].
We can see that the NFL is clearly the most predictable, reaching 78% predictive accuracy when 2 mutual teams are used. You will also notice that the MLB is the least predictable, doing little if any better than consistently predicting the home team. This may be due to the frequent turnover of players from game to game; however, as mentioned, I cannot fruitfully speculate on such matters. In the other cases, predictive accuracy falls as the number of mutual teams increases. This suggests that adding intermediary teams dilutes the most relevant information, namely how the home team and away team compare with one another.
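Footnote [2] describes the step from statistic to prediction as a logistic regression with a single predictor. Here is a minimal sketch of that step. The data points are invented, and the hand-rolled gradient-ascent fit is a stand-in for whatever library the original code uses; it is not the article's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.01, steps=20000):
    """Fit P(home win | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood. xs are statistic values, ys are 0/1 outcomes."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def home_win_probability(b0, b1, stat):
    """Modeled probability that the home team wins, given the statistic."""
    return sigmoid(b0 + b1 * stat)

# Invented training data: statistic value for each game, and whether the home team won.
stats = [-12, -7, -3, -1, 2, 4, 8, 13]
home_won = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(stats, home_won)
p_strong_home = home_win_probability(b0, b1, 10)
p_weak_home = home_win_probability(b0, b1, -10)
```

Predicting a home win whenever the modeled probability exceeds 0.5 gives the kind of win/loss call that the accuracy figures above measure.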
Using the NFL score data from the 2016-2017 season, we can then apply the probabilities derived from this statistic to simulate last year’s playoffs. We use the modeled probabilities from the statistic with 2 mutual teams, which has the highest historical accuracy. To do this I did need some NFL-specific information: which teams are in the AFC versus the NFC, and their respective seeds, which determine the structure of the playoff brackets. While the winners and losers of games were decided based on probabilities from the model, the specific scores were drawn from a normal distribution based on previous games’ scores. In this way I simulate the playoffs 50 times. Below are the results of these simulations.
Super Bowl participation shows the number of times a team made it to the Super Bowl, and Super Bowl wins shows the number of times that team won it. We can see that the Patriots were the favorite, winning 19 Super Bowls and reaching the Super Bowl in 29 of the 50 simulations. This indicates they had approximately a 38% chance of winning the Super Bowl at the start of the playoffs and about a 58% chance of participating in it. We find the Falcons to be the second strongest team, participating in 20 Super Bowls and winning 12 of them, indicating around a 24% chance of winning the Super Bowl and a 40% chance of participating in it. As we now know, these were the two teams that competed in the 2017 Super Bowl.
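The simulation procedure and the counting behind these percentages can be sketched as follows. Everything specific here is an assumption for illustration: the team names are placeholders, the ratings and `p_home_win` stand in for the fitted model, the bracket follows my understanding of the six-seed format (byes for seeds 1 and 2, reseeding after the wild-card round), and the Super Bowl is treated as a neutral-site game by averaging both home/away assignments. Simulated scores are left out.

```python
import math
import random
from collections import Counter

# Placeholder teams ordered by seed (1 to 6) within each conference.
AFC = [f"AFC{seed}" for seed in range(1, 7)]
NFC = [f"NFC{seed}" for seed in range(1, 7)]

# Made-up strength ratings (better seed = higher rating).
RATING = {team: 6 - i for conf in (AFC, NFC) for i, team in enumerate(conf)}

def p_home_win(home, away):
    """Hypothetical home-win probability; the fitted model would go here."""
    return 1.0 / (1.0 + math.exp(-(0.3 * (RATING[home] - RATING[away]) + 0.2)))

def play(home, away, prob, rng):
    """Return the winner of one game, drawn from the home-win probability."""
    return home if rng.random() < prob(home, away) else away

def simulate_conference(seeds, prob, rng):
    """Play one conference bracket and return its champion."""
    rank = {team: i for i, team in enumerate(seeds)}
    # Wild-card round: seed 3 hosts seed 6, seed 4 hosts seed 5.
    survivors = sorted(
        [play(seeds[2], seeds[5], prob, rng), play(seeds[3], seeds[4], prob, rng)],
        key=rank.get,
    )
    # Divisional round: seed 1 hosts the lowest remaining seed, seed 2 the other.
    semi_a = play(seeds[0], survivors[1], prob, rng)
    semi_b = play(seeds[1], survivors[0], prob, rng)
    # Conference championship: the better seed hosts.
    home, away = sorted([semi_a, semi_b], key=rank.get)
    return play(home, away, prob, rng)

def simulate_super_bowl(afc, nfc, prob, rng):
    """Return (afc_champion, nfc_champion, winner) for one simulated postseason."""
    a = simulate_conference(afc, prob, rng)
    n = simulate_conference(nfc, prob, rng)
    # Assumption: neutral site, so average the two home/away assignments.
    p_afc = 0.5 * (prob(a, n) + (1.0 - prob(n, a)))
    winner = a if rng.random() < p_afc else n
    return a, n, winner

rng = random.Random(0)
n_sims = 50
appearances, wins = Counter(), Counter()
for _ in range(n_sims):
    a, n, winner = simulate_super_bowl(AFC, NFC, p_home_win, rng)
    appearances.update([a, n])
    wins[winner] += 1

# Counts become probability estimates by dividing by the number of simulations.
win_share = {team: count / n_sims for team, count in wins.items()}
```

This is the same arithmetic as above: 19 wins in 50 simulations gives 19/50 = 38%. With only 50 runs each estimate moves in steps of 2 percentage points, so small gaps between teams should be read loosely.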
Now that we have observed the predictive accuracy of the model on individual games and back tested it by simulating last year’s playoffs, we are prepared to make predictions for this year. Below are the simulation results for the 2018 Super Bowl, with the model retrained on weeks 1-17 of the current season.
Again we see the Patriots as the clear favorite, winning 12 of the 50 simulated Super Bowls. This indicates that they have approximately a 24% chance of winning the Super Bowl, which is fairly low compared with last year. The Eagles are about as likely to participate in the Super Bowl as the Saints, around 26% to 24%. The Saints appear to be the second most likely to win the Super Bowl, with about an 18% probability. So if you want to pick someone other than the Patriots to win the Super Bowl, the Saints are your best bet.
If you are interested, all the code and data sets for this article are provided on my GitHub. Please feel free to leave any questions or thoughts in the comments.
[2] I run a simple logistic regression in which the response is whether the home team won and the only predictor is the statistic discussed in [1].
[3] 70%-30% train-test split, where the training set came from earlier data (e.g., Weeks 1-10) and the test set came from later data (e.g., Weeks 11-17). Each dataset was subsetted to its first 256 observations (the total number of NFL regular-season games, which is the smallest dataset of the three). Since after subsetting each sport contributes the same number of games, and the three leagues have a similar number of teams, I assume this quantity of data gives a representative picture of the model’s predictive accuracy throughout the rest of the season.
[4] This model cares only about the magnitudes of differences of scores, not the scores themselves.
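The chronological split described in [3] can be sketched as below. The `date` field and the toy schedule are hypothetical, and taking the test size as the floor of 30% of 256 is an assumption chosen because it reproduces the 76 held-out games mentioned in the article.

```python
def chronological_split(games, n_keep=256, test_frac=0.3):
    """Keep the first n_keep games by date, then hold out the latest ones.

    The test set is the most recent floor(test_frac * n) games, so the model
    is always evaluated on games played after the ones it was trained on.
    """
    ordered = sorted(games, key=lambda g: g["date"])[:n_keep]
    n_test = int(len(ordered) * test_frac)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Toy schedule: 300 games with an integer "date" field (hypothetical schema).
games = [{"date": d, "home_won": d % 2} for d in range(300)]
train, test = chronological_split(games)
# With 256 games kept, this leaves 180 for training and 76 for testing.
```

Splitting by time rather than at random matters here because the statistic for a game is built from earlier results, so a random split would let later games leak into training.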