NBA Stats to Salary Predictor

This article is about a salary predicting model I just built:

Read this post to understand how it works and how I built it! Enjoy.

How many times have you complained that your NBA team overpaid a player for some extraordinarily mediocre basketball? How many times have you wondered why your favorite role player is being underpaid while having the season of his career? Well, I built a model that certainly will not solve these burning questions!

What do I look like to you, some kind of GM mind reader? The best I can do is to show how much players SHOULD get paid based on their statistical performance. So how does this work? The model essentially takes per-game stats (points per game, etc.) for the season as input and spits out a one-season salary for your imaginary player. Each stat is weighted differently, and a marginal increase or decrease in any one stat changes the salary by that stat's weight.
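To make the "weighted amount" idea concrete, here is a minimal Python sketch. The weights, the stat names and the `predict_salary` helper are all made up for illustration; they are not the values or code behind the actual calculator.

```python
# Hypothetical weights in dollars per unit of each per-game stat.
# These are illustrative only, NOT the fitted model's coefficients.
WEIGHTS = {"pts": 500_000, "reb": 300_000, "ast": 400_000}


def predict_salary(stats, weights=WEIGHTS):
    """Return the weighted sum of the stats: sum(weight * stat)."""
    return sum(weights[name] * stats[name] for name in weights)


base = {"pts": 20.0, "reb": 6.0, "ast": 5.0}
bumped = {**base, "pts": 21.0}  # same player, one more point per game

print(predict_salary(base))                           # 13800000.0
print(predict_salary(bumped) - predict_salary(base))  # 500000.0
```

Bumping points per game by one moves the predicted salary by exactly the points weight, which is all "weighted amount" means here.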

So how did I build this model?

First and foremost, I needed data that would correlate. Through significantly dedicated logic and acute reasoning, I determined that players who perform better statistically usually get paid more. Don’t ask me how I came to that conclusion; it took a lot of really advanced math.

So we assume that higher stats in positive categories will increase a player’s salary, and higher negative stats will decrease the salary.

Using this, I built the model with the following steps:

  1. Obtain per-game statistical data for each player. (Per-game matters because players who miss games through injury would skew the data if you went by season totals.)
  2. Obtain salary data for each specific player
  3. Join the data so the set contains each player’s salary AND stat data
  4. Use machine learning to train the model using the main statistical categories as input variables. (The stats I used were points, rebounds, assists, steals, blocks and FG%, all per game for the season except FG%.)
    1. I used a multilinear regression model for this. Essentially, the machine learning tool goes through each data point that has both a salary and the player’s stats. It then trains itself and writes an equation that assigns each stat a weight reflecting how much it contributes to the player’s overall salary.
    2. The equation looks something like this (this is a mock equation and not the one my model actually came up with): Salary(Y) = 12,432p + 29,302r + 36,900a + 8,900s + 13,201b + 1,023f, where p, r, a, s, b and f stand for points, rebounds, assists, steals, blocks and FG%.
    3. Notice that each of the coefficients is positive. This is because each of the stats I used is beneficial to the team. Had I included turnovers, the coefficient in all likelihood would have been negative, as we assume turnovers are an undesirable trait that decreases potential salary. Also, notice the size of the coefficients. A higher coefficient means that specific stat has more weight when it comes to salary. In the above equation, the model says that rebounds and assists are worth the most money, as a marginal increase in either of these increases the salary by the most.
  5. That’s it! Now that we have an equation, all that’s left is to throw some inputs into our calculator and see what salary it gives us.
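The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the post does not say which tool was used, so the helper names (`join_on_player`, `solve`, `fit_linear`), the two-stat feature set and the toy data are all hypothetical. The fit is ordinary least squares with no intercept term, mirroring the shape of the mock equation.

```python
def join_on_player(stats_rows, salary_rows):
    """Inner-join two lists of dicts on the 'player' key (step 3)."""
    salary_by_player = {row["player"]: row["salary"] for row in salary_rows}
    return [
        {**row, "salary": salary_by_player[row["player"]]}
        for row in stats_rows
        if row["player"] in salary_by_player
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(rows, features, target="salary"):
    """Least-squares weights from the normal equations (X^T X) w = X^T y (step 4)."""
    xs = [[row[f] for f in features] for row in rows]
    ys = [row[target] for row in rows]
    k = len(features)
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


# Toy data, made up so that salary = 1,000,000 * pts + 500,000 * ast exactly.
stats = [
    {"player": "A", "pts": 25.0, "ast": 6.0},
    {"player": "B", "pts": 12.0, "ast": 3.0},
    {"player": "C", "pts": 8.0, "ast": 7.0},
    {"player": "D", "pts": 18.0, "ast": 2.0},  # no salary on file, dropped by the join
]
salaries = [
    {"player": "A", "salary": 28_000_000},
    {"player": "B", "salary": 13_500_000},
    {"player": "C", "salary": 11_500_000},
]

joined = join_on_player(stats, salaries)
weights = fit_linear(joined, ["pts", "ast"])
print([round(w) for w in weights])  # one weight per feature, in dollars
```

On this noise-free toy data the fit recovers the weights that generated the salaries. Real salary data is noisy, so a real fit only approximates the relationship, and a library routine would normally replace the hand-rolled solver.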

Unfortunately, our model is not perfect. There are a lot of factors outside of our equation that can explain changes in a player’s salary. So what limitations does this model have?

  1. In certain situations, a highly paid player will play injured, potentially lowering his per-game stats below what his salary would suggest.
  2. There is a cap on rookie and younger player contracts. This means that young, high-performing players will skew the model because they are playing better than their contracts would suggest.
  3. Some older players signed big contracts but are currently in decline. There is also such a thing as giving players contracts for ‘intangibles’ or a ‘locker-room presence’. These players are underperforming their salaries for the current year on a statistical basis.
  4. The difference between a ‘star’ and a ‘superstar’ is very large in terms of salary, but not very large in terms of game statistics. Take Damian Lillard and Steph Curry for example. Lillard is set to make $27 Million this year, while Curry is making a whopping $37 Million. But if you look at their stats, they are very comparable and it could be argued Lillard’s stats are better. So Curry makes about 37% MORE money than Lillard for performing roughly the same on the court. It’s safe to say there are other reasons Curry makes that much money, such as his popularity and image, or that the Warriors have been championship contenders and don’t mind going over the salary cap to keep their all-star team intact.
  5. It works the other way as well. Players in the $0-8 Million range are very hard to predict. As limitation #2 says, there is a cap on rookie contracts, so you could have a rookie going 20pts-6rbs-5ast and getting paid $3 million while there is some veteran out there making his veteran’s minimum of $5 Million and playing 10 minutes a game to post 7pts-2rbs-1ast numbers. So there you have it: that’s why our model works best on mid-tier players!
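As a quick check on the Curry and Lillard comparison in limitation 4, the salary gap can be worked out directly from the two figures quoted there. The `percent_more` helper is just an illustrative name.

```python
def percent_more(higher, lower):
    """How much larger `higher` is than `lower`, as a percentage of `lower`."""
    return (higher - lower) / lower * 100


curry, lillard = 37_000_000, 27_000_000  # salaries quoted in limitation 4
print(round(percent_more(curry, lillard), 1))  # 37.0
```

The extra $10 Million is about 37% of Lillard's $27 Million.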

As you can see, there are a huge number of factors outside of just game stats that influence how much each individual NBA player is paid. The way I see it, my model isn’t wrong – NBA GMs are just paying players the wrong amount of money based on their actual statistics. Thanks for reading! Don’t forget to try out the calculator.