Read this post to understand how it works and how I built it! Enjoy.

How many times have you complained that your NBA team overpaid a player for some extraordinarily mediocre basketball? How many times have you wondered why your favorite role player is being underpaid while having the season of his career? Well, I built a model that certainly will not solve these burning questions!

What do I look like to you, some kind of GM mind reader? The best I can do is show how much players SHOULD get paid based on their statistical performance. So how does this work? The model takes a player’s per-game stats for the season (points per game, etc.) as input and spits out a one-season salary for your imaginary player. Each stat has its own weight, so a marginal increase or decrease in any one stat changes the salary by that stat’s weight.
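To make the "weighted amount" idea concrete, here is a minimal Python sketch. It assumes the model is a plain weighted sum of stats, and it borrows the mock coefficients from the illustrative equation later in this post (they are not my model’s real coefficients). The stat names and the FG% unit (percentage points) are assumptions for illustration.

```python
# Mock weights (dollars per unit of each stat), copied from the illustrative
# equation in this post. These are NOT the coefficients my real model produced.
MOCK_WEIGHTS = {
    "points": 12_432,
    "rebounds": 29_302,
    "assists": 36_900,
    "steals": 8_900,
    "blocks": 13_201,
    "fg_pct": 1_023,  # assumed: per percentage point of FG%
}


def predict_salary(stats, weights=MOCK_WEIGHTS):
    """Return the weighted sum of a player's stats (no intercept, like the mock equation)."""
    return sum(weight * stats[name] for name, weight in weights.items())


# A hypothetical player line: 20 pts, 6 reb, 5 ast, 1 stl, 1 blk, 47% FG.
player = {"points": 20, "rebounds": 6, "assists": 5,
          "steals": 1, "blocks": 1, "fg_pct": 47}

base = predict_salary(player)
plus_one_assist = predict_salary({**player, "assists": player["assists"] + 1})

# One extra assist per game changes the prediction by exactly the assist weight.
print(plus_one_assist - base)  # 36900
```

Because the model is linear, the effect of one extra assist is the same no matter what the rest of the player’s stat line looks like.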

So how did I build this model?

First and foremost, I needed data that would correlate with salary. Through significantly dedicated logic and acute reasoning, I determined that players who perform better statistically usually get paid more. Don’t ask me how I came to that conclusion; it took a lot of really advanced math.

So we assume that higher numbers in positive categories (points, assists, etc.) will increase a player’s salary, and higher numbers in negative categories (such as turnovers) will decrease it.

Using this, I built the model with the following steps:

- Obtain per-game statistical data for each player. (Per-game stats matter because injured players who miss games would skew the data if you used season totals.)
- Obtain salary data for each specific player
- Join the data so the set contains each player’s salary AND stat data
- Use machine learning to train the model, with the main statistical categories as input variables. The stats I used were points, rebounds, assists, steals, blocks and FG%; all were per-game season averages except FG%, which is a season-long percentage.
- I used a multiple linear regression model for this. Essentially, the fitting process goes through every data point that has both a salary and the player’s stats and finds the set of weights (one per stat) that makes the predicted salaries as close as possible to the actual ones, typically by minimizing the squared error. The result is an equation in which each stat, multiplied by its weight, contributes to the player’s overall salary.
- The equation looks something like this (this is a mock equation, not the one my model actually came up with): Salary (Y) = 12,432p + 29,302r + 36,900a + 8,900s + 13,201b + 1,023f, where p = points, r = rebounds, a = assists, s = steals, b = blocks and f = FG%.
- Notice that every coefficient is positive; this is because each of the stats I used is beneficial to the team. Had I included turnovers, that coefficient would in all likelihood have been negative, since we assume turnovers are an undesirable trait that decreases potential salary. Also, notice the size of the coefficients: a higher coefficient means that stat carries more weight when it comes to salary. In the equation above, the model says assists and rebounds are worth the most money, because a one-unit increase in either of them raises the salary by the most.

- That’s it! Now that we have an equation, all that’s left is to throw some inputs into our calculator and see what salary it gives us.
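Here is a rough, dependency-free Python sketch of the join-then-fit steps above. It is not the code I actually ran: it assumes each source is a list of dicts keyed by a hypothetical "player" name field, and it fits the multiple linear regression by ordinary least squares using the normal equations. A library such as scikit-learn or statsmodels would normally handle the fitting step.

```python
def join_on_player(stats_rows, salary_rows):
    """Inner join: keep only players that appear in both sources."""
    salary_by_player = {row["player"]: row["salary"] for row in salary_rows}
    return [
        {**row, "salary": salary_by_player[row["player"]]}
        for row in stats_rows
        if row["player"] in salary_by_player
    ]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(rows, features, target="salary"):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, weight_1, ..., weight_k] in the order of `features`.
    """
    X = [[1.0] + [float(row[f]) for f in features] for row in rows]  # leading 1 = intercept
    y = [float(row[target]) for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up example data: salary = 1M + 200k * points + 500k * assists, exactly.
stats = [
    {"player": "A", "points": 10, "assists": 2},
    {"player": "B", "points": 20, "assists": 5},
    {"player": "C", "points": 15, "assists": 9},
    {"player": "D", "points": 25, "assists": 3},
    {"player": "E", "points": 5, "assists": 7},
    {"player": "F", "points": 30, "assists": 8},  # no salary row, so the join drops F
]
salaries = [
    {"player": "A", "salary": 4_000_000},
    {"player": "B", "salary": 7_500_000},
    {"player": "C", "salary": 8_500_000},
    {"player": "D", "salary": 7_500_000},
    {"player": "E", "salary": 5_500_000},
]

rows = join_on_player(stats, salaries)
weights = fit_linear_regression(rows, ["points", "assists"])
print(len(rows), [round(w) for w in weights])  # 5 [1000000, 200000, 500000]
```

Solving the normal equations directly is fine for a handful of stats like these, but it can be numerically fragile when the inputs are strongly correlated (points and FG attempts, say); QR-based or library solvers are more robust.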

Unfortunately, our model is not perfect. There are a lot of factors outside of our equation that can explain changes in a player’s salary. So what limitations does this model have?

- In certain situations, a highly paid player will play injured, potentially lowering his per-game stats below what his salary would suggest.
- There is a cap on rookie and younger player contracts. This means that young, high-performing players skew the model, because they are playing better than their contracts would suggest.
- Some older players signed big contracts but are now in decline. Teams also sometimes pay players for ‘intangibles’ or ‘locker-room presence’. These players underperform their current-year salaries on a statistical basis.
- The difference between a ‘star’ and a ‘superstar’ is very large in terms of salary, but not in terms of game statistics. Take Damian Lillard and Steph Curry, for example. Lillard is set to make $27 million this year, while Curry is making a whopping $37 million. But their stats are very comparable, and it could be argued that Lillard’s are better. So Curry makes about 37% MORE money than Lillard for performing about the same on the court. It’s safe to say there are other reasons Curry makes that much money, such as his popularity and image, or the fact that the Warriors have been championship contenders who don’t mind going over the salary cap to keep their All-Star team intact.
- It works the other way as well. Players in the $0-8 million range are very hard to predict. As limitation #2 says, rookie contracts are capped, so you could have a rookie putting up 20pts-6rbs-5ast and getting paid $3 million while some veteran out there is making $5 million to play 10 minutes a game and post 7pts-2rbs-1ast. So there you have it: that’s why our model works best on mid-tier players!
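One way to turn the model into the over/underpaid verdicts this post is after is to compare each player’s actual salary with the model’s prediction. The Python sketch below does that; the 10% tolerance band and the $30 million upper bound for "mid-tier" are arbitrary assumptions of mine, and only the $8 million lower bound comes from the limitation above. The same percent-difference helper also checks the Curry/Lillard arithmetic.

```python
def percent_more(a, b):
    """How much larger a is than b, as a percentage of b (negative if a is smaller)."""
    return (a - b) / b * 100


def pay_verdict(actual, predicted, tolerance_pct=10.0):
    """Label a salary relative to the model's prediction.

    tolerance_pct is an arbitrary band (assumption) inside which pay counts as "fair".
    """
    gap = percent_more(actual, predicted)
    if gap > tolerance_pct:
        return "overpaid"
    if gap < -tolerance_pct:
        return "underpaid"
    return "fair"


def is_mid_tier(salary, low=8_000_000, high=30_000_000):
    """Rough band where the model should be most reliable.

    `low` follows the $0-8 million limitation above; `high` is a made-up cutoff
    meant to exclude superstar contracts.
    """
    return low < salary <= high


# Curry ($37M) vs Lillard ($27M): about 37% more.
print(round(percent_more(37_000_000, 27_000_000), 1))  # 37.0

# Hypothetical player: paid $20M, model predicts $15M.
print(is_mid_tier(20_000_000), pay_verdict(20_000_000, 15_000_000))  # True overpaid
```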

As you can see, there are a huge number of factors outside of game stats that influence how much each individual NBA player is paid. The way I see it, my model isn’t wrong; NBA GMs are just paying players the wrong amount of money based on their actual statistics. Thanks for reading! Don’t forget to try out the calculator.


In this post we’re going to try to determine whether fouls strongly affect a team’s wins and losses. In other words, do teams that foul more win MORE games, or do more wins go to teams that foul less?

To start, I had to choose a dataset that fit the purpose of my investigation. I chose to start with the 2014-15 NBA season, since that was the season the Golden State Warriors rose to prominence and won their first title since 1975. Our data ends with the last completed NBA season, 2017-18.

Why did I choose this season, you ask? I wanted to draw my correlations from the “modern” era of basketball, where 3-pointers reign and the game has moved drastically toward pick-and-rolls, perimeter drives and interchangeable positions, and away from post-ups, the inside game and 3-point specialists. In the future, however, I would love to analyze how the value of fouls has changed over time, so stay tuned.

**The Process (Trust it):**

Initially, I had a dataset containing team stats from every game played over the 2014-15 through 2017-18 seasons. What I wanted in the end was a comparison of each team’s total fouls over those seasons with its total wins.

STEPS:

- Group the data by team
- Take the total team fouls per game and sum them to find the total number of team fouls over the chosen time period
- W-L was in the dataset as a string containing “W” or “L”, so count the “W” entries to get each team’s total wins
- Take our total wins and total fouls and plot them against each other
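Here’s a minimal, standard-library Python sketch of those steps (minus the plotting). It assumes one row per team per game with hypothetical field names "team", "fouls" and "wl"; the real dataset’s column names may differ, and the sample rows are made up.

```python
from collections import defaultdict


def team_totals(games):
    """Sum fouls and count wins per team.

    `games` is an iterable of dicts with (hypothetical) keys:
    "team", "fouls" (team fouls in that game) and "wl" ("W" or "L").
    Returns {team: (total_fouls, total_wins)}.
    """
    fouls = defaultdict(int)
    wins = defaultdict(int)
    for game in games:
        team = game["team"]
        fouls[team] += game["fouls"]
        # Count the "W" strings; a loss adds 0 but still registers the team.
        wins[team] += 1 if game["wl"] == "W" else 0
    return {team: (fouls[team], wins[team]) for team in fouls}


games = [
    {"team": "GSW", "fouls": 20, "wl": "W"},
    {"team": "GSW", "fouls": 22, "wl": "W"},
    {"team": "SAS", "fouls": 17, "wl": "L"},
    {"team": "SAS", "fouls": 18, "wl": "W"},
]
totals = team_totals(games)
print(totals)  # {'GSW': (42, 2), 'SAS': (35, 1)}
```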

So here you go, let’s take a look at what we got!

As you can see, the points are fairly scattered; however, there is a slight downward trend, which is what I expected to see. I figured teams with more wins would tend to foul less, but that the correlation would not be strong. Here’s why:

- Starting base level: in most games, yes, it is better to foul as little as possible. Fouling not only gives your opponent free points, but also creates situations where your team can’t defend as aggressively later in the game. That makes it harder to play defense, which costs you games.

*Why isn’t it steeper?*

- We have to remember that we are correlating WINS with fouls, rather than DEFENSE with fouls. Fouls mostly reflect defense, but wins depend on offense too, so we have essentially left out half of the input that goes into a win or a loss.

- Look at the outliers: the teams that win more AND foul more have terrific offenses. Two teams that stand out on the graph are the Warriors and Rockets; both lie just above the trend line for fouling, yet they have the 1st and 3rd most wins over our timespan.

- The teams that win more and foul LESS, for example the Spurs and Cavaliers, are more defensive-minded teams. They lie below the trend line for fouling and have the 2nd and 5th most wins over our timespan.

- If these teams all had the same offensive efficiency, I would be willing to bet they would sit much closer to that trend line.
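To put numbers on “slight downward trend” and “above/below the trend line”, here’s a hedged Python sketch. It fits a least-squares trend line with total wins on the x-axis and total fouls on the y-axis (my reading of the graph description; swap them if your plot is the other way around), reports Pearson’s r, and gives each team’s residual. A positive residual means the team fouls more than the line predicts for its win total. The toy numbers are made up.

```python
import math


def trend_line(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson_r(xs, ys):
    """Pearson correlation: -1 (perfect downward line) to +1 (perfect upward line)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(teams):
    """teams: {name: (total_wins, total_fouls)} -> {name: fouls above (+) or below (-) the line}."""
    wins = [w for w, _ in teams.values()]
    fouls = [f for _, f in teams.values()]
    slope, intercept = trend_line(wins, fouls)
    return {name: f - (slope * w + intercept) for name, (w, f) in teams.items()}


# Perfectly linear toy data: slope -2, intercept 10, r = -1.
print(trend_line([1, 2, 3, 4], [8, 6, 4, 2]))  # (-2.0, 10.0)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0

# Toy teams: B is positive (above the line); A and C are negative (below it).
print(residuals({"A": (0, 0), "B": (1, 2), "C": (2, 0)}))
```

A value of r close to 0 would back up the “slight” trend; values near -1 would mean fouls track wins tightly.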

Conclusion: It is hard to tell whether fouling more or less will definitively earn a team more wins or losses, simply because we are omitting the offensive side of the ball. My assumption is that defensive efficiency increases when teams foul less, resulting in more wins; however, there are still teams that get ‘carried’ by their offense, or dragged down by it, in the W-L column. The next step is to determine whether defensive efficiency is strongly affected by total fouls, and then how much wins and losses are affected by defensive efficiency. That, and more, is coming in the next post!
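For that next step, defensive efficiency is usually expressed as points allowed per 100 possessions. The sketch below uses one common possession estimate, FGA + 0.44 × FTA - ORB + TOV, computed from the opponent’s box score. The 0.44 free-throw factor is a convention (other sources use slightly different values), the function names are my own, and the sample numbers are made up. The resulting ratings could then be plotted against total fouls the same way wins were.

```python
def estimated_possessions(fga, fta, orb, tov):
    """Common box-score estimate of possessions (0.44 weights free-throw trips)."""
    return fga + 0.44 * fta - orb + tov


def defensive_rating(opp_points, opp_fga, opp_fta, opp_orb, opp_tov):
    """Points allowed per 100 opponent possessions (lower means better defense)."""
    possessions = estimated_possessions(opp_fga, opp_fta, opp_orb, opp_tov)
    return 100 * opp_points / possessions


# Made-up opponent totals for one game: 105 points on 85 FGA, 20 FTA, 10 ORB, 14 TOV.
print(round(estimated_possessions(85, 20, 10, 14), 1))  # 97.8
print(round(defensive_rating(105, 85, 20, 10, 14), 1))  # 107.4
```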

If you got that Fox Sports West reference, then ultimate respect to you. If you know the name of the sportscaster who started his show with that line, you are a true Southern California sports fan. But anyway, you’re here for data, so let’s get started on the good stuff.

I decided to make this my inaugural post for a couple of reasons:

A) I am interested in the findings.

B) I had a hypothesis on how the data would turn out and wanted my first study to look like I actually found something.

**The Low Down:**

Basically, what we are looking at here is how teams’ field goal percentage progresses from month to month as the season goes on. We’re trying to see at what point in the season teams shoot the best.

**Hypothesis:**

My initial hypothesis was that as the NBA season went on, teams would shoot better, so the graph would be a simple upward-trending line. This was based on the reasoning that players and teams are rusty, still getting used to coaching systems, and simply not in game shape at the beginning of the year. As the season progresses, however, they start to play at a higher level, leading to a steady increase in shooting percentages.

However, after a little more thinking, I revised my hypothesis: FG% would look more like a bell curve, with percentages highest in the middle of the season and lower at the beginning and end.

Here’s my reasoning:

*Start of the season:* Players are still getting their game legs and adjusting to coaching and systems, causing them to shoot a lower percentage (same as the initial hypothesis).

*Middle of the season:* Players are in game shape, used to their systems and coaches, and the games don’t matter as much at this point; all of these are conducive to a higher percentage of shots going in. I call this portion of the season the ‘cruising’ portion.

*End of the season:* At this point of the season, the games matter much more as teams fight to get into the playoffs. Defensive intensity is ramped up, which makes it harder for the average team or player to shoot a high percentage.

**Steps:**

- Obtain a dataset with season-long data; specifically, we are looking at team field goal percentages and game dates
- Group the games by month (I tried analyzing game by game, but there was far too much game-to-game fluctuation for any real analysis)
- Once grouped by month, find the average field goal percentage for each month
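A minimal Python sketch of the grouping step, assuming each row is a (game date, team FG%) pair; that input shape is my assumption, not the dataset’s actual layout, and the sample rows are made up. Grouping on (year, month) keeps October 2017 ahead of January 2018 when sorting.

```python
from collections import defaultdict
from datetime import date


def monthly_fg_pct(games):
    """Average FG% per calendar month.

    `games` is an iterable of (game_date, fg_pct) pairs, one per team per game.
    Returns {(year, month): mean FG%} in chronological order.
    """
    by_month = defaultdict(list)
    for game_date, fg_pct in games:
        by_month[(game_date.year, game_date.month)].append(fg_pct)
    return {month: sum(vals) / len(vals) for month, vals in sorted(by_month.items())}


games = [
    (date(2017, 10, 20), 0.44),
    (date(2017, 10, 25), 0.46),
    (date(2017, 11, 2), 0.47),
    (date(2018, 2, 1), 0.48),
]
for month, avg in monthly_fg_pct(games).items():
    print(month, round(avg, 3))
```

This averages per-game percentages, so every game counts equally. Pooling made and attempted shots for each month instead would give high-volume games more weight; either choice is defensible as long as it is applied consistently.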

**Data Visualization:**

This graph shows the field goal percentage progression from the 2017-18 NBA season. As we can see, the data supports my hypothesis! FG% is lower to start the season, higher in the middle, then drops off again toward the end.

**Notes:**

- There is a much larger jump from October to November than the drop from February to April.
- This may suggest that players getting used to their systems and getting back into shape has a bigger impact on FG% than the ramped-up defense at the end of the year.

- October is a partial month because the season starts partway through it, so its small sample size may be a factor in the large jump.
- In the 2017-18 season, players shot the best in February and the worst in October.
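The notes above boil down to a few comparisons on the monthly averages: the best and worst months, whether the peak falls in the middle of the season (a crude “bell curve” check), and whether the early jump beats the late drop. Here’s a Python sketch; the numbers are made up for illustration and are not the real 2017-18 averages.

```python
def summarize(monthly):
    """monthly: {month label: average FG%}, in season order (dicts keep insertion order)."""
    labels = list(monthly)
    best = max(labels, key=monthly.get)
    worst = min(labels, key=monthly.get)
    # A peak in the first or last month would contradict the bell-curve idea.
    peak_in_middle = best not in (labels[0], labels[-1])
    return best, worst, peak_in_middle


# Made-up averages for illustration only.
monthly = {"Oct": 0.445, "Nov": 0.460, "Dec": 0.461, "Jan": 0.462,
           "Feb": 0.465, "Mar": 0.461, "Apr": 0.458}

best, worst, peak_in_middle = summarize(monthly)
early_jump = monthly["Nov"] - monthly["Oct"]
late_drop = monthly["Feb"] - monthly["Apr"]
print(best, worst, peak_in_middle)  # Feb Oct True
print(early_jump > late_drop)       # True
```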

**Conclusion:**

For now, the data appears to support my hypothesis that FG% follows a bell curve over the NBA season. Of course, I’ve only pulled data from one season; to get a better picture, we’d have to repeat this study over multiple seasons.

Thanks for reading! If you have any data-related questions you would like answered, let me know and I’ll see what I can do!


It’s all coming right here.
