UPDATE: I’ve posted a link to Excel files with spreadsheets I used in this analysis at the bottom of this page. There’s a bonus file, which has the Pomeroy rankings of each ACC team. If you’re a mathematical type of person and think that what I’ve done here is unsophisticated (at best), please see if you can do something better. Post it in the comments and I’ll give it some love in a future blog post.
I’ll warn you right now: This might get a little math-y.
Today, I wrote a column in which I took a statistical look at two commonly held and related Duke basketball precepts:
1. Duke struggles down the stretch
2. These struggles are caused in part by Coach K’s habit of playing his top players too many minutes
I thought I’d use this space to fill in some of the math-heavy details that I glossed over in the paper-and-ink version of the column, and to show some of the graphs I made reference to in the column without printing.
I started my analysis with a scatterplot. Each ACC game since 2004 was assigned a point on the scatterplot, where the x value was the ACC game number (i.e., 1 is the conference opener and 19 is the ACC championship game) and the y value was efficiency margin:
Efficiency margin = (Points per possession) – (points allowed per possession)
(For a full explanation of tempo-free, per-possession statistics, you can’t do better than this website, which is also where I got all of my data for the scatterplots. The CliffsNotes version is as follows: Tempo-free stats allow teams to be compared without regard for style of play; if a team scores more points per possession than its opponents, it will win. The larger the per-possession difference, the larger the margin of victory relative to the pace at which the game is played. Efficiency margin is a worthwhile stat for comparing teams because, all else being equal, a team will tend to increase its margin of victory as long as it is possible to do so. Thus, a “very good” team will beat a bad team by more points than a mere “good” team will.)
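As a rough illustration of the efficiency margin formula, here is a minimal Python sketch. The function name and the box-score numbers are made up for the example, and it assumes both teams are charged with the same number of possessions in a game.

```python
def efficiency_margin(points_for, points_against, possessions):
    """Points scored per possession minus points allowed per possession.

    Assumption of this sketch: both teams have the same possession count.
    """
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    offensive_efficiency = points_for / possessions
    defensive_efficiency = points_against / possessions
    return offensive_efficiency - defensive_efficiency


# Hypothetical game: an 80-70 win played over 68 possessions.
margin = efficiency_margin(80, 70, 68)
print(round(margin, 3))  # 0.147
```

The same 10-point win in a slower, 60-possession game would give a larger margin, which is the point of dividing by possessions.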
I ran linear regression models on all points collectively and on each season individually. Predictably, most seasons failed to produce robust trends due to the limited sample size, though all seasons from 2004-2009 demonstrated a downward trend. However, when I ran the linear regression model on all points simultaneously, there was a clearer downward trend. The p-value for negative slope was 0.0002, highly statistically significant, and the R-squared value was 0.12.
The R-squared value (along with the highly significant p-value) intrigued me. It’s not a very high number, but it essentially indicates that 12% of the variation in Duke’s efficiency margin is explained by when the game is played. If the timing of a game had no effect, one would expect this value to be somewhere around zero.
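For anyone who wants to reproduce this kind of fit without Excel, here is a minimal sketch of a straight-line least-squares fit that returns the slope and the R-squared value. The function name and the six data points are hypothetical and are not taken from my spreadsheets; it does not compute the p-value, which needs a t-distribution from a statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor fit, R-squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical points: x is the ACC game number, y is efficiency margin.
game_numbers = [1, 2, 3, 4, 5, 6]
margins = [0.20, 0.15, 0.18, 0.05, 0.08, -0.02]
slope, intercept, r_squared = fit_line(game_numbers, margins)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.3f}")
```

A negative slope means the margin shrinks as the conference season goes on; R-squared says how much of the game-to-game variation that line accounts for.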
To make sure that this result was not just a function of increasingly competitive games as the season wore on (perhaps as the post-season approaches, teams tried harder?), I extended my analysis to cover two additional teams – UNC (for obvious reasons) and Michigan State (because the Spartans have a reputation for getting better as the season goes on). As you can see, both teams had essentially flat slopes for each season as well as for the six seasons collectively.
So I was reasonably convinced that Duke’s performance does decline over the course of the season. Showing whether this decline was related to the number of minutes played by Duke’s star players proved more difficult.
I made another scatterplot where each season got one point: The y-coordinate was percentage of minutes played by Duke’s starters, and the x-coordinate was the slope of Duke’s decline. This graph was somewhat intractable to linear regression; however, the Blue Devils’ best season (the one in which their decline over the course of ACC play was smallest) coincided with the season in which Duke’s starters played the most minutes. The R-squared value here was 0.075, which is not that impressive.
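The x-coordinates for this second scatterplot are just one fitted slope per season. A minimal sketch of that step is below; the function names and the rows of data are hypothetical, and the real numbers are in the spreadsheets.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def season_slopes(games):
    """Map each season to the slope of efficiency margin vs. game number.

    `games` is an iterable of (season, game_number, efficiency_margin).
    """
    by_season = defaultdict(list)
    for season, game_number, margin in games:
        by_season[season].append((game_number, margin))
    return {
        season: ols_slope([g for g, _ in rows], [m for _, m in rows])
        for season, rows in sorted(by_season.items())
    }


# Hypothetical rows, NOT the real data from the spreadsheets.
games = [
    (2004, 1, 0.30), (2004, 2, 0.20), (2004, 3, 0.10),
    (2005, 1, 0.05), (2005, 2, 0.10), (2005, 3, 0.15),
]
for season, slope in season_slopes(games).items():
    print(season, round(slope, 3))
```

Each season's slope can then be paired with that season's share of minutes played by the starters to give one point per season.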
Since the precept dealt more with playing guys too many minutes and less with playing too few guys, I then looked at seasons in which Duke used several players heavily. Since 2004, there have been three seasons in which three players each played more than 32 minutes per game and three seasons where this has not been the case. In the three seasons where three players played more than 32 minutes per game, the slope of the decline was less steep. In the three seasons where one player averaged over 35 minutes per game, the slope of the decline was less than in the three other seasons. Of course, the sample sizes here are exceedingly small, and I don’t think it’s really worth drawing any conclusions from these data.
So basically, I’ve proven to myself (and hopefully you) that the late-season decline is real, but I can’t demonstrate any causes.
Your guess is as good as mine.
Click here for an Excel file of the data I used in this analysis
Click here for an Excel file of Duke’s opponents’ Pomeroy rankings (EDITED)





Did you try a regression model other than fitting a straight line? A scatterplot smoother such as ‘lowess’ could provide further insights. It would be interesting to see the trend.
Alternatively, you can send me the data and I can have a look.
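For readers who have not met a scatterplot smoother, here is a minimal sketch of the idea: fit a small weighted straight line around each point and keep only the fitted value there. This is a simplified local linear fit with tricube weights, not the full lowess procedure (it skips the robustness iterations), and the function name, the `frac` parameter, and the data are assumptions made for illustration.

```python
def local_linear_smooth(xs, ys, frac=0.5):
    """Smooth ys against xs with tricube-weighted local linear fits.

    `frac` is the share of points used in each local fit.
    Returns one smoothed value per input point.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    k = max(2, min(n, int(round(frac * n))))
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        weights = []
        for x in xs:
            d = abs(x - x0)
            if h == 0:
                weights.append(1.0 if d == 0 else 0.0)
            else:
                u = d / h
                weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)  # always > 0: x0 itself has weight 1
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        if sxx < 1e-12:
            # All the weight sits on one x value: fall back to the mean.
            smoothed.append(my)
        else:
            sxy = sum(w * (x - mx) * (y - my)
                      for w, x, y in zip(weights, xs, ys))
            smoothed.append(my + (sxy / sxx) * (x0 - mx))
    return smoothed


# Hypothetical efficiency margins for ten consecutive ACC games.
game_numbers = list(range(1, 11))
margins = [0.22, 0.18, 0.25, 0.15, 0.10, 0.12, 0.02, 0.06, -0.03, 0.01]
for g, s in zip(game_numbers, local_linear_smooth(game_numbers, margins)):
    print(g, round(s, 3))
```

Unlike a single straight line, the smoothed curve can bend, so it would show whether a decline is steady or concentrated in one stretch of the season.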
“To make sure that this result was not just a function of increasingly competitive games as the season wore on (perhaps as the post-season approaches, teams tried harder?)”
I think you could look into this theory more rather than just looking at two other teams.
It would be interesting to measure opponents’ RPI ranking vs. time of year. The way their schedule is set up each year, I am sure you would see an upward slope.
Also, you may want to run the data without the 2 UNC games for each year. UNC was the toughest competition over these years and since they average out to later in the season they would seemingly skew the data to a downward slope.
Interesting article, thanks
I’ve updated this blog post with links to the spreadsheets I used for this analysis. If you have any ideas for a more robust statistical analysis or you just want to check my work, have at it.
This is great. Fantastic job with the article and following up with this.
Alex,
Great article! I love tempo-free statistics and what they can tell us about the game. I tried to address some of the concerns about home/away effects, opponents getting tougher as the season progresses, and year-by-year effects, in a blog post of my own:
http://www.immaculateinning.com/2010/02/duke-fade-response.html
Looking forward to what you think of this!
Have you tried your analysis ignoring ACC tournament games? It seems that competition in the tournament is more intense than in the regular season, which would more likely result in closer games, so I would be curious what R squared is for only the regular season. Also, are you reporting R squared or adjusted R squared? (adj. is normally preferred).
I created plots for each ACC season of Duke since 2004 using the loess smoother instead of the linear line fit. Could you let me know how I can send them to you via email (in case you are interested)? Thanks.
S,
I’m reporting r-squared, because I honestly didn’t know what adjusted r-squared was until you mentioned it. (I’m a med student, not a statistician.) As far as removing ACC Tournament games from the analysis, doing so actually increases the r-squared to 0.16.
Time to re-do this analysis!
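For reference, adjusted r-squared shrinks plain r-squared to account for the number of predictors and the sample size. Here is a minimal sketch using the standard formula; the function name is made up, and the game count of 100 is a placeholder rather than the actual number of games in the analysis.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors=1):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_obs - 1) / dof


# R-squared of 0.12 is from the post; n_obs=100 is a hypothetical count.
print(round(adjusted_r_squared(0.12, 100), 3))  # 0.111
```

With a single predictor and a sample this large, the adjustment is small; it matters more when there are few games or many predictors.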