Regressing Shooting Percentages To Find True Talent
I think it's only fair to say that there will be some math geekery in here, but I invite all readers in: you can skip past the math and find the results after the "So what does it all mean?" header.
One of the things that interests me in sports is team shooting percentage. On an individual level, we often see a player's shooting percentage fluctuate from year to year, and the usual narrative is that he's distracted by a looming contract situation, he hasn't been 'feeling it' lately, he didn't focus on it enough, or the coach hasn't been getting Player X the puck/ball/whatever in good positions: whatever is easiest for journalists to churn out to meet their deadlines. The same is true at the team level.
For instance, in the 2009-10 NHL season, the Colorado Avalanche earned an unexpected postseason berth in what was expected to be another run-of-the-mill rebuilding season. In 2008-09, Colorado had finished 28th in the league with 69 points. The next year, however, they got a season of good netminding and, even though they were heavily outshot, finished 13 games over .500 with 95 points. Those who knew what to look for, particularly unsustainable shooting percentages for or against by a team that wasn't controlling the play, saw a major step back coming for Colorado in 2010-11.
How did Colorado finish this year, after that season of unsustainable save percentage? They were 14 games under .500 and finished second-to-last with 68 points.
Now, why do I bring up a hockey team in a college lacrosse FanPost? Well, because both sports share a common statistic: shooting percentage.
No pro sports season is long enough for us to take single-season numbers at face value, because numerous things creep into the data: luck, bias, good fortune, random variation; pick your favorite phrase and that is what I'm talking about. That makes raw, single-season shooting percentages unreliable. This is a short-but-sweet post on what I mean when I say "luck."
What I did was follow this post by Tom Tango, a noted baseball sabermetrician and current consultant to at least two Major League Baseball teams, to estimate how much true talent there is in a given stat. His example was individual free-throw shooting in the NBA; mine is team shooting percentage and opponent shooting percentage*.
*For this purpose, shooting percentages are Goals divided by Shots Attempted, not Shots On Goal.
In Tom's article, he found that you should still regress even someone like Steve Nash after more than 3,000 career FT attempts. Granted, it's a razor-thin difference, dropping him from (at the time) a 90.4% FT shooter to 90.3%.
Over a college lacrosse season, where teams play between 12 and 18 games, we can be relatively certain that the majority of what we see in shooting percentages, both for and against, reflects each team's true talent. Usually, though, about 25% of it is one of the luck, good fortune, random variation monsters I mentioned before.
The 50% regression point for my data set, the 2009 through 2011 seasons (still small, at 178 team seasons), is 234 shot attempts on offense. What this means is that once your favorite team has taken around 235 shots in a season, what you've seen is likely 50% skill and 50% random variation/luck/good fortune. The average team over the last three years fired 34.2 shots toward the cage per game, so 235 shots is roughly seven games' worth.
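Tango's linked post isn't reproduced here, so what follows is a minimal sketch of the usual variance-decomposition way to find a 50% point, assuming it matches what was done for this data set. The spread in observed team shooting percentages is split into a binomial "luck" part, p(1-p)/n, and a true-talent part; the 50% point k is the number of shots at which the two are equal, k = p(1-p)/Var(true). The function name and the two-team input are hypothetical, and the real 178 team seasons would be needed to reproduce the 234 figure.

```python
from statistics import mean, pvariance


def fifty_percent_point(goals: list[int], attempts: list[int]) -> float:
    """Estimate k, the shot attempts at which observed shooting percentage
    is half true talent and half random variation.

    Sketch of the variance-decomposition approach (an assumption, not the
    author's exact code):
        Var(observed) = Var(true talent) + Var(random)
    """
    # League-wide shooting percentage: goals divided by shot attempts.
    p = sum(goals) / sum(attempts)
    # Spread of team shooting percentages actually observed.
    var_obs = pvariance([g / n for g, n in zip(goals, attempts)])
    # Binomial noise expected at the average team's number of attempts.
    var_rand = p * (1 - p) / mean(attempts)
    var_true = var_obs - var_rand
    if var_true <= 0:
        raise ValueError("no detectable true-talent spread in this sample")
    # The 50% point is where the random variance p(1-p)/k equals Var(true).
    return p * (1 - p) / var_true


# Hypothetical toy input: two teams with 500 attempts each.
k = fifty_percent_point(goals=[125, 175], attempts=[500, 500])
print(round(k, 1))
```

On this toy input the league rate is 30%, the observed variance is 0.0025, and the estimated 50% point comes out at about 101 shots.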
What you then do is take the team's shooting percentage and the league average and blend the two. Say your team was shooting a blistering 40% through its first 235 shots and the average D1 team shoots 28%. Since 235 shots sits right at the 50% point, each number gets half the weight, and you'd do this:
$$\big[(0.40 \times 0.5) + (0.28 \times 0.5)\big] \times 235 = 0.34 \times 235 \approx 80$$
On the left is the team's shooting percentage, on the right is the D1 average, and the final number is the number of shots taken. This regresses your favorite team toward a "true talent" shooting percentage of 34%, or about 80 goals on those 235 shots instead of the 94 that a 40% clip produced: a difference of around 14 goals.
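In general form, assuming the standard Tango-style regression that a 50% point implies, the regressed rate is (goals + k × league average) / (shots + k), where k is the 50% point; when shots equal k, that is exactly the half-and-half blend above. Here is a minimal sketch with a hypothetical function name, reproducing the worked example with k = 234 and the illustrative 28% average:

```python
def regressed_shooting(goals: int, shots: int, league_avg: float, k: float = 234) -> float:
    """Regress a team's shooting percentage toward the league average.

    Equivalent to weighting the observed rate by shots / (shots + k) and the
    league average by k / (shots + k).
    """
    return (goals + k * league_avg) / (shots + k)


shots = 235
goals = round(0.40 * shots)              # 94 goals at a 40% clip
r_sh = regressed_shooting(goals, shots, league_avg=0.28)
expected_goals = r_sh * shots            # "true talent" goals on those shots
print(f"{r_sh:.3f}", round(expected_goals), round(goals - expected_goals))
```

This prints a regressed rate of 0.340, about 80 expected goals, and a 14-goal gap. Because k is 234 rather than 235, the weights are a hair off 50/50, which doesn't change the rounded results.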
The amount you regress depends on the number of shots the team takes: the more you fire toward the net, the less you'll be regressed toward the mean. In my data set, the Virginia Cavaliers fired the most shots at the net, 790 in 2009, with North Carolina second at 789. Even with almost 800 shots' worth of a sample, we still need to regress their offensive shooting percentage 22% toward the national average. Essentially, about four-fifths of what we saw was their talent and one-fifth was random variation/luck/good fortune.
The fewest shots taken in my data set was by Presbyterian this past year with only 269 shots fired at the net. They get regressed 46% towards the national mean.
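Assuming the same k / (shots + k) form, the share of a team's shooting percentage to regress can be computed straight from shot volume. This hypothetical helper gives about 22.9% for 2009 Virginia and 46.5% for 2011 Presbyterian, which agree with the 22% and 46% above once truncated, and exactly 50% at 234 shots:

```python
def regression_fraction(shots: int, k: float = 234) -> float:
    """Share of observed shooting percentage to regress toward the mean."""
    return k / (shots + k)


for team, shots in [("Virginia 2009", 790), ("Presbyterian 2011", 269)]:
    print(team, f"{regression_fraction(shots):.1%}")
```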
So what does it all mean?
Well, with the proper regression in place, we can now calculate regressed shooting percentages and regressed goals scored and allowed (and a differential). Let's take a look at some results:
SHA is shot attempts, and G and SH% are actual goals and actual offensive shooting percentage. rSH% is the team's regressed shooting percentage, and reg GS is the number of goals we'd "expect" them to score at that rate. GSD is actual goals scored minus regressed goals scored. Let's take a look at the "Over Performers."
| Team | CONF | Yr | SHA | G | SH% | rSH% | reg GS | GSD |
|---|---|---|---|---|---|---|---|---|
| SB | AE | 2010 | 580 | 225 | 0.388 | 0.358 | 208 | 17 |
| RMU | IND | 2010 | 647 | 230 | 0.355 | 0.336 | 218 | 12 |
| SB | AE | 2009 | 464 | 168 | 0.362 | 0.336 | 156 | 12 |
| DEN | ECAC | 2010 | 526 | 188 | 0.357 | 0.335 | 176 | 12 |
| RMU | CAA | 2009 | 553 | 195 | 0.353 | 0.332 | 184 | 11 |
| DUKE | ACC | 2010 | 780 | 269 | 0.345 | 0.331 | 258 | 11 |
| HOF | CAA | 2010 | 514 | 181 | 0.352 | 0.330 | 170 | 11 |
| RMU | NEC | 2011 | 604 | 207 | 0.343 | 0.326 | 197 | 10 |
| SB | AE | 2011 | 500 | 172 | 0.344 | 0.324 | 162 | 10 |
| SIENA | MAAC | 2011 | 608 | 206 | 0.339 | 0.323 | 197 | 9 |
These are the teams that outscored what they "should've" over the last three years. Stony Brook in particular populates the list: all three of its last seasons show up in the top ten. I'd guess that their shooting percentage consistently outperforming their regressed shooting percentage is due to their conference and schedule, but I only have scheduling data for the 2011 season so far, so I can't say for certain beyond the fact that my assumption holds true for the 2011 campaign.
And the top ten "Under Performers" offensively:
| Team | CONF | Yr | SHA | G | SH% | rSH% | reg GS | GSD |
|---|---|---|---|---|---|---|---|---|
| AF | ECAC | 2010 | 453 | 91 | 0.201 | 0.228 | 103 | -12 |
| WAG | MAAC | 2009 | 412 | 82 | 0.199 | 0.228 | 94 | -12 |
| DET | IND | 2009 | 337 | 66 | 0.196 | 0.230 | 78 | -12 |
| AF | GW | 2009 | 588 | 126 | 0.214 | 0.233 | 137 | -11 |
| WAG | MAAC | 2010 | 423 | 89 | 0.210 | 0.235 | 99 | -10 |
| HOLY C | PAT | 2011 | 444 | 94 | 0.212 | 0.235 | 104 | -10 |
| VER | AE | 2009 | 481 | 104 | 0.216 | 0.237 | 114 | -10 |
| ST. JOE'S | CAA | 2011 | 274 | 55 | 0.201 | 0.237 | 65 | -10 |
| SJ | BE | 2010 | 507 | 111 | 0.219 | 0.238 | 121 | -10 |
| RUT | ECAC | 2009 | 532 | 117 | 0.220 | 0.238 | 127 | -10 |
All of these teams scored fewer goals than their regressed shooting percentages indicated they should have.
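Both offensive tables are sorted on GSD, actual goals minus regressed goals. Below is a minimal sketch of that ranking; the row type and function names are hypothetical, and because the league average behind the tables isn't stated, the 28% from the earlier worked example stands in for it, so results can land a goal or so off the published figures (2010 Stony Brook comes out at 18 instead of 17, 2010 Duke at 12 instead of 11).

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    team: str
    year: int
    shots: int  # SHA: shot attempts
    goals: int  # G: actual goals


def goals_vs_regressed(row: TeamSeason, league_avg: float, k: float = 234) -> float:
    """GSD: actual goals minus goals expected at the regressed SH%."""
    r_sh = (row.goals + k * league_avg) / (row.shots + k)
    return row.goals - r_sh * row.shots


rows = [
    TeamSeason("SB", 2010, 580, 225),
    TeamSeason("AF", 2010, 453, 91),
    TeamSeason("DUKE", 2010, 780, 269),
]
# Over-performers first (positive GSD), under-performers last.
# 0.28 is a stand-in league average, not the one behind the tables.
ranked = sorted(rows, key=lambda r: goals_vs_regressed(r, 0.28), reverse=True)
for r in ranked:
    print(r.team, r.year, round(goals_vs_regressed(r, 0.28)))
```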
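Here are the top ten over and under performers in opponent shooting percentage. oSHA and oG are opponent shot attempts and opponent goals, oppSH% and roSH% are actual and regressed opponent shooting percentage, and reg GA is the number of goals we'd "expect" the team to allow. GAD is actual goals allowed minus regressed goals allowed, so a positive GAD means opponents scored more than expected: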
| Team | CONF | Yr | oSHA | oG | oppSH% | roSH% | reg GA | GAD |
|---|---|---|---|---|---|---|---|---|
| BELL | ECAC | 2011 | 363 | 179 | 0.493 | 0.415 | 151 | 28 |
| WAG | NEC | 2011 | 504 | 200 | 0.397 | 0.363 | 183 | 17 |
| MERCER | IND | 2011 | 614 | 224 | 0.365 | 0.344 | 211 | 13 |
| WAG | MAAC | 2009 | 610 | 219 | 0.359 | 0.339 | 207 | 12 |
| WAG | MAAC | 2010 | 605 | 213 | 0.352 | 0.334 | 202 | 11 |
| PENN ST. | CAA | 2010 | 477 | 163 | 0.342 | 0.323 | 154 | 9 |
| DET | MAAC | 2010 | 530 | 180 | 0.340 | 0.323 | 171 | 9 |
| PRES | IND | 2010 | 401 | 137 | 0.342 | 0.321 | 129 | 8 |
| ST. JOE'S | CAA | 2011 | 449 | 152 | 0.339 | 0.320 | 144 | 8 |
| UMBC | AE | 2010 | 400 | 136 | 0.340 | 0.320 | 128 | 8 |
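And the ten teams with the most negative GAD, whose opponents scored fewer goals than expected:
| Team | CONF | Yr | oSHA | oG | oppSH% | roSH% | reg GA | GAD |
|---|---|---|---|---|---|---|---|---|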
| ND | GW | 2009 | 505 | 99 | 0.196 | 0.221 | 112 | -13 |
| UMASS | ECAC | 2009 | 587 | 119 | 0.203 | 0.224 | 131 | -12 |
| PENN ST. | ECAC | 2009 | 555 | 113 | 0.204 | 0.225 | 125 | -12 |
| FAIR | ECAC | 2011 | 569 | 129 | 0.227 | 0.242 | 138 | -9 |
| ND | BE | 2011 | 411 | 92 | 0.224 | 0.244 | 100 | -8 |
| BROWN | IVY | 2009 | 575 | 133 | 0.231 | 0.245 | 141 | -8 |
| PRIN | IVY | 2011 | 407 | 92 | 0.226 | 0.245 | 100 | -8 |
| PRIN | IVY | 2009 | 525 | 121 | 0.230 | 0.245 | 129 | -8 |
| MSM | MAAC | 2009 | 508 | 117 | 0.230 | 0.246 | 125 | -8 |
| HOF | CAA | 2011 | 456 | 105 | 0.230 | 0.247 | 112 | -7 |
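Now, below are the top and bottom ten in regressed goal differential. actGD is actual goal differential (G minus oG), and regGD is regressed goal differential (reg GS minus reg GA).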
| Team | CONF | Yr | actGD | regGD |
|---|---|---|---|---|
| SYR | IND | 2009 | 100 | 88 |
| UVA | ACC | 2010 | 94 | 80 |
| DUKE | ACC | 2010 | 89 | 79 |
| UVA | ACC | 2009 | 80 | 77 |
| CORNELL | IVY | 2011 | 76 | 71 |
| SYR | BE | 2010 | 76 | 68 |
| ND | GW | 2009 | 77 | 67 |
| UNC | ACC | 2009 | 62 | 60 |
| CORNELL | IVY | 2009 | 64 | 58 |
| SYR | BE | 2011 | 65 | 57 |
And the bottom ten:
| Team | CONF | Yr | actGD | regGD |
|---|---|---|---|---|
| MERCER | IND | 2011 | -139 | -118 |
| WAG | MAAC | 2009 | -137 | -113 |
| WAG | MAAC | 2010 | -124 | -103 |
| WAG | NEC | 2011 | -119 | -96 |
| PRES | IND | 2011 | -91 | -90 |
| DET | IND | 2009 | -106 | -87 |
| PROV | BE | 2010 | -92 | -84 |
| ST. JOE'S | CAA | 2011 | -97 | -79 |
| ST. JOE'S | MAAC | 2010 | -77 | -64 |
| VMI | MAAC | 2010 | -70 | -61 |
The results themselves don't change much, but some teams do move up (like 2009 Detroit) and some teams do move down (like 2011 Syracuse).
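The goal differential columns appear to come straight from the earlier tables: actGD is G minus oG, and regGD is reg GS minus reg GA. 2009 Wagner, whose offensive and opponent lines both appear above, bears this out. The helper below is a hypothetical illustration of that arithmetic:

```python
def goal_differentials(g: int, reg_gs: float, opp_g: int, reg_ga: float) -> tuple[float, float]:
    """Return (actual GD, regressed GD) for one team season."""
    return g - opp_g, reg_gs - reg_ga


# 2009 Wagner: G and reg GS from the offensive under-performer table,
# oG and reg GA from the opponent shooting table.
act_gd, reg_gd = goal_differentials(g=82, reg_gs=94, opp_g=219, reg_ga=207)
print(act_gd, reg_gd)  # -137 -113, matching the goal differential table
```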
Next, we can use the regressed goals for and against to build regressed offensive and defensive efficiencies. That will be the subject of my next FanPost.
Caveats
One thing I haven't done is regress a team's extra-man prowess. From just a cursory look, though, that doesn't explain why Stony Brook is outperforming its regressed shooting percentage by so much. Their extra-man conversion rate has been 27.8%, 39.3%, and 16.7%, while their opponents have converted 35.6%, 39.4%, and 37.0% of their opportunities (working backwards from 2011 through 2009).
by Mike Rogers on 








