First Attempt at Hockey Pythagorean Theorem
After the jump is my first attempt at creating a "Pythagorean Theorem" for hockey.
Here's the "theorem": (PP% + PK%) + (3 * 5v5 Ratio) = Total
The sum of the special teams percentages accounts for a team's. . . special teams play (duh). I originally tried multiplying this sum by the 5v5 ratio, but I decided I didn't like that. I then eyeballed the ratio of 5v5 goals to PP goals, and it is, on average, about 3, which is why I weight 5v5 by 3.
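For anyone who wants to play with the weights, here is a minimal sketch of the formula in Python. The function name `pythag_total` and the `weight_5v5` parameter are just labels made up for illustration; the inputs are San Jose's PP, PK and 5v5 figures from the table in this post.

```python
def pythag_total(pp_pct, pk_pct, ratio_5v5, weight_5v5=3.0):
    """Special teams sum plus the weighted 5v5 ratio, as defined in the post."""
    special_teams = pp_pct + pk_pct          # the "Sum" column
    return special_teams + weight_5v5 * ratio_5v5


# San Jose: PP 0.216, PK 0.878, 5v5 1.37
sjs_total = pythag_total(0.216, 0.878, 1.37)
print(round(sjs_total, 3))  # 5.204, matching the "Total" column
```

Changing `weight_5v5` makes it easy to see how sensitive the ordering is to the eyeballed factor of 3.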
As you can see below, it matches points totals pretty well. Moreover, the "outliers" are pretty much expected: NSH, OTT and MTL are over-performing, while ANA, PHI and NYR are under-performing.
Feedback is most welcome; I just wanted to get the ball rolling.
| Team | Points | PP | PK | Sum | 5v5 | Total |
|------|--------|----|----|-----|-----|-------|
| SJS | 78 | 0.216 | 0.878 | 1.094 | 1.37 | 5.204 |
| CHI | 74 | 0.21 | 0.856 | 1.066 | 1.27 | 4.876 |
| WSH | 72 | 0.26 | 0.801 | 1.061 | 1.48 | 5.501 |
| NJD | 69 | 0.194 | 0.822 | 1.016 | 1.12 | 4.376 |
| PIT | 67 | 0.164 | 0.827 | 0.991 | 1.15 | 4.441 |
| BUF | 67 | 0.184 | 0.87 | 1.054 | 1.09 | 4.324 |
| VAN | 66 | 0.22 | 0.819 | 1.039 | 1.38 | 5.179 |
| COL | 66 | 0.186 | 0.817 | 1.003 | 1.16 | 4.483 |
| PHX | 63 | 0.165 | 0.827 | 0.992 | 1.14 | 4.412 |
| LAK | 61 | 0.18 | 0.792 | 0.972 | 1 | 3.972 |
| NSH | 61 | 0.161 | 0.774 | 0.935 | 1.01 | 3.965 |
| OTT | 60 | 0.153 | 0.837 | 0.99 | 0.9 | 3.69 |
| CGY | 58 | 0.159 | 0.832 | 0.991 | 0.99 | 3.961 |
| DET | 58 | 0.174 | 0.815 | 0.989 | 0.91 | 3.719 |
| MTL | 55 | 0.247 | 0.843 | 1.09 | 0.82 | 3.55 |
| NYR | 55 | 0.185 | 0.847 | 1.032 | 1 | 4.032 |
| DAL | 55 | 0.185 | 0.759 | 0.944 | 0.92 | 3.704 |
| ANA | 55 | 0.192 | 0.802 | 0.994 | 0.97 | 3.904 |
| PHI | 55 | 0.234 | 0.794 | 1.028 | 1.06 | 4.208 |
| NYI | 54 | 0.164 | 0.754 | 0.918 | 0.87 | 3.528 |
| MIN | 54 | 0.17 | 0.826 | 0.996 | 0.94 | 3.816 |
| STL | 54 | 0.159 | 0.85 | 1.009 | 0.96 | 3.889 |
| BOS | 54 | 0.174 | 0.874 | 1.048 | 0.89 | 3.718 |
| FLA | 53 | 0.169 | 0.81 | 0.979 | 0.95 | 3.829 |
| ATL | 52 | 0.177 | 0.795 | 0.972 | 0.97 | 3.882 |
| TBL | 52 | 0.193 | 0.796 | 0.989 | 0.84 | 3.509 |
| CBJ | 49 | 0.206 | 0.82 | 1.026 | 0.71 | 3.156 |
| TOR | 44 | 0.166 | 0.696 | 0.862 | 0.96 | 3.742 |
| CAR | 39 | 0.178 | 0.788 | 0.966 | 0.78 | 3.306 |
| EDM | 38 | 0.192 | 0.741 | 0.933 | 0.83 | 3.423 |
Comments
The problem here is that the first sum varies within a very small range near 1, while 3*5v5 is much larger, so the difference in the result comes primarily from 3*5v5. What about 5v5 + (PK - PP)? For instance, this makes SJS 2.032, CHI 1.916, WSH 2.021 and, at the bottom, EDM 1.379, CAR 1.39, TOR 1.49, CBJ 1.324.
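As a quick check, the alternative index can be sketched in a few lines of Python. The function name `alt_index` is made up for illustration; the PP, PK and 5v5 figures are taken from the table in the post.

```python
def alt_index(pp_pct, pk_pct, ratio_5v5):
    """Alternative index suggested in the comment: 5v5 + (PK - PP)."""
    return ratio_5v5 + (pk_pct - pp_pct)


# (PP, PK, 5v5) for a few teams from the table
teams = {
    "SJS": (0.216, 0.878, 1.37),
    "CHI": (0.21, 0.856, 1.27),
    "WSH": (0.26, 0.801, 1.48),
    "EDM": (0.192, 0.741, 0.83),
}

for team, (pp, pk, ratio) in teams.items():
    print(team, round(alt_index(pp, pk, ratio), 3))
```

Note that this form subtracts PP% rather than adding it, so a better power play lowers the index.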
My only concern is that this doesn’t account for how much more important 5v5 play is than special teams. Take Montreal (please):
The sum of their PP and PK is 1.09 (2nd only to SJS), but they’re pretty awful 5v5 (.82).
If we simply summed it, they’d come out to 1.91 – which is almost as good as CHI, and that’s clearly not right.
This is not a game of who the f*ck are you...
Well, I did a fit on the points data with (3*5v5+PP+PK)A and (5v5+PP-PK)A, where A is the fitting parameter, and the difference is only marginal. Here
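The fit itself isn't shown, so here is only a rough sketch of what a one-parameter fit could look like. It assumes "(index)A" means the index multiplied by a single scale factor A, chosen by least squares through the origin, and it takes the second index as 5v5 + (PK - PP), the form whose values were worked out earlier in the thread. Both are assumptions, as are the helper names `fit_scale` and `sse`. The data rows are copied from the table in the post.

```python
# (team, points, PP, PK, 5v5) copied from the table in the post
ROWS = [
    ("SJS", 78, 0.216, 0.878, 1.37), ("CHI", 74, 0.21, 0.856, 1.27),
    ("WSH", 72, 0.26, 0.801, 1.48), ("NJD", 69, 0.194, 0.822, 1.12),
    ("PIT", 67, 0.164, 0.827, 1.15), ("BUF", 67, 0.184, 0.87, 1.09),
    ("VAN", 66, 0.22, 0.819, 1.38), ("COL", 66, 0.186, 0.817, 1.16),
    ("PHX", 63, 0.165, 0.827, 1.14), ("LAK", 61, 0.18, 0.792, 1.0),
    ("NSH", 61, 0.161, 0.774, 1.01), ("OTT", 60, 0.153, 0.837, 0.9),
    ("CGY", 58, 0.159, 0.832, 0.99), ("DET", 58, 0.174, 0.815, 0.91),
    ("MTL", 55, 0.247, 0.843, 0.82), ("NYR", 55, 0.185, 0.847, 1.0),
    ("DAL", 55, 0.185, 0.759, 0.92), ("ANA", 55, 0.192, 0.802, 0.97),
    ("PHI", 55, 0.234, 0.794, 1.06), ("NYI", 54, 0.164, 0.754, 0.87),
    ("MIN", 54, 0.17, 0.826, 0.94), ("STL", 54, 0.159, 0.85, 0.96),
    ("BOS", 54, 0.174, 0.874, 0.89), ("FLA", 53, 0.169, 0.81, 0.95),
    ("ATL", 52, 0.177, 0.795, 0.97), ("TBL", 52, 0.193, 0.796, 0.84),
    ("CBJ", 49, 0.206, 0.82, 0.71), ("TOR", 44, 0.166, 0.696, 0.96),
    ("CAR", 39, 0.178, 0.788, 0.78), ("EDM", 38, 0.192, 0.741, 0.83),
]


def fit_scale(xs, ys):
    """Least-squares A for the model y = A * x (line through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def sse(xs, ys, a):
    """Sum of squared errors of the model y = a * x."""
    return sum((y - a * x) ** 2 for x, y in zip(xs, ys))


points = [row[1] for row in ROWS]
weighted = [3 * r5 + pp + pk for _, _, pp, pk, r5 in ROWS]
unweighted = [r5 + (pk - pp) for _, _, pp, pk, r5 in ROWS]

for name, xs in (("3*5v5+PP+PK", weighted), ("5v5+PK-PP", unweighted)):
    a = fit_scale(xs, points)
    print(f"{name}: A = {a:.2f}, SSE = {sse(xs, points, a):.1f}")
```

A smaller SSE means the scaled index sits closer to the actual points totals; comparing the two printed values is one way to judge whether the difference really is marginal.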
The first thing I thought when I saw this was: well then, what would the new ranks be? So:
- WSH 5.501
- SJS 5.204
- VAN 5.179
- CHI 4.876
- COL 4.483
- PIT 4.441
- PHX 4.412
- NJD 4.376
- BUF 4.324
- PHI 4.208
- NYR 4.032
- LAK 3.972
- NSH 3.965
- CGY 3.961
- ANA 3.904
- STL 3.889
- ATL 3.882
- FLA 3.829
- MIN 3.816
- TOR 3.742
- DET 3.719
- BOS 3.718
- DAL 3.704
- OTT 3.69
- MTL 3.55
- NYI 3.528
- TBL 3.509
- EDM 3.423
- CAR 3.306
- CBJ 3.156
I used to work in a fire hydrant factory. You couldn't park anywhere near the place.
What this really calls for is a regression analysis.
And with that comment, I’ve exhausted my expertise on how to actually, you know, do it.
Atta dinnin stick a who!
by Gould Old Days on Jan 27, 2010 12:50 AM EST
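For anyone wanting a starting point on the regression, here is a minimal sketch of a simple linear regression of Points on the post's Total, written in plain Python so it needs no extra libraries. The helper name `linear_fit` is made up for illustration, and this covers only the current partial season from the table, not past years.

```python
# (points, total) pairs copied from the table in the post
DATA = [
    (78, 5.204), (74, 4.876), (72, 5.501), (69, 4.376), (67, 4.441),
    (67, 4.324), (66, 5.179), (66, 4.483), (63, 4.412), (61, 3.972),
    (61, 3.965), (60, 3.69), (58, 3.961), (58, 3.719), (55, 3.55),
    (55, 4.032), (55, 3.704), (55, 3.904), (55, 4.208), (54, 3.528),
    (54, 3.816), (54, 3.889), (54, 3.718), (53, 3.829), (52, 3.882),
    (52, 3.509), (49, 3.156), (44, 3.742), (39, 3.306), (38, 3.423),
]


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


totals = [total for _, total in DATA]
points = [pts for pts, _ in DATA]
slope, intercept, r_squared = linear_fit(totals, points)
print(f"Points = {intercept:.1f} + {slope:.1f} * Total (R^2 = {r_squared:.2f})")
```

The same function could be pointed at any other candidate index (or at past seasons' data) to compare R^2 values.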
Great Stuff!! Thanks.
The anomaly that sticks out to me is Vancouver: 12 points behind SJ, but only .025 behind based upon your theorem. My first thought is that Vancouver gives up bunches of goals and is not as good defensively as SJ. I do not know if it means anything, but based upon the theorem itself Vancouver should be #3 in the overall standings. Why aren’t they?
Just at first glance, and not actually looking to see if either of these apply to Vancouver, two things that could make a team underperform compared to their ranking here would be a) a poor penalties-for/penalties-against ratio and b) an abnormally poor record in close games.
by sixsevenfiftysix on Jan 27, 2010 11:19 AM EST
You can have anomalous results in a sample size like we’re talking about.
Gouldie is right, the thing to do here is compare the theory to past years’ results and do a regression analysis to see how it performs in terms of predictive power. I last did a regression analysis in the early ‘90s so I’m going to hope that LIOLI or one of the math guys/gals here will take up that baton.
"You're gonna eat that g**d**n Koho, three!"
Dude, you freaking rock
Nice work. I messed around with shot differentials last night to see if I could get something to work, and nothing ended up patterning nicely. I used last year’s results, to have a full year’s worth of data.
I also am working at stripping out OT points and going back to the W/L/T point distribution, just so there’s only 2 points awarded in a game.
I think this is a great start.
"You're gonna eat that g**d**n Koho, three!"
Thanks. I started working with last year’s data (for the reason you noted) after I posted this last night. I also changed the formula a bit to give it more differentiation between “Good” and “Bad” teams.
In the new equation, PP+PK are referred to as ST:
(ST^2+3*(5v5^2))^.5
If that’s not clear, it ends up being the square root of (Special Teams squared plus three times Five on Five squared).
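Here is a small Python sketch of that revised formula. The function name `revised_index` is made up for illustration, and the inputs are this season's figures from the table in the post rather than last year's data, purely to show the arithmetic.

```python
from math import sqrt


def revised_index(pp_pct, pk_pct, ratio_5v5):
    """Square root of (ST squared plus three times 5v5 squared)."""
    special_teams = pp_pct + pk_pct          # ST = PP + PK
    return sqrt(special_teams ** 2 + 3 * ratio_5v5 ** 2)


# (PP, PK, 5v5) for a top and a bottom team from the table in the post
print("SJS", round(revised_index(0.216, 0.878, 1.37), 3))
print("EDM", round(revised_index(0.192, 0.741, 0.83), 3))
```

Because both terms are squared before being added, a strong 5v5 ratio pulls the index up faster than it does in the original sum.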
I am definitely in agreement with the regression stuff.
This is not a game of who the f*ck are you...
I like stuff like this; I will help you when I get time. There definitely needs to be some regression analysis run against other standard forms of this theorem and compared to years past (since the lockout, in reality) before we can declare any sort of progress. I like the concept of including PP and PK, though.
Definitely a great start, thanks!
I agree with all the comments that regression is necessary.
I will say, though, that I think the PP% and PK% need to be adjusted for number of penalties. Some teams draw a lot of penalties and take very few (Devils, anyone?), so even if their combined special teams percentage isn’t that high, their weighted percentage would be, and I think that should be taken into account in an explanatory model.
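One way such a weighting could be sketched, purely as an assumption about what "adjusted for number of penalties" might mean: turn each percentage into an expected goal count by multiplying it by the number of opportunities. The function name and the opportunity counts below are hypothetical, not real team data.

```python
def expected_st_goal_diff(pp_pct, pp_opportunities, pk_pct, times_shorthanded):
    """Expected special teams goal differential, weighted by opportunities.

    Goals scored on the power play minus goals allowed on the penalty kill.
    """
    goals_for = pp_pct * pp_opportunities
    goals_against = (1.0 - pk_pct) * times_shorthanded
    return goals_for - goals_against


# Hypothetical team that draws many penalties and takes few
disciplined = expected_st_goal_diff(0.18, 220, 0.82, 150)
# Hypothetical team with the same percentages but the opposite penalty balance
undisciplined = expected_st_goal_diff(0.18, 150, 0.82, 220)
print(round(disciplined, 1), round(undisciplined, 1))
```

Two teams with identical PP% and PK% come out very differently once the penalty balance is included, which is the point of the comment above.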
One more thought: Baseball’s Pythag is based solely on runs scored and runs allowed. Could we do a hockey pythag based only on goals scored and goals allowed?
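For reference, a goals-only version could look like the sketch below. It uses the classic baseball form with an exponent of 2; whether 2 is the right exponent for hockey is an open question, so `exponent` is left as a parameter. The goal totals in the example are hypothetical.

```python
def pythag_pct(goals_for, goals_against, exponent=2.0):
    """Pythagorean expectation: GF^x / (GF^x + GA^x)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical team with 150 goals for and 120 against
print(round(pythag_pct(150, 120), 3))
```

A team that scores exactly as many goals as it allows comes out at .500 regardless of the exponent.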
Atta dinnin stick a who!
How so? Did you manage Step 1? Did the half-wins mess it all up?
Atta dinnin stick a who!
by Gould Old Days on Jan 27, 2010 4:45 PM EST
Nah, it just came out funny.
There’s also a big issue with all these things – empty netters.
This is not a game of who the f*ck are you...
If you scale the pythag percentage given by goals scored and goals allowed by the average points per game, apparently it comes out well. Empty netters are an issue, but probably not a big one.
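One possible reading of that scaling, sketched in Python: a .500 pythag team should earn the league-average points per game, so expected points are 2 * pct * league average * games played. This interpretation, the function names, and the 1.1 points-per-game figure are all assumptions for illustration, not numbers from the thread.

```python
def pythag_pct(goals_for, goals_against, exponent=2.0):
    """Pythagorean expectation: GF^x / (GF^x + GA^x)."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


def expected_points(goals_for, goals_against, games, league_points_per_game):
    """Scale the pythag percentage so a .500 team earns the league average."""
    pct = pythag_pct(goals_for, goals_against)
    return 2.0 * pct * league_points_per_game * games


# Hypothetical: 150 GF, 120 GA over 50 games, league average of 1.1 points per game
print(round(expected_points(150, 120, 50, 1.1), 1))
```

The league average sits above 1.0 whenever some games hand out three points in total, which is why a plain 2 * pct * games would undershoot.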
Right, but when you take into account the “extra” shootout goal and all the empty-netters, you can get some weird stuff.
I’m coming at this from the other direction – James once wrote about the “three true outcomes.” To me, there are “three true things” that all hockey teams do: 5v5, Power Play and Penalty Kill. (I may add Penalties For/Against in version 3.0.)
Rather than try to deduce it just from outputs (GF/GA), I thought I’d try it from the inputs. What I should do is track both then see how each one relates to points. The problem is that I’m far from a trained statistician. Frankly, I’m surprised that my idea tracked so closely with wins. :)
This is not a game of who the f*ck are you...
The one issue I have with all these special teams indices is that by using PP/PK% you’re not accounting for the frequency of the PP or PK for each team. Some teams will be on the PK or PP more often (and in this case might be worth a < 3x multiplier to 5v5).
I would think a better measure would be net special teams goals (PP+SHG-PPA-SHGA) in some fashion, as that measures what you’re looking for instead of a rate.
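As a sketch, the net special teams figure is just a four-term sum. Reading PP as power-play goals for, SHG as shorthanded goals for, PPA as power-play goals against and SHGA as shorthanded goals against is an assumption about the abbreviations, and the counts below are hypothetical.

```python
def net_special_teams_goals(pp_goals, sh_goals, pp_goals_against, sh_goals_against):
    """Net special teams goals: PP + SHG - PPA - SHGA."""
    return pp_goals + sh_goals - pp_goals_against - sh_goals_against


# Hypothetical team: 40 PP goals, 4 shorthanded goals, 35 PP goals against, 3 SHG against
print(net_special_teams_goals(40, 4, 35, 3))  # 6
```

Because this is a count rather than a rate, a team that spends more time on the power play is credited for it automatically.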
Also, the simple James’ pythag I just ran compared favorably to current at R^2 of .9.
-d
