"Expected Points" per D'ohboy's Theorem
For all three of you who I haven't yet bored to death with Excel and math, after the break will be my breakdown of the current standings according to Expected Points per what I'm calling "D'ohboy's Theorem." I'll also include the same table using my original theorem that used PP, PK and 5v5.
The following table shows the current NHL "Expected Points" according to "D'ohboy's Theorem," which is: Expected Points = (1.1134)*GP + Goal Differential/2.645
2009-2010

| Team | GP | GF | GA | Diff | Pts Expected | Pts Actual |
| --- | --- | --- | --- | --- | --- | --- |
| SJS | 53 | 179 | 128 | 51 | 78.29960579 | 78 |
| CHI | 53 | 174 | 122 | 52 | 78.67781423 | 76 |
| WSH | 53 | 207 | 146 | 61 | 82.08169014 | 76 |
| NJD | 52 | 139 | 115 | 24 | 66.97456341 | 70 |
| BUF | 52 | 147 | 126 | 21 | 65.83993811 | 69 |
| VAN | 53 | 173 | 129 | 44 | 75.65214675 | 68 |
| PIT | 55 | 173 | 156 | 17 | 67.66734827 | 67 |
| COL | 51 | 153 | 136 | 17 | 63.21368974 | 66 |
| LAK | 53 | 160 | 147 | 13 | 63.92768526 | 65 |
| PHX | 53 | 144 | 139 | 5 | 60.90201778 | 65 |
| OTT | 55 | 154 | 155 | -1 | 60.85959644 | 64 |
| NSH | 52 | 145 | 145 | 0 | 57.89756098 | 61 |
| CGY | 53 | 135 | 138 | -3 | 57.8763503 | 59 |
| DET | 53 | 137 | 143 | -6 | 56.741725 | 59 |
| DAL | 53 | 152 | 171 | -19 | 51.82501535 | 57 |
| MIN | 53 | 150 | 158 | -8 | 55.98530813 | 56 |
| PHI | 51 | 155 | 144 | 11 | 60.94443913 | 55 |
| FLA | 53 | 146 | 154 | -8 | 55.98530813 | 55 |
| NYR | 54 | 138 | 150 | -12 | 55.58588902 | 55 |
| ANA | 54 | 150 | 171 | -21 | 52.18201311 | 55 |
| MTL | 55 | 141 | 149 | -8 | 58.2121374 | 55 |
| BOS | 51 | 127 | 131 | -4 | 55.2713126 | 54 |
| ATL | 52 | 158 | 167 | -9 | 54.49368506 | 54 |
| TBL | 52 | 135 | 157 | -22 | 49.57697541 | 54 |
| STL | 53 | 139 | 149 | -10 | 55.22889126 | 54 |
| NYI | 54 | 142 | 168 | -26 | 50.29097094 | 54 |
| CBJ | 56 | 146 | 186 | -40 | 47.22288211 | 51 |
| TOR | 54 | 142 | 187 | -45 | 43.10501067 | 44 |
| CAR | 53 | 141 | 174 | -33 | 46.53009726 | 43 |
| EDM | 51 | 135 | 176 | -41 | 41.27760051 | 38 |
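
For anyone who would rather reproduce the "Pts Expected" column outside of Excel, here is a minimal Python sketch of the formula above. The function and constant names are illustrative, and it uses the rounded coefficients quoted in the post (1.1134 and 2.645). The table appears to have been computed with unrounded coefficients, so the results can differ from the table in the first or second decimal place.

```python
# Rounded coefficients as quoted in the post. The table itself appears to
# use unrounded values, so expect small differences in the decimals.
POINTS_PER_GAME = 1.1134
GOALS_PER_POINT = 2.645


def expected_points(games_played, goals_for, goals_against):
    """Expected Points = 1.1134 * GP + (GF - GA) / 2.645."""
    goal_diff = goals_for - goals_against
    return POINTS_PER_GAME * games_played + goal_diff / GOALS_PER_POINT


# A few rows copied from the table above: (team, GP, GF, GA, actual points).
rows = [
    ("SJS", 53, 179, 128, 78),
    ("WSH", 53, 207, 146, 76),
    ("NSH", 52, 145, 145, 61),
]

for team, gp, gf, ga, actual in rows:
    xp = expected_points(gp, gf, ga)
    # A positive difference means the team has more points than the formula expects.
    print(f"{team}: expected {xp:.2f}, actual {actual}, diff {actual - xp:+.2f}")
```

A positive difference marks a team that is ahead of its goal differential, and a negative one marks a team that is behind it.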
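The next table is a rating system based on this equation: (PP+PK)^2 + 3*(5v5^2). The ST column is PP+PK, and the last column is the resulting rating:

2009-2010

| Team | Points | PP | PK | ST | 5v5 | Rating |
| --- | --- | --- | --- | --- | --- | --- |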
| SJS | 78 | 0.216 | 0.878 | 1.094 | 1.37 | 6.827536 |
| CHI | 76 | 0.213 | 0.856 | 1.069 | 1.28 | 6.057961 |
| WSH | 76 | 0.261 | 0.803 | 1.064 | 1.56 | 8.432896 |
| NJD | 70 | 0.192 | 0.826 | 1.018 | 1.08 | 4.535524 |
| BUF | 69 | 0.183 | 0.871 | 1.054 | 1.09 | 4.675216 |
| VAN | 68 | 0.22 | 0.816 | 1.036 | 1.39 | 6.869596 |
| PIT | 67 | 0.162 | 0.829 | 0.991 | 1.15 | 4.949581 |
| COL | 66 | 0.186 | 0.817 | 1.003 | 1.16 | 5.042809 |
| PHX | 65 | 0.171 | 0.825 | 0.996 | 1.12 | 4.755216 |
| LAK | 65 | 0.185 | 0.796 | 0.981 | 1 | 3.962361 |
| OTT | 64 | 0.149 | 0.84 | 0.989 | 0.93 | 3.572821 |
| NSH | 61 | 0.158 | 0.773 | 0.931 | 1.01 | 3.927061 |
| DET | 59 | 0.174 | 0.804 | 0.978 | 0.92 | 3.495684 |
| CGY | 59 | 0.167 | 0.832 | 0.999 | 0.98 | 3.879201 |
| DAL | 57 | 0.186 | 0.753 | 0.939 | 0.93 | 3.476421 |
| MIN | 56 | 0.175 | 0.828 | 1.003 | 0.94 | 3.656809 |
| ATL | 56 | 0.175 | 0.802 | 0.977 | 0.97 | 3.777229 |
| MTL | 55 | 0.247 | 0.842 | 1.089 | 0.79 | 3.058221 |
| NYR | 55 | 0.181 | 0.845 | 1.026 | 0.97 | 3.875376 |
| ANA | 55 | 0.186 | 0.801 | 0.987 | 0.94 | 3.624969 |
| FLA | 55 | 0.168 | 0.805 | 0.973 | 0.96 | 3.711529 |
| PHI | 55 | 0.231 | 0.798 | 1.029 | 1.06 | 4.429641 |
| NYI | 54 | 0.163 | 0.75 | 0.913 | 0.83 | 2.900269 |
| STL | 53 | 0.162 | 0.849 | 1.011 | 0.95 | 3.729621 |
| TBL | 54 | 0.195 | 0.799 | 0.994 | 0.86 | 3.206836 |
| BOS | 54 | 0.174 | 0.874 | 1.048 | 0.89 | 3.474604 |
| CBJ | 51 | 0.204 | 0.821 | 1.025 | 0.72 | 2.605825 |
| TOR | 44 | 0.165 | 0.694 | 0.859 | 0.96 | 3.502681 |
| CAR | 43 | 0.181 | 0.794 | 0.975 | 0.81 | 2.918925 |
| EDM | 38 | 0.192 | 0.739 | 0.931 | 0.82 | 2.883961 |
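
The rating column can be reproduced with a few lines of Python. This is only a sketch of the equation above. The function name is illustrative, and the inputs are taken straight from the PP, PK and 5v5 columns.

```python
def rating(pp, pk, five_on_five):
    """Rating = (PP + PK)^2 + 3 * (5v5)^2, per the equation above."""
    special_teams = pp + pk  # this is the "ST" column
    return special_teams ** 2 + 3 * five_on_five ** 2


# (PP, PK, 5v5) for three teams, copied from the table above.
teams = {
    "SJS": (0.216, 0.878, 1.37),
    "WSH": (0.261, 0.803, 1.56),
    "TOR": (0.165, 0.694, 0.96),
}

# Sort from highest rating to lowest.
ranked = sorted(teams, key=lambda name: rating(*teams[name]), reverse=True)
for name in ranked:
    print(f"{name}: {rating(*teams[name]):.6f}")
```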
Comments
It definitely does work pretty damn well, despite the kludge. It does make me a little nervous to see that (with the exception of VAN) the teams that are over-performing are the teams that have rocks in net and the teams that are under-performing are the teams that rely on their offense. I don’t know if a team’s ability to win close games is a factor, or if it’s a coincidence, or what. But despite the solid goaltending the Caps have gotten this year I don’t think anyone would put us in the “rock in net” category.
Now let's say you and I go toe to toe on bird law and see who comes out the victor.
Agreed
One of the criticisms of Bill James’ original Pythagorean Theorem for baseball was that it undervalued teams that could repeatedly win close games – like teams with exceptional bullpens for example.
I think we’re seeing something similar – teams like NJ and BUF that win close games on the back of their goaltending get penalized here, whereas teams like the Caps, Sharks and ’Hawks get a lift.
I’m going to fiddle with the second equation, perhaps add in Penalties For/Against and I’ll use it as a “Power Ranking.”
This is not a game of who the f*ck are you...
Now we’re talking business! One point: you’ve probably noticed that the factor behind the 2.645 average actually decreases monotonically over the last four seasons: 2.85, 2.78, 2.62, 2.32. Maybe it would be reasonable to use the previous season’s number, as it might better reflect the CURRENT state of affairs.
I’m still not too comfortable with (PP+PK)^2. This parameter is almost constant for all teams, so the difference in ratings is in fact almost entirely the difference in the 3*(5v5^2) term.
In any case, excellent job. Both “this” and “rec’d”.
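
To put a rough number on the point about (PP+PK)^2 being nearly constant, the sketch below compares the spread of each term across the league, using the highest and lowest ST and 5v5 values in the rating table. The helper name is illustrative.

```python
def term_spread(values, weight=1.0):
    """Spread (max minus min) of weight * value^2 over the given values."""
    squared = [weight * v ** 2 for v in values]
    return max(squared) - min(squared)


# Extremes of each column in the rating table:
# ST ranges from 0.859 (TOR) to 1.094 (SJS); 5v5 from 0.72 (CBJ) to 1.56 (WSH).
st_spread = term_spread([0.859, 1.094])
ev_spread = term_spread([0.72, 1.56], weight=3.0)

print(f"(PP+PK)^2 spread: {st_spread:.3f}")
print(f"3*(5v5^2) spread: {ev_spread:.3f}")
print(f"ratio: {ev_spread / st_spread:.1f}x")
```

On these numbers the even-strength term varies more than ten times as much as the special-teams term, which is consistent with the comment.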
I actually didn’t notice that. Why do you think that is? I wonder if more games are going to OT/SO and this is affecting it…
Hmm. I squared the numbers to “reward” those teams that managed to get over 1, while punishing those under 1. I thought that this would further differentiate between good teams and bad teams and lead to a greater spread in the curve.
As I develop it more, I’m going to figure out exactly what the post-lockout ratio of PP goals to EV goals is league-wide, then I’ll use that as the multiplier “X” for the 5v5 number.
Actually, here’s the equation I’m thinking of doing as a “Power Rating:” (Minor Penalties Drawn/Minor Penalties Against * (PP+PK))^2 + "X"*(5v5^2)
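
A minimal sketch of that proposed "Power Rating" follows. The multiplier X has not been worked out yet, so it is left as a required argument, and the penalty counts in the example are made up purely to show the call (they are not real team data).

```python
def power_rating(penalties_drawn, penalties_against, pp, pk, five_on_five, x):
    """(Drawn / Against * (PP + PK))^2 + X * (5v5)^2, per the proposal above."""
    if penalties_against == 0:
        raise ValueError("penalties_against must be non-zero")
    penalty_ratio = penalties_drawn / penalties_against
    return (penalty_ratio * (pp + pk)) ** 2 + x * five_on_five ** 2


# Hypothetical penalty counts (200 drawn, 200 taken) with SJS's PP/PK/5v5.
# With an even penalty ratio and X = 3 this reduces to the original rating.
even_ratio = power_rating(200, 200, 0.216, 0.878, 1.37, x=3)

# Drawing more penalties than you take (hypothetical 220 vs 200) raises the rating.
favorable_ratio = power_rating(220, 200, 0.216, 0.878, 1.37, x=3)

print(f"{even_ratio:.6f} {favorable_ratio:.6f}")
```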
This is not a game of who the f*ck are you...
“Why do you think that is? I wonder if more games are going to OT/SO and this is affecting it…”
I think as scoring has gone down since the lockout each goal has been more valuable so a smaller goal differential makes a bigger difference.
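
As a rough illustration of that point: if the divisor in the formula is read as "goals of differential per standings point," then the value of one goal is its reciprocal. Using the four per-season factors quoted earlier in the thread, in the order they were listed:

```latex
\[
\frac{1}{2.85} \approx 0.351, \qquad
\frac{1}{2.78} \approx 0.360, \qquad
\frac{1}{2.62} \approx 0.382, \qquad
\frac{1}{2.32} \approx 0.431
\quad \text{points per goal}
\]
```

So, on that reading, a goal is worth more standings points in each successive season listed.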
Now let's say you and I go toe to toe on bird law and see who comes out the victor.
You can actually rewrite the formula as the following:
Points Percentage = (1/2)((1.1134) + (Goals For Avg – Goals Against Avg)/2.645)
What’s interesting about writing it this way is that points percentage should scale linearly with the difference between average goals for and average goals against, i.e. your average goal differential. If you’re blowing a lot of teams out of the water, but losing a significant number of one-goal games, then your team is probably underperforming according to this formula. Likewise, if you’re squeaking by on a lot of OT losses (which are by definition one-goal games) and a handful of one-goal wins, but losing in blowouts every other night, your team is probably overperforming.
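
For reference, here is how the rewritten form follows from the original formula. It assumes "points percentage" means points divided by the maximum of 2 per game, which the comment does not spell out.

```latex
\begin{align*}
\text{Expected Points} &= 1.1134\,GP + \frac{GF - GA}{2.645} \\[4pt]
\text{Points Percentage} &= \frac{\text{Expected Points}}{2\,GP}
  = \frac{1}{2}\left(1.1134 + \frac{GF/GP - GA/GP}{2.645}\right)
\end{align*}
```

For example, a team averaging half a goal more than it allows per game would come out at about (1/2)(1.1134 + 0.5/2.645) ≈ 0.651.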
GUTEN TAAAAAAAAAAAAG!
I’m not being flippant or defensive, but why would you want to know points percentage? Ultimately, the important thing is to know the value in points of a given goal for or goal against.
Is the preference for points percentage because that’s how James’ formula works?
I guess I’m just trying to understand why you’d want to do it this way and it’s not coming to me… could you elaborate a bit?
(Again, remember that I’m somewhat obtuse when it comes to statistics – I have no training whatsoever, so whatever I’m doing I’m not standing on the shoulders of giants, I’m burrowing around with the moles.)
This is not a game of who the f*ck are you...
Well, it’s another way of analyzing the data. Dividing through by the number of games played normalizes the points for each team across the standings regardless of the number of games they’ve played. If you’re interested in whether a team is under- or over-performing despite their great/horrible offense/defense, a per-game measure is the fairer comparison. Using the total number of points is kind of a problematic yardstick in the middle of the regular season because it doesn’t account for the number of games-in-hand.
Another useful metric would be Expected Points Per Game, which would just be total points divided by number of games played. Rewriting the formula in this fashion gives the following:
If you’re trying to know the value in points of a given goal for or goal against, then I suppose the best way to write the formula would be as follows:
Expected Points Per Game = 1.1134 + (Goals For – Goals Against)/(GP*2.645) = 1.1134 + (Goals For Avg – Goals Against Avg)/2.645
This gives the value of a particular goal for or goal against in terms of standing points. The only problem is that it highly overvalues goals early in the season and devalues them late in the season.
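
The early-season effect can be seen by taking the change in the per-game formula for one extra goal for, holding everything else fixed:

```latex
\[
\frac{\partial}{\partial GF}\left(1.1134 + \frac{GF - GA}{GP \cdot 2.645}\right)
  = \frac{1}{GP \cdot 2.645}
\]
\[
GP = 10:\ \frac{1}{26.45} \approx 0.0378
\qquad
GP = 53:\ \frac{1}{140.185} \approx 0.0071
\]
```

In the total-points form of the formula, by contrast, one goal is always worth 1/2.645 ≈ 0.378 points regardless of games played.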
GUTEN TAAAAAAAAAAAAG!
Ok. I’ve got it now. I think what threw me was that you multiplied the 1.1134 by .5.
I guess I went for Expected Points because it translates well to what we already know, but I can see how your method would have value in other applications.
This is not a game of who the f*ck are you...
Great job!
I was charting much of the same team information from BTN. The thing I struggled with was penalties drawn minus penalties taken: I wondered whether it was an X-factor that could serve as a pulse and trending stat.
The penalties calculation and blocked shots help the most in the postseason, and I was trying to figure out how to fit them all together. Any ideas from the more statistically inclined?
Promote the game, it's the NHL, not SCHL
Just eyeballing the comparison between expected points and actual points, it looks like you have a very high correlation. I’d be curious to know exactly what the correlation coefficient is (you should be able to get that pretty easily in Excel).
Knowing the exact strength of the correlation will be useful as you attempt to fine-tune the formula (keeping in mind that very small changes in correlation strength are probably not meaningful).
The correlation between the two is .9439. Pretty damn good.
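
For readers without Excel (where the `CORREL` function does this), a plain-Python sketch of the Pearson correlation coefficient is below. The function name is illustrative. The example uses only the first five rows of the expected-points table, so it will not reproduce the all-teams figure quoted above; paste in all 30 rows to check that.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# First five rows of the table: Pts Expected (rounded) and Pts Actual.
expected = [78.30, 78.68, 82.08, 66.97, 65.84]
actual = [78, 76, 76, 70, 69]

print(f"r = {pearson_r(expected, actual):.4f}")
```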
Of all our iniquities ignorance may be the worst
by Killer_Carlson on Jan 29, 2010 4:05 PM EST
That’s so high that I’d almost be afraid to try and get it higher. After all, if the correlation was a perfect 1.0, it would tell us nothing we didn’t already know. Anyone can look at the actual standings. The point of an exercise such as this is to indicate teams which are perhaps over- or under-rated by the actual standings, to give a measure of “true” team strength and to predict performance in the future.
The best measure of how good this formula is would not be to see how well it compares to standings points over the same period, it would be to see how well the formula’s prediction based on the first-half stats compares to the actual standings from the second half. If the formula is better than actual first-half standings points at predicting second-half standings points, then you know you accomplished something. You can determine how well it predicts future performance using correlation coefficients.
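
That test could be sketched as follows. Everything here is illustrative: the function names are made up, and the three-team numbers are invented just to show the call, since the thread has no half-season splits. The idea is to correlate each first-half measure with second-half points and see which one predicts better.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_predictors(first_half_expected, first_half_actual, second_half_actual):
    """Return (r for formula, r for standings) against second-half points."""
    r_formula = pearson_r(first_half_expected, second_half_actual)
    r_standings = pearson_r(first_half_actual, second_half_actual)
    return r_formula, r_standings


# Invented numbers for three hypothetical teams (not real data).
r_formula, r_standings = compare_predictors(
    first_half_expected=[40.0, 45.0, 50.0],
    first_half_actual=[46, 41, 48],
    second_half_actual=[38, 44, 49],
)
print(f"formula r = {r_formula:.3f}, standings r = {r_standings:.3f}")
```

If the formula's coefficient comes out consistently higher than the standings' coefficient on real data, that would support the claim that it measures "true" team strength.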
Very Minor Quibble: SJS have 79 pts on account of the charity point they got for taking the Hawks to overtime last night. Thanks, Gary!
IS KEPTIN NOW
This is very impressive. Very impressive.
The only thing I might add is a variable for very strong defensive systems or goalie play – perhaps a save pct modifier? It might help account for the performance of BUF and NJD. But I think you’ve got the basic parameters nailed down.
The rest of the hockey world ought to be sitting up and taking notice – this is not insignificant.
"You're gonna eat that g**d**n Koho, three!"
Thanks!
This has taken longer than I thought, but it’s been a labor of love.
Here’s the thing about NJD and BUF – I think that it’s up to the reader/analyst/whomever to understand that teams that can repeatedly win close games are going to appear to overperform. Baseball has the same issue with teams that have exceptional bullpens – they have an ability to repeatedly win games by a run or two, but if they get blown out the bullpen never really factors.
In the process of doing this, I’ve also developed a pretty sweet “Power Ranking” that I’ll debut sometime next week. But what’s really cool is what this equation can allow us to do in the future for player evaluation. Are you following me?
;)
This is not a game of who the f*ck are you...
Doh, I just wanted to say that I dig your past three FanPosts and the effort you put in. I spent the whole weekend sick and hopped up on various cold medicines, so I wasn’t able to contribute in any intelligent way, but I just want to let you know I appreciate the work. Keep ’em coming.
"Dozens of people spontaneously combust each year. It's just not really widely reported."
by Laich It Or Lump It on Feb 2, 2010 12:15 PM EST
