Pythagorean Theorem for Hockey: Part D'oh
After our discussion the other day, I went back to the drawing board to come up with a Pythagorean Theorem for hockey. Beware, for below the jump, there be graphs and math!
I thought about it for a while and, while I liked my PP/PK/5v5 method, several people asked for something more akin to James' original formulation, which takes Runs For and Runs Against and spits out a winning percentage. However, we all quickly realized why this doesn't work for hockey: baseball has binary results (wins or losses), but hockey doesn't. After pondering how to square this circle for a bit, I had an epiphany: what we really want to know is point totals, not winning percentage!
Armed with this realization, the NHL.com website, Excel and some patience, I devised the following method:
- Figure out how many points the average team acquires in a given year (it turns out to be right around 91.3 every year since the lockout);
- Figure out the average goal differential (surprise, surprise. . . it's ZERO!!!);
- Figure out the average goal differential per point, both positive and negative (it averages out to about 2.64). This was a bit of a kludge and deserves explanation, because someone with an actual background in statistics could probably come up with a much better way of doing this. To get this number, I took the top and bottom team in the league by points in each year and figured out the absolute value of how far each deviated from the mean in both points and goal differential. Then I divided the goal differential number by the points number and averaged the top and bottom results. For example, last year SJS was the top team in the league with 117 points and a goal differential of +53, while NYI was the bottom team with 61 points and a -78 differential. The average number of points in the NHL last year was 91.4. For SJS: 117 - 91.4 = 25.6, and 53/25.6 = 2.07 GDiff/point. For NYI: 91.4 - 61 = 30.4, and 78/30.4 = 2.57 GDiff/point. The average of the two is 2.32 for that season;
- Average all the data over the four full years since the lockout and do some basic algebra and...
I came up with the following:
- Over a full 82-game season a team's expected points PE = 91.3 + Differential/2.644
- Mid-season, you need to modify the equation like so: PE = (91.3/82)*Games Played + Differential/2.644
Again, this only works for post-lockout rules. I'm guessing that pre-lockout data would be easy enough to collect and that would allow for a much larger sample size.
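For anyone who would rather not do this by hand, here is a minimal Python sketch of the two formulas above. The function and constant names are my own inventions for illustration; the only numbers taken from the post are 91.3, 2.644 and the 82-game season.

```python
# Constants taken from the post (post-lockout seasons only).
AVG_POINTS = 91.3        # four-year average points per team
GOALS_PER_POINT = 2.644  # four-year average goal differential per point
SEASON_GAMES = 82


def expected_points(goal_diff, games_played=SEASON_GAMES):
    """PE = (91.3/82) * games played + differential / 2.644."""
    # Scale the league-average baseline to the number of games played so far.
    baseline = AVG_POINTS / SEASON_GAMES * games_played
    return baseline + goal_diff / GOALS_PER_POINT


# Full-season example using last year's SJS differential of +53.
print(round(expected_points(53), 1))

# Mid-season example: a team with a +10 differential after 41 games.
print(round(expected_points(10, games_played=41), 1))
```

With the default of 82 games the function reduces to the full-season version, so one function covers both cases.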
I'm going to post the spreadsheets below so that others can check my work and I'll apply the theorem to this year's standings in a following post:
2005-2006:
| Team | Points | GF | GA | Differential |
| DET | 124 | 305 | 209 | 96 |
| OTT | 113 | 314 | 211 | 103 |
| DAL | 112 | 265 | 218 | 47 |
| CAR | 112 | 294 | 260 | 34 |
| BUF | 110 | 281 | 239 | 42 |
| NSH | 106 | 259 | 227 | 32 |
| CGY | 103 | 218 | 200 | 18 |
| NJD | 101 | 242 | 229 | 13 |
| PHI | 101 | 267 | 259 | 8 |
| NYR | 100 | 257 | 215 | 42 |
| SJS | 99 | 266 | 242 | 24 |
| ANA | 98 | 254 | 229 | 25 |
| COL | 95 | 283 | 257 | 26 |
| EDM | 95 | 256 | 251 | 5 |
| MTL | 93 | 243 | 247 | -4 |
| TBL | 92 | 252 | 260 | -8 |
| VAN | 92 | 256 | 255 | 1 |
| TOR | 90 | 257 | 270 | -13 |
| ATL | 90 | 281 | 275 | 6 |
| LAK | 89 | 249 | 270 | -21 |
| FLA | 85 | 240 | 257 | -17 |
| MIN | 84 | 231 | 215 | 16 |
| PHX | 81 | 246 | 271 | -25 |
| NYI | 78 | 230 | 278 | -48 |
| CBJ | 74 | 223 | 279 | -56 |
| BOS | 74 | 230 | 266 | -36 |
| WSH | 70 | 237 | 306 | -69 |
| CHI | 65 | 211 | 285 | -74 |
| PIT | 58 | 244 | 316 | -72 |
| STL | 57 | 197 | 292 | -95 |
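| League average | 91.36666667 | | | 0 |
| | Points from average | Goal differential (absolute) | GDiff/Pt | |
| Top | 32.63333333 | 96 | 2.941777324 | |
| Bottom | 34.36666667 | 95 | 2.764306499 | |
| Average | | | 2.853041911 | |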
2006-2007:
| Team | Points | GF | GA | Differential |
| BUF | 113 | 308 | 242 | 66 |
| DET | 113 | 254 | 199 | 55 |
| NSH | 110 | 272 | 212 | 60 |
| ANA | 110 | 258 | 208 | 50 |
| SJS | 107 | 258 | 199 | 59 |
| DAL | 107 | 226 | 197 | 29 |
| NJD | 107 | 216 | 201 | 15 |
| VAN | 105 | 222 | 201 | 21 |
| OTT | 105 | 288 | 222 | 66 |
| PIT | 105 | 277 | 246 | 31 |
| MIN | 104 | 235 | 191 | 44 |
| ATL | 97 | 246 | 245 | 1 |
| CGY | 96 | 258 | 226 | 32 |
| COL | 95 | 272 | 251 | 21 |
| NYR | 94 | 242 | 216 | 26 |
| TBL | 93 | 253 | 261 | -8 |
| NYI | 92 | 248 | 240 | 8 |
| TOR | 91 | 258 | 269 | -11 |
| MTL | 90 | 245 | 256 | -11 |
| CAR | 88 | 241 | 253 | -12 |
| FLA | 86 | 247 | 257 | -10 |
| STL | 81 | 214 | 254 | -40 |
| BOS | 76 | 219 | 289 | -70 |
| CBJ | 73 | 201 | 249 | -48 |
| EDM | 71 | 195 | 248 | -53 |
| CHI | 71 | 201 | 258 | -57 |
| WSH | 70 | 235 | 286 | -51 |
| LAK | 68 | 227 | 283 | -56 |
| PHX | 67 | 216 | 284 | -68 |
| PHI | 56 | 214 | 303 | -89 |
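| League average | 91.36666667 | | | 0 |
| | Points from average | Goal differential (absolute) | GDiff/Pt | |
| Top | 21.63333333 | 66 | 3.050847458 | |
| Bottom | 35.36666667 | 89 | 2.516493874 | |
| Average | | | 2.783670666 | |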
2007-2008:
| Team | Points | GF | GA | Differential |
| DET | 115 | 257 | 184 | 73 |
| SJS | 108 | 222 | 193 | 29 |
| MTL | 104 | 262 | 222 | 40 |
| PIT | 102 | 247 | 216 | 31 |
| ANA | 102 | 205 | 191 | 14 |
| NJD | 99 | 206 | 197 | 9 |
| MIN | 98 | 223 | 218 | 5 |
| DAL | 97 | 242 | 207 | 35 |
| NYR | 97 | 213 | 199 | 14 |
| COL | 95 | 231 | 219 | 12 |
| PHI | 95 | 248 | 233 | 15 |
| WSH | 94 | 242 | 231 | 11 |
| OTT | 94 | 261 | 247 | 14 |
| CGY | 94 | 229 | 227 | 2 |
| BOS | 94 | 212 | 222 | -10 |
| CAR | 92 | 252 | 249 | 3 |
| NSH | 91 | 230 | 229 | 1 |
| BUF | 90 | 255 | 242 | 13 |
| EDM | 88 | 235 | 251 | -16 |
| CHI | 88 | 239 | 235 | 4 |
| VAN | 88 | 213 | 215 | -2 |
| FLA | 85 | 216 | 226 | -10 |
| PHX | 83 | 214 | 231 | -17 |
| TOR | 83 | 231 | 260 | -29 |
| CBJ | 80 | 193 | 218 | -25 |
| NYI | 79 | 194 | 243 | -49 |
| STL | 79 | 205 | 237 | -32 |
| ATL | 76 | 216 | 272 | -56 |
| LAK | 71 | 231 | 266 | -35 |
| TBL | 71 | 223 | 267 | -44 |
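| League average | 91.06666667 | | | 0 |
| | Points from average | Goal differential (absolute) | GDiff/Pt | |
| Top | 23.93333333 | 73 | 3.050139276 | |
| Bottom | 20.06666667 | 44 | 2.19269103 | |
| Average | | | 2.621415153 | |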
2008-2009:
| Team | Points | GF | GA | Differential |
| SJS | 117 | 257 | 204 | 53 |
| BOS | 116 | 274 | 196 | 78 |
| DET | 112 | 295 | 244 | 51 |
| WSH | 108 | 272 | 245 | 27 |
| NJD | 106 | 244 | 209 | 35 |
| CHI | 104 | 264 | 216 | 48 |
| VAN | 100 | 246 | 220 | 26 |
| PIT* | 99 | 264 | 239 | 25 |
| PHI* | 99 | 264 | 238 | 26 |
| CGY | 98 | 254 | 248 | 6 |
| CAR | 97 | 239 | 226 | 13 |
| NYR | 95 | 210 | 218 | -8 |
| MTL | 93 | 249 | 247 | 2 |
| FLA | 93 | 234 | 231 | 3 |
| STL | 92 | 233 | 233 | 0 |
| CBJ | 92 | 226 | 230 | -4 |
| ANA | 91 | 245 | 238 | 7 |
| BUF | 91 | 250 | 234 | 16 |
| MIN | 89 | 219 | 200 | 19 |
| NSH | 88 | 213 | 233 | -20 |
| EDM | 85 | 234 | 248 | -14 |
| DAL | 83 | 230 | 257 | -27 |
| OTT | 83 | 217 | 237 | -20 |
| TOR | 81 | 250 | 293 | -43 |
| PHX | 79 | 208 | 252 | -44 |
| LAK | 79 | 207 | 234 | -27 |
| ATL | 76 | 257 | 280 | -23 |
| COL | 69 | 199 | 257 | -58 |
| TBL | 66 | 210 | 279 | -69 |
| NYI | 61 | 201 | 279 | -78 |
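| League average | 91.4 | | | 0 |
| | Points from average | Goal differential (absolute) | GDiff/Pt | |
| Top | 25.6 | 53 | 2.0703125 | |
| Bottom | 30.4 | 78 | 2.565789474 | |
| Average | | | 2.318050987 | |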
Four Year Average:
| Avg. Pts | Avg. GDiff/Pt |
| 91.3 | 2.644044679 |
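To let others check the spreadsheet arithmetic, here is a rough Python sketch of the top-team/bottom-team kludge. The inputs are the league average and the first- and last-place rows from each table above; the function name and data layout are my own assumptions, not something from the original spreadsheet.

```python
# season: (league average points, (top points, top diff), (bottom points, bottom diff))
seasons = {
    "2005-2006": (91.36666667, (124, 96), (57, -95)),   # DET, STL
    "2006-2007": (91.36666667, (113, 66), (56, -89)),   # BUF, PHI
    "2007-2008": (91.06666667, (115, 73), (71, -44)),   # DET, TBL
    "2008-2009": (91.4, (117, 53), (61, -78)),          # SJS, NYI
}


def goals_per_point(avg_points, top, bottom):
    """Average the |goal diff| / |points from average| ratio of the two teams."""
    ratios = []
    for points, diff in (top, bottom):
        ratios.append(abs(diff) / abs(points - avg_points))
    return sum(ratios) / len(ratios)


per_season = {name: goals_per_point(*row) for name, row in seasons.items()}
four_year_ratio = sum(per_season.values()) / len(per_season)
four_year_points = sum(row[0] for row in seasons.values()) / len(seasons)

for name, ratio in per_season.items():
    print(name, round(ratio, 3))
print("Four-year average:", round(four_year_points, 1), round(four_year_ratio, 3))
```

This only reproduces the two-team shortcut; it says nothing about how well the ratio fits the other 28 teams in each season.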
Comments
Reading this, I’m realizing just how much I hate math. Credit to you for doing this legwork. Holy crap.
Familiar Rapports: Bald Pollack, F&B, Gould Old Days.
Lobbies: Osala, Perreault, Erskine, Pothier, Neuvirth, Flash.
Fan of: Mean Lars Backstrom, Line Mashing, Cake.
Reading this, I’m realizing how much I LOVE math. Haha :) Great job D’ohboy – very interesting post.
"No Brooks Laich, no win. Know Brooks Laich, know win."
by kellobellow on Jan 28, 2010 11:04 PM EST
What’s weird is that I used to hate math. And now I’m doing stuff like this in my spare time. I was so absorbed in this I forgot to eat dinner. :)
This is not a game of who the f*ck are you...
It happens – a lot of times, teachers, middle/high school math classes, etc. can discourage people from liking math. (Part of why I do what I do!) It takes finding something you are truly interested in to discover what math can really do.
"No Brooks Laich, no win. Know Brooks Laich, know win."
I had an epiphany: what we really want to know is point totals, not winning percentage!
Please tell me you lurked my most recent run-in with MarioD.
Great work and great effort. I dislike math enough that I would never venture to do this. My reservation is that you only picked the top and bottom team to determine the goal differential per point. It seems like you should be using the data for each team but I’m not exactly sure how you would do it.
Now let's say you and I go toe to toe on bird law and see who comes out the victor.
No, what happened?
I’m also not happy with the kludge, but it was a fast and dirty way of establishing a good deviation from the mean. If I were an actual statistician, I’d figure out how to do an actual measurement of deviation, then I’d use that as the multiplier. If you read the following post, though, you’ll see that it actually works quite well. Honestly, I’m a little shocked that it worked as well as it did.
This is not a game of who the f*ck are you...
Points percentage is win percentage. Idiot.
Now let's say you and I go toe to toe on bird law and see who comes out the victor.
I’m guessing you’re talking about MarioD, since I was using points totals. . .
This is not a game of who the f*ck are you...
Yeah, he spent a lot of time and spewed a lot of venom because we ignorant Caps fans could not understand the simple concept that points are imaginary and meant to be a proxy for win percentage, and that NHL standings in fact are determined by win percentage.
Now let's say you and I go toe to toe on bird law and see who comes out the victor.
Then he’s not just a moron, he’s an ignorant moron. Where was this? I’d love to read it.
This is not a game of who the f*ck are you...
start here and scroll down at your own risk.
by Natty Bumppo on Jan 29, 2010 10:58 AM EST
Great work. It’s pretty impressive how consistent those numbers are, so it would definitely be interesting to see what it’s like for the whole league.
You should be able to calculate the standard deviation with an Excel formula. The longer but still easy way to do it would be to square each team's deviation from the mean, add up the squares, divide by the number of teams (30), and take the square root.
Of all our iniquities ignorance may be the worst
by Killer_Carlson on Jan 29, 2010 2:04 AM EST
I’m guessing that the formula is something like =SDEV, right?
I’ll try it out and see what I get.
This is not a game of who the f*ck are you...
I don’t usually use it, but it is something like that. You should be able to find it easily from the pull down menus, or if all else fails do a help search for standard deviation.
Of all our iniquities ignorance may be the worst
by Killer_Carlson on Jan 29, 2010 3:46 PM EST
Here’s how you do it: plot all differentials, make a linear fit, and the slope will give you what you’re looking for.
By “make a linear fit” do you mean “just eyeball it”?
This is not a game of who the f*ck are you...
No, there’s free software (e.g., xmgrace for Linux) that can do any kind of fit. The result, however, would be just marginally different from your average.
Yeah. I was guessing that, by getting the two “endpoints” of the line, the average would more or less be the slope.
This is not a game of who the f*ck are you...
It might be close, but there’s a statistical way to find the actual line of best fit… I don’t know how to do it in any computer software, but I know how on a TI-83, haha :)
"No Brooks Laich, no win. Know Brooks Laich, know win."
B=(X’X)^(-1)X’y in my opinion. That’d be the simple linear regression, but I’m not sure there shouldn’t be some weighting for other factors, like sv%, but that’s not getting added without a third dimension.
Not computationally efficient, but we’re not dealing with large variable sets, either.
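For anyone who wants to try the fit, here is a rough Python sketch using the 2008-2009 table from the post. With a single predictor plus an intercept, B=(X’X)^(-1)X’y reduces to the closed form below; the function name is made up for illustration, and no weighting for other factors is attempted.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# 2008-2009 goal differentials and points, in the same order as the table.
diffs = [53, 78, 51, 27, 35, 48, 26, 25, 26, 6, 13, -8, 2, 3, 0,
         -4, 7, 16, 19, -20, -14, -27, -20, -43, -44, -27, -23, -58, -69, -78]
points = [117, 116, 112, 108, 106, 104, 100, 99, 99, 98, 97, 95, 93, 93, 92,
          92, 91, 91, 89, 88, 85, 83, 83, 81, 79, 79, 76, 69, 66, 61]

intercept, slope = fit_line(diffs, points)
print("points at zero differential:", round(intercept, 2))
print("goals per point:", round(1 / slope, 3))
```

The slope comes out in points per goal, so its reciprocal is the number to compare against the 2.644 in the post.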
Only YOU can prevent idiots from commenting!
by Knee high to a duck on Jan 29, 2010 6:09 PM EST
It's probably close, but we can always help you to get just a bit closer.
Of course, at some level, the precision starts to matter less and less since a) you get into fractional points and b) it's a hockey game, so bad bounces, bad calls, and fluke plays are going to undermine the value of additional precision at some point.
Still fun to ponder… maybe I’ll dig a bit deeper Sunday and finally get some use from the half dozen math textbooks I still have from college.
by iwearstripes on Jan 29, 2010 5:39 PM EST
sorry, this comment is off the subject.
Can you please tell me how to include Excel tables in FanPosts? I’ve been working on one for some time but just can’t figure out how to get my tables in it. I thought it was impossible, and your post proved me wrong.
I tried to send a private message, but for the life of me, couldn’t find how to do that either…
thanks for any help.
I just copied the table and posted it into the text of the post….
…and right about now you’re like “f*$& you a$$hole I tried that already and it didn’t work!!!”
I’m guessing the problem you had was that your table showed up with all sorts of formatting gobbledygook above and below the table. To solve this problem, you need to preview your post and find out exactly where the formatting text is within the post. Then go back to the editing page, switch over to HTML view and find the offending formatting text and delete it (it’s usually bracketed by something like “start fragment” and “end fragment”). Then just keep going back and forth from preview to HTML editor view until it’s all gone. It takes a couple of minutes per table.
If you’re having problems copying and pasting the table itself, I recommend copying it into a MS Word doc, then using the “Paste from MS Word” function in the editor page.
Good luck!
This is not a game of who the f*ck are you...
