With homage to D'ohboy and his Pythagorean NHL Standings, I present an entirely different approach to describing the relationship between points and goals scored/allowed, one that takes advantage of a mathematical model particularly well suited to hockey.
I'm going to blatantly plagiarize the format of his most recent post because I like it. For the disinclined who come to boggle over Tampa Bay and nothing else, here are tables of standings with the conferences separated, sorted by expected points, actual points, and then the difference between the two:
We can talk about the Lightning all day, can't we... suffice it to say goal distribution isn't necessarily everything, especially when you suffer a handful of blowouts. Anyway, there's a rather ugly Excel sheet here with a lot of fun extra stats that you can download and play with if you're interested. There's plenty to discuss with respect to the spreadsheet: I'd particularly like to point out Minnesota's overtime dominance this year: coaching philosophy, luck, both? I digress.
For the record, when comparing my expected standings to D'ohboy's with the same cutoff date of January 4th, my root mean square error is 3.42 points to D'ohboy's 3.69 points. I can't tell you if that's significant or meaningful in any way, but my goal all along was largely to minimize RMSE, which is arbitrary, I guess.
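Since RMSE is the yardstick I use throughout, here's a minimal sketch of how it's computed (the function name and list inputs are my own illustration, not anything from the spreadsheet):

```python
from math import sqrt

def rmse(expected, actual):
    """Root mean square error between expected and actual point totals.

    Both inputs are equal-length sequences of per-team point totals."""
    return sqrt(sum((e - a) ** 2 for e, a in zip(expected, actual)) / len(expected))
```

A lower RMSE means the expected standings sit closer, on average, to the actual standings, with big misses penalized more heavily than small ones.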
So what's different about my approach? The primary assumption I make is that goals scored are randomly distributed (i.e. follow a Poisson distribution). We all know this is generically false at the basic level: a team on a breakaway with no defenders within 15 feet has a higher chance of scoring in the next 20 seconds than the team defending that breakaway. It's also false at a granular level: a team taking a face-off in the offensive zone generally has a better chance to score in the few seconds following the faceoff than the defensive-zone team. It's also false at a grand level: power plays, empty netters, and overtimes skew goal distribution quite dramatically.
However, the Poisson distribution does a rather okay job of emulating the typical goal distribution in NHL games. What does this afford us? Get ready for some stat talk! We now have two independent discrete random variables, X and Y, with Poisson distributions. X has mean A (regulation goals for per game minus empty net goals) and Y has mean B (regulation goals against per game, with the same empty-net adjustment). Then it's straightforward to calculate the probability that X is greater than or less than Y.
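The head-to-head probabilities above can be sketched in a few lines of Python; the truncation bound and function names are my own, not the spreadsheet's:

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu ** k / factorial(k)

def outcome_probs(a, b, max_goals=25):
    """Return (P(X > Y), P(X < Y), P(X = Y)) for independent
    X ~ Poisson(a) (goals for) and Y ~ Poisson(b) (goals against).
    The double sum is truncated at max_goals, which is far out in the
    tail for NHL-scale scoring rates."""
    win = loss = tie = 0.0
    for x in range(max_goals + 1):
        px = poisson_pmf(x, a)
        for y in range(max_goals + 1):
            p = px * poisson_pmf(y, b)
            if x > y:
                win += p
            elif x < y:
                loss += p
            else:
                tie += p
    return win, loss, tie
```

Here P(X > Y) is the regulation-win probability, P(X < Y) the regulation-loss probability, and P(X = Y) the probability the game is tied after sixty minutes.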
The only crisis I had was how to handle overtime, the instances where X equals Y. Goals are scored in overtime at roughly 1.5x the rate of regulation, so the choice was between (a) multiplying GF/G and GA/G by ~1.5 and dividing by 12 (translating a sixty-minute game to a five-minute OT period), and (b) using the actual team OT goal distribution to estimate GF/OT and GA/OT. I sided with the latter because it uses actual data and (more importantly to me) it reduces RMSE (though it actually increased average error, IIRC). Does Ottawa (no overtime goals) really have no chance of winning a game that ends in overtime? Certainly not, but retrospectively you can say that it would be impossible for them to win if they never scored. You can quibble with me on this, though; I encourage opinions.
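For illustration, here's what the first (scaled-rate) option would look like if you model sudden-death OT as two competing Poisson processes, where the first goal wins. This is my own sketch of the option I rejected, not what the spreadsheet actually does (it uses empirical OT rates instead):

```python
from math import exp

def ot_probs(gf_per_game, ga_per_game, ot_factor=1.5, reg_minutes=60, ot_minutes=5):
    """Scaled-rate overtime sketch: boost regulation scoring rates by
    ~1.5x, then rescale from a 60-minute game to a 5-minute OT.
    With competing Poisson processes at rates a and b, the first goal
    belongs to the home side of the ledger with probability a/(a+b),
    and no goal at all (-> shootout) occurs with probability exp(-(a+b))."""
    a = gf_per_game * ot_factor * ot_minutes / reg_minutes
    b = ga_per_game * ot_factor * ot_minutes / reg_minutes
    p_shootout = exp(-(a + b))        # nobody scores in the OT period
    p_goal = 1.0 - p_shootout         # someone scores; first goal ends it
    p_ot_win = p_goal * a / (a + b)
    p_ot_loss = p_goal * b / (a + b)
    return p_ot_win, p_ot_loss, p_shootout
```

Note that under this model even a team with very low GF/G always has *some* chance of winning in OT, which is exactly the property the empirical-data option gives up for Ottawa.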
Then there's some math to be done to extrapolate win percentages (details available upon request), and then straight algebra gets you to expected points: simply ((RWP+OTWP)*2+OTLP+SOP*1.5)*GP, where RWP, OTWP, OTLP, and SOP are the expected percentages of regulation wins, OT wins, OT losses, and shootouts respectively, and GP is games played. Since I assume a shootout is a coinflip, a shootout is worth 1.5 expected points: 2 half the time and 1 the other half. For whatever reason, my expected points come in substantially below actual points (read: overtime is more likely than a Poisson distribution suggests), so I subtract an identical (negative) error term from each team and round to get the expected points.
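The expected-points algebra is simple enough to write out directly; this is my own restatement (function name mine), treating a shootout as a coinflip worth 1.5 expected points (2 half the time, 1 the other half):

```python
def expected_points(rwp, otwp, otlp, sop, gp):
    """Expected standings points from per-game outcome probabilities.

    Any win (regulation or OT) is worth 2 points, an OT loss 1 point,
    and a coinflip shootout 0.5*2 + 0.5*1 = 1.5 expected points.
    rwp/otwp/otlp/sop are probabilities per game; gp is games played."""
    per_game = (rwp + otwp) * 2 + otlp + sop * 1.5
    return per_game * gp
```

For example, a team that wins exactly half its games in regulation and loses the rest projects to one point per game over the season.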
So what do we have? Essentially the same thing with a different underlying methodology, one that is (potentially but not definitely) subject to less error. On a grand scale, I consider this a first-order approach to the expected-point problem: it solely uses actual data and does nothing to address the role of small sample size in overtime goal distributions, skill at scoring ENG (if such a thing exists?), opponent strength, outlandish save/shooting percentage, etc. But overall I think it's a(nother) step forward; I'd like to hear your opinions, too.
I again thank D'ohboy for sharing his data set with me a few weeks ago and thus inspiring this work. I will similarly make mine available to anybody who requests it. Any errors in data are hopefully the NHL's and not mine, since I like to think I'm a superb data miner and mathematician/statistician. :)