Let me start out by saying that I know this is a freaking novel. But it’s the middle of the offseason, and what else have we got to do? And I think the issue is important. Even if you just skim this fanpost, I’d like to know what you think, so please vote in the poll and give comments.
I believe hockey is entering a statistical renaissance. We have more tools to describe a player’s performance now than we ever have before. But even the most basic of rate stats in hockey are not yet uniform. We have statistics that are measured by the game; by the minute; by sixty minutes; by twenty minutes; by the shift. It’s a mess.
We have an opportunity now to choose how we will describe the game into the future. And I think we should choose the most intuitive statistics. The ones that just make the most sense when you first run into them. The ones that are the easiest to explain to other people. In this fanpost, I propose that hockey adopt "per shift" as the standard stat.
To see what I’m getting at, it’s useful first to take a digression into the most stat-focused of the major sports…
Batting Average should be the best stat in sports. It is one of the oldest stats that isn’t just a simple counting number, like Hits or Home Runs are. It’s also one of the most intuitive. If a guy hits .250, then you can expect him to get a hit in one out of every four times up*. Even little kids can get a handle on batting average.
Batting Average should be the best stat in sports. But it gets a major asterisk. (See the asterisk up there? Ford Frick would be proud.) Baseball got it wrong – they made the denominator for Batting Average "at bats."
I believe most of us have had the experience of explaining to someone why "at bats" isn’t what you’d think it is. We say "No, what you’re thinking of is ‘plate appearances,’ not ‘at bats,’" and they look at us like we just grew a second head. And they’re right to be confused. Because Batting Average is broken. The mistake even propagated to Slugging Percentage, before they finally got it right with On Base Percentage. The basic event in baseball is the Plate Appearance, and that’s what they should have used in the first place.
By limiting the definition of an "at bat" to just hits and outs, they ruined a lot of the intuitive value of Batting Average. If a guy hits .250, then you might expect him to get a hit in one out of every four times up. And you’d be wrong. It’s more like one every five times up, after you account for walks and such.
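To put rough numbers on that: hits per plate appearance is just batting average times the share of plate appearances that count as at bats. Here’s a minimal Python sketch of that arithmetic. The AB/PA shares are assumptions picked for illustration, not league figures, and the function name is my own.

```python
def hits_per_plate_appearance(batting_average: float, at_bats_per_pa: float) -> float:
    """Convert batting average (hits / at bats) into hits per plate appearance."""
    return batting_average * at_bats_per_pa


# Assumed share of plate appearances that count as at bats (illustrative only).
for share in (0.90, 0.85, 0.80):
    rate = hits_per_plate_appearance(0.250, share)
    print(f"AB/PA = {share:.2f}: a hit about once every {1 / rate:.1f} times up")
```

Under these assumed shares, the .250 hitter lands somewhere between one hit in four and one hit in five times up, depending on how often he walks.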
But the problem is much worse with ERA, which is one of the worst stats in sports. I’m not talking about its predictive powers here. I’m talking about the fact that it is not intuitive at all. "Earned" runs per nine innings… Who ever pitches nine innings these days? Fewer than 3% of the games started in 2008 were complete games. And the stat is at its worst for relievers. Why would I want to know how many runs Joe Nathan would give up in nine innings? He’s never pitched nine consecutive innings in his career! The "per nine" of ERA may have made sense in the early days of baseball, but it’s an anachronism now.
[The "earned" thing is a whole ‘nother problem because it introduces the subjective judgment of one person – the official scorekeeper – to replace what we all see with our own eyes]
Pitching stats could easily have been based on a "per batter" (i.e., plate appearance) or "per inning" basis. In fact, many pitcher stats are shifting in that direction these days. People are calculating the batting average, slugging percentage, and OBP against a pitcher. Those are (roughly) per batter/plate appearance. And WHIP (Walks plus Hits per Inning Pitched) is another stat that has gained prominence over the last few years. These are much more intuitive than ERA. When a pitcher starts an inning, his WHIP gives me a sense of what to expect for that inning. I look at his batting average against and (if batting average had been correctly designed) I get a sense for what the next batter is likely to do.
The lesson from baseball is that the basic "event" for a rate stat matters. By "event" I mean something that you can use for the "per" in the rate stat. The denominator. The best stats are the ones that tell you what to expect in a well-defined, bite-sized bit of playing time.
Back To Hockey – Per X Minute Stats
Until about ten years ago, hockey stats never gave us the opportunity to make useful rate stats. The only "events" that were logged for players were games. And per game stats like goals per game (or games per goal) make for a pretty gross measure of someone’s talent.
But lately, we have much more opportunity. That’s because folks have started to publish Time On Ice – the number of minutes played by each player in different situations. And even more recently, folks are starting to publish shift data. We suddenly have two new "events" that we can use for the denominator of our rate stats: (1) the minute of ice time, and (2) the shift.
We’re starting to see rate stats get published. But many of them are measured in "per 60 minutes" rates. We’ve seen "per 60" stats used quite heavily on this site, for example in the Blueliners article two days ago. I fear that "per 60" is becoming a standard, and I’m not a fan.
In my opinion, "per 60" is all wrong. It has exactly the same problem as ERA, but worse. At least it’s possible for pitchers to throw 9 straight innings, even if it hardly ever happens anymore. But we never watch a player play 60 minutes all at once (goalies excepted).
For example, Alex Semin scored 1.76 goals per 60 last year. It’s impossible to know on an intuitive level what Semin’s 1.76 goals per 60 minutes really means. If you tell a typical ardent hockey fan that Semin scored 1.76 goals per 60 minutes of ice time, they won’t be able to tell you if that’s good or not. And that’s a shame because it was the best in the league.
I guess 60 minutes of ice time is about 4 games (or more) for a forward like Semin. So we’re saying Semin scores 1.76 goals every 4 or 5 games. But 60 minutes is only about 3 games (or less) for a defenseman. So already it’s hard intuitively to compare what 1.76 goals per 60 means between players at different positions.
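The mental conversion I’m doing there can be sketched in a few lines of Python. The minutes-per-game figures are assumptions backed out of the "about 4 games" and "about 3 games" guesses above, not real ice-time data, and the function name is just for illustration.

```python
def per_game_from_per_60(rate_per_60: float, minutes_per_game: float) -> float:
    """Scale a per-60-minutes rate down to one game's worth of ice time."""
    return rate_per_60 * minutes_per_game / 60.0


SEMIN_GOALS_PER_60 = 1.76  # figure quoted in the post

# Assumed ice time per game (illustrative): 60 min / 4 games and 60 min / 3 games.
assumed_minutes = {"forward": 15.0, "defenseman": 20.0}

for role, minutes in assumed_minutes.items():
    goals = per_game_from_per_60(SEMIN_GOALS_PER_60, minutes)
    print(f"{role}: {60.0 / minutes:.0f} games per 60 minutes, {goals:.2f} goals per game")
```

The same "per 60" number turns into a different per-game number for every player, which is exactly the conversion a reader has to do in their head.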
This site ranks things per 20 minutes. That’s a little better, because 20 minutes of ice time is roughly what a good-not-great defenseman gets in one game. I feel like I have an intuitive sense for 20 minutes. But not really. Riley Cote doesn’t play 20 minutes in three games. And Mike Green plays 20 minutes – and then half that again – every game. So if I look at something like "ON-Goals per 20" and translate in my mind "that’s about one game," I’ll undervalue some players and overvalue others.
If we must use "per X minutes" stats, I’d recommend just shifting to "per minute." One minute of ice time is a nice, digestible bit of time. And we can all weigh it a little differently to account for the fact that defensemen play more than forwards and stars play more than grinders. Best of all, one minute is roughly equivalent to a real hockey event – one shift – so we all have a good intuitive sense of what it means. But that brings me to my preference…
My Preference – Per Shift Stats
Hockey has a basic event, equivalent to the "plate appearance" in baseball. It’s the "shift."
As hockey fans and players, we all intuitively know what a shift is.* And the advantage "per shift" stats have is that, as soon as that player comes over the boards, the stats give us a sense of what might happen during that shift. Watching the second hand cross sixty to figure out when the next "minute" will start is a little abstract. But if you watch a line change, and you know what the "per shift" stats are for the players who just hit the ice, you have some sense for what might happen while they’re on the ice. It’s an intuitive "event" to use.
The percentage is the percent of shifts in which something happened. In the first table, it’s the percent of shifts in which the line scored a point; in the second, it’s the percent of shifts in which a goal was allowed. You can think of it as the odds of that thing happening on any one shift. When Green and Mo took the ice together at even strength last year, there was a 0.880% chance that the Caps would allow a goal on that shift. I think this is easy to understand, but I’m curious to know what you all think.
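For anyone who wants to see the arithmetic behind a number like that, here’s a minimal Python sketch. The shift counts are hypothetical, chosen only so the result matches the 0.880% figure; they are not the real totals for that pairing.

```python
def percent_of_shifts(shifts_with_event: int, total_shifts: int) -> float:
    """Percent of shifts in which the event (a point, a goal against) happened."""
    if total_shifts <= 0:
        raise ValueError("total_shifts must be positive")
    return 100.0 * shifts_with_event / total_shifts


# Hypothetical counts, picked only to reproduce the 0.880% quoted in the post.
pct = percent_of_shifts(shifts_with_event=22, total_shifts=2500)
print(f"{pct:.3f}% of shifts, or about one shift in {100.0 / pct:.0f}")
```

Flipping the percentage into "one shift in N" is optional, but it may be the easier form to say out loud during a game.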
Here’s another advantage of shift data: we can compare it across all players (goalies excepted again). We don’t really need to convert it in our heads. A shift is a shift. Defensemen and stars have more of them per game, but shifts are about the same length for every player (or they should be, but that’s a discussion for another day). So "per shift" stats measure the quality of a player’s time on the ice, as opposed to the quantity of his playing time. Riley Cote’s per shift stats are pretty much on the same scale as Mike Green’s.
No doubt you’ve seen that second pesky asterisk up there. We all know what a real shift is, but to make the stats work, folks have had to redefine it to create the "statistical shift." As far as I can tell, "statistical shifts" start and end when (1) a player enters or leaves the ice; or (2) anyone enters or leaves the penalty box; or (3) the whistle blows. So if someone’s out there for six seconds and there’s an icing, and they stay out there for the faceoff, that’s now another statistical shift. If Ovechkin is out there when a power play ends, one "power play shift" ends and an "even strength shift" begins, even though he just stayed on the ice. This is why JP discussed "occurrences" instead of "shifts" in yesterday’s post about forward combinations. The shift data that is currently available is useful, but it doesn’t reflect real shifts – not the way we all normally think of them.
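Here’s a rough Python sketch of how one real shift gets chopped into statistical shifts under the rules as I understand them. It is not how any stats provider actually builds its data. The function name and the times (seconds of elapsed game clock) are made up for illustration.

```python
def split_into_statistical_shifts(start, end, boundaries):
    """Split one real shift (start, end) at every whistle or penalty-box change.

    `boundaries` holds the times of those events. Only events strictly inside
    the shift create a new statistical shift.
    """
    cuts = sorted(t for t in set(boundaries) if start < t < end)
    edges = [start, *cuts, end]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical shift: on the ice from 100s to 145s, with an icing whistle six
# seconds in (106s) and a power play expiring at 130s.
pieces = split_into_statistical_shifts(100, 145, boundaries=[106, 130])
print(pieces)  # [(100, 106), (106, 130), (130, 145)]
```

One trip over the boards turns into three statistical shifts here, which is why the counts run higher than what you’d tally by watching the bench.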
This may not be the traditional definition of a shift, but it is necessary to allow us to compare different types of gameplay (shorthanded shifts versus even strength, for example) and to let the goal stats work correctly. I think the advantages of using shifts as the basic event outweigh any problems caused by the difference between "statistical shifts" and real shifts. So I think we should just use the data that is being compiled, call it "shifts," and just remember that the "statistical shift" is a slightly different beast than the real shift.
I realize that after preaching about the "intuitive" value of other stats, here I am pushing an artificial creation – the "statistical shift." Maybe it needs a better name, but I think this really is the best base to use for hockey rate stats. Every time a forward line or defensive pairing comes over the boards, that's one battle. Much like a hitter for one plate appearance, those players have that limited opportunity to do something before they go back and sit on the bench again. I think a player's per-shift stats are the right measure of his effectiveness.