PITTSBURGH, PA - JANUARY 22: Alex Ovechkin #8 of the Washington Capitals celebrates his third period goal against the Pittsburgh Penguins during the game at Consol Energy Center on January 22, 2012 in Pittsburgh, Pennsylvania. The Penguins defeated the Capitals 4-3 in overtime. (Photo by Justin K. Aller/Getty Images)
Today we conclude our three-part series on tracking scoring chances. In Part I we covered our first impressions of the process, and what we thought we got out of it. In Part II we dove into some specific examples to look at how a variety of factors led to our ultimate decisions. Today we'll give you our closing thoughts on the value of scoring chance data, what it means, and why it may be more valuable to NHL teams than to fans tracking the game manually. Without further ado...
So what are your final thoughts? Did it change how you view the data or how useful you think it is?
RP: I don’t think it really changed my beliefs on the utility of scoring chance data; I already thought it was useful data. I used to think that if more people tracked the data, it would be more useful. Now I'm not so sure. I have notes on potentially questionable chances that I personally didn’t consider scoring chances, and I thought that having more people judge those plays would create a more robust picture of what happened. This process highlighted how subjective these things are on an individual level, but I also had a lot of experience with that when I did the soft goals analyses, so it wasn’t surprising. The degree to which the subjectivity infiltrated our analysis, and the fact that there were pairs or groups of us who were clearly on the same page, did make me think that it's so subjective that large numbers don't necessarily cure the subjectivity.
JP: Agreed. I wouldn’t say that it changed my thoughts on the utility of scoring chance data, generally, but it hammered home that scoring chances are, ultimately, subjective - as I fret about whether a guy’s release point was inside or outside of the "home plate" area, what I decide impacts the data itself. And that, in turn, highlights my bigger takeaway (which is confirmation of what I’ve always thought about scoring chance data), namely that counting up chances is all well and good, but not all chances are created equal and treating them as such doesn’t necessarily faithfully represent what’s happening (a tap-in from the side of the net isn’t the same as a backhand flip from the dot with a defender in the way). Chances should be graded, which would be all well and good for an individual team’s analysis (which I’m sure is exactly what they do), but would be hell in trying to compare players across teams. Point being, great chances are going to be "scoring chances" on everyone’s sheet, but how many not-so-great chances get included can make big differences.
GOULD: It didn’t make me question the value of scoring chance data in general (I think it could be very valuable.) What it did was make me want to figure out who did the scoring and whether they were experienced and whether their data is generally accepted. Because I was terrible at it, and I wouldn’t want anyone relying on my data. One thing became very clear right at the beginning: with my personality type, I can either enjoy a game or score it, but I can’t do both. This felt like work. I scored the game two months after it happened, and there was no suspense about the outcome -- I had to click right on the final score to load it up on GameCenter live (something I’d like them to change so that you can, you know, watch a game after the fact without having it ruined before you start). So I wasn’t watching play or systems or paying attention to the flow of the game or anything fun like that. I was just focused on whether there was a chance or not. I found my focus wandering at times, so I didn’t do a great job, though I did rewind and rewatch lots of plays and I generally took it seriously and tried to do a good job. This is probably not something I’m going to do again. And the folks who have been doing it all the time, like Neil, have even more of my respect than they had before. In the immortal words of Dr. Klahn, you have contributed a data set of enormous magnitude, and you have our gratitude.
SME: I don’t think there’s necessarily much utility in the raw numbers for scoring chance data. However, I do think there is very good utility in the ratios recorded by the team. As already touched upon, they are very subjective, which can change the numbers quite a bit. And as JP noted, the way they are recorded is far too binary for scoring chances; it assumes that all chances are created equal. All chances are equal, but some chances are more equal than others. It’s difficult to record all of the chances in a way that makes it easy to compare in such a broad manner across teams and players, especially when taking in the subjectivity of it, and all of the moving parts involved in the game.
renstar: This project changed how I viewed the data in the sense that, beforehand, I had very little appreciation for what the scoring chance data gave us. I found that actually tracking the stat gave me a better understanding of what the data describes and found that the action of tracking the chances gave me a better feel for how the control of the game ebbs and flows (maybe just because I was watching more closely). But this project as a whole made me more wary of the individually tracked data sets. I mean that as no insult to those who did the tracking: it was definitely yeoman's work, and we learned so much from the two-season effort. Rather, there are too many places for human decision to come into play, and too many seemingly valid definitions of a scoring chance to convince me that there is one that captures something that correlates cleanly to possession. (More on this in a moment.) In a way, I'm actually glad that it was eventually found that scoring chances track with Corswick, as the Corswick stats can be adjusted for home/away/arena biases and are based on globally available data.
Briefly going back to the idea of possession... before we found out that the scoring chances project ended, I went back over the first period of the game that we all tracked (so this is the king of small sample sizes) and looked at two easily defined, unambiguous measures of possession: the zone that the puck was in and the team who had control of the puck. The next image shows possession in the first context. Red is when the puck is in the PIT zone and WAS 'has' possession, white is the neutral zone, gold is PIT possession (puck in WAS zone), and black lines are stoppages. The scoring chances determined by this project are plotted for each tracker.
Similarly, I looked at actual possession, as in which team had the puck on their stick. Again, red is WAS possession and gold is PIT possession with the scoring chances plotted over it.
As I said, these are small sample sizes so I'm not drawing any strong conclusions, but it is not completely clear to me how scoring chances correlate with or drive possession.
D’oh: This exercise changed my thoughts on the utility of tracking chances, and I’m sad to say it was for the worse. I don’t think it’s an entirely fruitless endeavor, but I’ll echo what JP said above - it’s far too subjective and not all chances are created equal. I found myself marking a lot of chances as "marginal" to reflect that, while I thought it was an above-average chance, it wasn’t almost a goal. Moreover, a more liberal interpretation of a scoring chance would seem to heavily favor a puck-possession team or a team that takes many shots, whereas a more restrictive interpretation might favor a team that worked to create excellent chances. This means that comparisons between teams (and between players on different teams) are largely useless unless the same person is tracking both teams - even then the validity of the data would depend on the styles of the teams being tracked and the tendencies (liberal or restrictive) of the person doing the tracking. I’d wager that teams have their own internal tracking system, with grading scales and multiple people to track each game.
GOULD: There are ways to deal with the subjectivity. First I think is experience. I bet folks get more consistent with one another at scoring games just by scoring lots of them, even if they don’t compare data or anything. Second would be some kind of cross-check that doesn’t currently exist. I’m actually a published author on a couple of statistics-based legal articles: we coded hundreds of trade secrets cases for certain objective factors and then presented an analysis (email me if you want to know more). One of the things we did was cross-check a few of each other’s cases and talk through any differences of opinion on how the case should have been coded. It made us more consistent with one another. If scoring chance trackers would organize and coordinate, this would actually be pretty easy to implement. It would make the data better, but I don’t think it’s absolutely necessary. For now, we should treat the data like "hits" and even "shots" -- data where there’s a certain range of subjectivity so you have to take it with a grain of salt, but ultimately the world is better for having the data available and you just have to do your best to make sense of it in a fair and accurate way.
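For anyone curious what that kind of cross-check might look like in practice, here is a minimal sketch. It assumes two trackers have each marked the same list of candidate plays as a chance or not, then computes their raw agreement and Cohen's kappa, a standard statistic that discounts agreement expected by luck. Kappa is our choice for illustration, not something the tracking project used, and the tracker sheets below are made up.

```python
def percent_agreement(a, b):
    """Share of plays on which both trackers made the same call."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two trackers, corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    # Rate at which each tracker calls a play a scoring chance.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    # Agreement expected if both trackers marked plays independently.
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        return 1.0  # both trackers gave the same single answer for every play
    return (observed - expected) / (1 - expected)


# Hypothetical sheets: 1 = scoring chance, 0 = not, for the same ten plays.
tracker_a = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
tracker_b = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]

print(percent_agreement(tracker_a, tracker_b))
print(cohens_kappa(tracker_a, tracker_b))
```

On these invented sheets the two trackers agree on eight of ten plays, and kappa comes out at roughly 0.6. The plays where they disagree are exactly the ones worth talking through, in the spirit of the case-coding cross-check described above.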
JP: My second big takeaway here flows from the first one, and it’s another point that you touched on, Rob - given the subjectivity involved, the more people tracking this stuff, the better the data will be. Even just having a second or third scorer will make the resulting numbers more useful. As fans, it’s not a futile exercise - it’s important crowd-sourced information that can help provide a better understanding of what we’re all seeing.
SME: The ratios and numbers involved strike me as lending themselves to being overemphasized in the same manner that faceoffs are, though scoring chances are much more important. But when the ratios have a player getting 45% of the chances when he’s on the ice, that seems bad, until one looks at the fact that with a couple more chances, they’d be at 50%. It strikes me as being useful for comparing players on the same team, though not necessarily across teams. Like so many of the stats that already exist.
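To put numbers on that, here is a small worked example. The counts are hypothetical (9 chances for and 11 against while a player is on the ice in a single game) and the helper name is ours, but it shows how little it takes to move a single-game ratio.

```python
def chance_pct(chances_for, chances_against):
    """Percentage of on-ice scoring chances that went the player's way."""
    total = chances_for + chances_against
    if total == 0:
        raise ValueError("no chances recorded")
    return 100.0 * chances_for / total


# Hypothetical single-game line: 9 chances for, 11 against.
before = chance_pct(9, 11)

# The same game if two borderline plays had been logged as chances for.
after = chance_pct(11, 11)

print(before, after)  # 45.0 50.0
```

Two judgment calls are the whole difference between 45% and 50% here, which is why a single-game ratio says as much about the tracker as it does about the player.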
D’oh: One thing that struck me is that the offensive data for defensemen is probably skewed by the fact that a lot of good chances are off the rush, and the defenseman (men) who started the play with outlet passes have left the ice on a change by the time the chance is registered. I counted a few instances of this phenomenon during this game and, as SME points out above, that can shift a player’s chance ratio from positive to negative pretty quickly. I’ll also point out that I found one pretty large error in the RTSS data that timeonice.com uses to track who’s on the ice for chances - given the small difference between a player being a "plus" and being a "minus," I think this makes me more wary of using this data for analysis of individual games. Over the course of a season I imagine those errors wash out. Another takeaway is that more and better camera angles would really help. There was a chance at 5:31 of the 2nd period where Steve Sullivan and Richard Park managed to create a little 2-on-1 in the slot against John Carlson. From the original angle (shown below), I logged it as a chance, albeit a marginal one.
Later, the broadcast showed a view from behind the net. From this second angle (shown below), it appears as though Carlson has the passing lane almost totally cut off and Sullivan has his head down. It would take a great pass to get to Park, and because Sullivan would have to put the pass somewhat behind Park (in order to clear Carlson), Neuvirth might have enough time to get over to make the save. I still think it’s a scoring chance, but the second angle really gives much more information.
Rob: More camera angles would definitely help, I just don't think this is the best example to make your case. A two-on-one in the slot is going to be a scoring chance. Period. You can applaud Carlson for playing the pass well (as is his job) but the fact remains that the Penguins should have had a high quality shot on net; the Caps were saved by an extremely poor decision from Steve Sullivan more than anything else. But I digress.
There's a bit of a paradox in scoring chance evaluation. On the one hand, you want more eyes to build a more complete picture and to wash out individual biases, but on the other hand you can only truly get a consistent definition to apply if you use a single person to define a scoring chance. I think this is where the main difference between fan tracking and team tracking lies. Teams only need the definition they are comfortable with, and their scouts can calculate the data as they are comfortable; the same set of eyes is creating the entire data set. With fan tracking, you have a different set of eyes tracking each team, and they are simply not going to be able to apply the same definition. If they do manage to agree upon a mechanical definition of "scoring chance," they'll lose some specificity and undercut the value of their own data. As Dan Bylsma noted: "There is more to it than just location," said Bylsma. "Way more. It's not just odd man rushes. We have a certain area [on the ice] we consider to be a scoring chance. There are circumstances that if it's outside that area it could still be a scoring chance, like a wrap-around. Some wrap-arounds are chances, some are not."
In order to apply that highly specific, context-dependent definition of a scoring chance, you need a dictator to make the final ruling. Even a tightly knit coaching staff is going to disagree, and several eyes can muddy the picture just as easily as they can clarify it. In an NHL organization, one person has the final word, but that's not the case in fan tracking. For an NHL team, it's the head coach who has the final word. Bylsma continued: "There are some times as a coaching staff, maybe twice a game, we have to get together and decide if there was a scoring chance or not. The reason it is or isn't is not because we all agree, it's because I say it, because I'm the last guy to say it is or not." There's no way to make those close calls in fan tracking, so it ends up aggregating individual biases without identifying or controlling for those biases.