Sportlogiq has provided an interesting data set to the data community in Montreal. They use computer vision technology to analyze sports by tracking players and categorizing their movements. The output of their algorithms breaks down a hockey game into a sequence of events: faceoffs, passes, dekes, body checks, shots on net, goals, etc. The data that they provided represents the equal-strength (non-power play) events of the 6 games of the Ottawa vs. Montreal playoff series in 2015. Please contact them directly for more information about the data set.
One thing that is particularly interesting about the data is that it tracks the passing in the game. There are many statistics collected about hockey that are used to analyze games, teams, and players, but passing statistics beyond assists are typically not among them. Tracking passing is labour-intensive for humans because it happens so frequently within each game. However, Sportlogiq’s computer vision technology makes pass tracking an automated task. So, this technology can potentially open up passing as a dimension of hockey statistics. I am not aware of any other hockey statistics organizations that can provide detailed passing data (please let me know).
In this article, I would like to do some exploratory analysis of the passing data from this series of games to see what insights we can gain. The data tracks both successful and failed pass attempts. Therefore, we can hope to gain some insights about what ingredients make for a successful pass in the NHL. This information may be of interest to players, coaches, and fans.
The data represents the equal-strength (non-power play) events of the 6 games of the Ottawa vs. Montreal playoff series in 2015. Each game is broken down into a sequence of events, including pass attempts. For each event in the game, the following features are supplied by Sportlogiq:
- Event ID
- Period (1st, 2nd, 3rd, or 4th for overtime)
- Enumerated possession numbers
- Which team possesses the puck
- Enumerated play within the current possession
- Whether the event breaks the current possession
- Which frame of video the event happens at (can be used for timing information)
- What type of event it was (about 100 event types are tracked, representing passes, faceoffs, carries, dekes, shots on net, body checks, and others). Passing events are also broken down into types (see below).
- Which zone the event occurred in (defensive zone, neutral zone, offensive zone)
- Whether the event was successful or failed
- The coordinates of the player when the event occurred
- The player responsible for the event, including first name, last name, jersey number, team, and position
The following plot shows the passes broken down by type. Descriptions of each type are:
- d2d – Pass from one defenseman to another
- eastwest – Cross-ice pass to a teammate in the offensive zone
- north – Pass to a teammate around the boards in the offensive zone
- outlet – Forward pass in the defensive zone
- rush – Pass immediately following controlled entry into the offensive zone
- slot – Pass to a teammate positioned in the slot
- south – Pass in the offensive zone from one side of the ice to the other
- stretch – Pass from the defensive zone to a teammate positioned on the other side of centre ice
The 8 rink images show the origin locations for successful (green) and failed (red) passes for each pass type. In these plots, one can see clusters of red or green points indicating regions where the pass is more or less likely to be successful. For example, passing into the slot from behind the net is not likely to be successful, but passing into the slot from the wings has a relatively good chance of success.
The mosaic plot at the bottom shows the frequency of each pass type as the width along with the proportion of successful and failed passes of that type. Each type of pass shows a different success rate. This plot tells a story of risk and reward in transferring the puck to a teammate. “D2D” passes where one defenseman passes the puck to another are very low risk and are often unchallenged by the defending team. However, these passes are also relatively low reward because the defensemen are not typically in good scoring position, and often these passes happen in the defensive zone. In contrast, passes into the slot (the area immediately in front of the defending team’s net) are very high risk and high reward. A successful pass into the slot will place the puck in excellent scoring position. Therefore, these types of passes are fiercely defended against and are often not successful.
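The two quantities that the mosaic plot encodes (how often each pass type is attempted and how often it succeeds) can be tabulated directly from the pass events. The following is a minimal sketch, assuming the passes have already been reduced to a list of `(pass_type, succeeded)` pairs; the function name and the toy records are hypothetical and are not taken from the Sportlogiq data.

```python
from collections import Counter

def pass_type_summary(passes):
    """Summarize a list of (pass_type, succeeded) pairs.

    Returns {pass_type: (share_of_all_passes, success_rate)}.
    The share corresponds to the bar width in the mosaic plot and the
    success rate to the split within each bar.
    """
    attempts = Counter(pass_type for pass_type, _ in passes)
    successes = Counter(pass_type for pass_type, ok in passes if ok)
    total = sum(attempts.values())
    return {
        pass_type: (count / total, successes[pass_type] / count)
        for pass_type, count in attempts.items()
    }

# Toy records for illustration only (not real series data).
toy_passes = [
    ("d2d", True), ("d2d", True), ("d2d", True),
    ("slot", False), ("slot", True),
]
summary = pass_type_summary(toy_passes)
```

Keeping the share and the success rate side by side makes the risk and reward trade-off easy to read off for each pass type.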
In addition to the above, we can compute some additional features of the passing events that can be useful for understanding passing in hockey.
- We can approximate the destination position of the pass as the coordinates of the next event in the sequence. In general, the true destination position of the pass isn’t recorded, so the location of the next event in the game is a heuristic, but in most cases the next event will be near where the puck ended up as a result of the pass. It is the best that we can do with the given data.
- From the above approximation, we can approximately compute:
  - Distance of the pass
  - Speed of the pass, using the frame information to determine the time between the pass and the next event
  - Distance to the net of the pass destination position
- We can approximate the amount of time the passer controlled the puck by looking at the amount of time between the previous event and the pass.
- We can record which game the pass occurred in
- We can compute the distance to the net from the pass’s origin position
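The engineered features listed above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the `Event` record, the frame rate `FPS`, and the net coordinates in `NETS` are hypothetical placeholders and not Sportlogiq's actual schema or coordinate system. The destination of a pass is approximated by the location of the next event, as described above.

```python
import math
from dataclasses import dataclass

@dataclass
class Event:
    """Hypothetical event record; field names are assumptions."""
    frame: int      # video frame number of the event
    x: float        # x coordinate of the player at the event
    y: float        # y coordinate of the player at the event
    is_pass: bool   # whether this event is a pass attempt

FPS = 30.0                            # assumed video frame rate
NETS = [(-89.0, 0.0), (89.0, 0.0)]    # assumed net positions

def dist_to_nearest_net(x, y):
    return min(math.hypot(x - nx, y - ny) for nx, ny in NETS)

def pass_features(events):
    """Return one feature dict per pass that has a previous and a next event."""
    rows = []
    for prev, cur, nxt in zip(events, events[1:], events[2:]):
        if not cur.is_pass:
            continue
        # Heuristic: the next event's location stands in for the destination.
        pass_dist = math.hypot(nxt.x - cur.x, nxt.y - cur.y)
        flight_time = (nxt.frame - cur.frame) / FPS
        rows.append({
            "passDist": pass_dist,
            "passSpeed": pass_dist / flight_time if flight_time > 0 else None,
            "d2netNext": dist_to_nearest_net(nxt.x, nxt.y),
            "netdist": dist_to_nearest_net(cur.x, cur.y),
            "playDuration": (cur.frame - prev.frame) / FPS,
        })
    return rows

# Toy sequence: a carry, then a pass, then a reception one second later.
events = [
    Event(frame=0, x=0.0, y=0.0, is_pass=False),
    Event(frame=30, x=0.0, y=0.0, is_pass=True),
    Event(frame=60, x=30.0, y=40.0, is_pass=False),
]
features = pass_features(events)
```

A real implementation would also need to guard against neighbouring events that are separated by removed power-play time, for example by discarding passes whose gap to the next event is implausibly long.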
It is important to note that events occurring during power plays have been removed from the data set in a way that is not transparent. Therefore, I cannot always tell whether two neighbouring events are separated by a series of power play events or whether they are true neighbours. In some cases, the timing information provided by the video frame numbers is used to identify this problem, but in other cases it goes undetected.
Insights on Successful Passing
Who are the best passers?
We expect that there may be some all-star passers who always complete a pass and some mediocre passers with a less impressive success rate. The following plot shows the passing success rate by player (successful passes / pass attempts). The error bars show one standard deviation, assuming that passing is a Poisson process, which turns out not to be quite true. So, a more thoughtful analysis may be needed for some purposes, but this approximation suffices for our exploratory analysis. The green region shows the series success rate for all players. There is a strong correlation with position: the more successful passers on the right are mostly defensemen and goalies, and the less successful passers on the left are mostly forwards. This is explained by the different pass types that players in each position will attempt, as described above. From this plot, we can see that we don’t have enough statistics from a single series to identify the best passers, particularly taking into account their positions.
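To make the error bars concrete, here is a small sketch of how a per-player success rate and its uncertainty might be computed. The Poisson-style error below is one plausible reading of the approximation used for the plot (the success count is treated as Poisson, so its standard deviation is the square root of the count); that reading is an assumption on my part. The binomial standard error is shown alongside as an alternative and is not the method used for the plot. The counts are made up for illustration.

```python
import math

def success_rate_with_errors(successes, attempts):
    """Return (rate, poisson_err, binomial_err) for one player."""
    rate = successes / attempts
    # Poisson-style error: sd(count) = sqrt(count), scaled by attempts.
    poisson_err = math.sqrt(successes) / attempts
    # Binomial standard error (alternative, not the article's method).
    binomial_err = math.sqrt(rate * (1.0 - rate) / attempts)
    return rate, poisson_err, binomial_err

# Made-up counts for illustration only.
rate, poisson_err, binomial_err = success_rate_with_errors(successes=36, attempts=50)
```

For success rates well above 50%, the Poisson-style bar is noticeably wider than the binomial one, so it errs on the side of not declaring differences between players.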
Similarly, there is no significant difference in passing success between the two teams.
Predicting successful passes
From the data, we can learn the ingredients of successful passes and how different features contribute. To do this, we will create a predictive classification model using a random forest with 1000 trees and 4 variables randomly sampled at each split. We will only use information that would be available to, or under the control of, the passer at the time of making the pass. For example, our measures of pass distance and pass speed use information from future events, but are within the control of the player making the pass, so they are included. In contrast, whether the pass attempt is possession breaking is information from the future that the player does not have at the time of deciding to make the pass, so it is excluded.
The random forest model can show us the relative importance of the various features of our data set in terms of how strong they are at predicting the outcome. The following plot shows that pass type (whether the pass was defenseman to defenseman or into the slot) is the most important feature of the original data set. The variable d2netNext represents the distance to the net of the destination of the pass. This is the most important of the engineered features.
The variable descriptions are as follows (in many cases approximations and heuristics are used where the data is not available):
- type – The type of pass as discussed in “the data” section
- d2netNext – The distance between the destination of the pass and the nearest net
- passDist – The distance between the origin and destination of the pass
- netdist – The distance between the origin of the pass and the nearest net
- xPos, yPos – The coordinates of the origin of the pass
- passSpeed – The speed of the pass (distance divided by time)
- zone – Whether the pass was in the defensive zone, neutral zone or offensive zone
- playDuration – How long the player controlled the puck prior to passing
Other variables mentioned above but not appearing in the plot were found to have low predictive value and were not incorporated into the predictive model.
It is useful to examine some of these variables in more detail. Consider the ‘best’ engineered variable, the distance between the pass destination and the net:
The above plot shows that passing the puck to a position close to the net is more difficult than passing to a position far away from the net. This we can understand in terms of the risk and reward of scoring position and intensity of defence.
The following plot represents pass distance:
This plot seems to show a ‘sweet spot’ for passing distance. Passing between 10 and 40 percent of the rink length seems to produce a better chance of success. Shorter and longer passes may represent more desperate situations where the passer has less control. A long hail-mary pass may have less chance of success, and a very short pass may represent a player who is swarmed by the defence and has few options for moving the puck.
The following plot represents the pass speed:
This plot may show a slight advantage to faster passing, but the effect is too small to be significant. The spikes at very slow passes are likely due to power plays which have been removed from the data and can create an apparently long period separating a pass from the next event. This would be interpreted as a very slow pass for the purposes of this plot.
Overall, the predictive model correctly classifies 77.4% of the passes as successful or failed (classification accuracy). This is a somewhat disappointing result, considering that simply predicting that every pass will be successful already achieves 71%. One important problem is that this analysis has been blind to the locations of other players besides the passer. This information is certainly a primary consideration to a player who is deciding whether to pass. To do better, we should incorporate information about the positions of the defenders, who are likely the causes of most of the failed passes. It is likely that Sportlogiq’s technology can track the defenders’ positions, but that data is not included in the data set that was made available.
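The comparison between the model and the "every pass succeeds" baseline is easy to reproduce for any set of predictions. Below is a minimal sketch with hypothetical helper names and toy labels (the labels are not taken from the series data); it only illustrates how the two accuracy figures are defined.

```python
def accuracy(predicted, actual):
    """Fraction of passes whose predicted outcome matches the real one."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def majority_baseline_accuracy(actual):
    """Accuracy of always predicting the most common outcome."""
    successes = sum(actual)
    return max(successes, len(actual) - successes) / len(actual)

# Toy labels (True = successful pass); illustration only.
actual    = [True, True, True, False, True, False, True, True, False, True]
predicted = [True, True, True, False, True, True,  True, True, False, True]

model_acc = accuracy(predicted, actual)
baseline_acc = majority_baseline_accuracy(actual)
```

The gap between the two numbers, rather than the raw accuracy, is the better indication of how much the features actually contribute.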
For players deciding whether to attempt a pass or for coaches training players how to pass successfully, knowing the most likely outcome of the pass is not as useful as being able to weigh the risks against the rewards in each particular situation. In the figure below, the lower left corner represents a player who assumes that all passes will fail (indicating that the player will never pass the puck). The top right corner represents a player who assumes every pass attempt will succeed (indicating that the player will take every possible opportunity to pass to a teammate who is in better scoring position). The curve connecting these points shows different levels of risk that a player could adopt within the predictive model and what fraction of successful and failed passes they will attempt at that level of risk.
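A curve of this kind (it reads like a standard ROC curve) can be traced by sweeping a risk threshold over the model's predicted success probabilities. The sketch below assumes each pass has a predicted probability of success and a known outcome, and that the scores are distinct; the function name and the toy numbers are hypothetical.

```python
def risk_curve_points(scores, outcomes):
    """Return (failed_fraction, successful_fraction) pairs, one per threshold.

    scores   -- predicted probability that each pass succeeds
    outcomes -- True if the pass actually succeeded
    """
    n_success = sum(outcomes)
    n_fail = len(outcomes) - n_success
    points = [(0.0, 0.0)]  # the player who never passes
    attempted_success = attempted_fail = 0
    # Lower the threshold one pass at a time, most confident pass first.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: -pair[0])
    for _, succeeded in ranked:
        if succeeded:
            attempted_success += 1
        else:
            attempted_fail += 1
        points.append((attempted_fail / n_fail, attempted_success / n_success))
    return points  # ends at (1.0, 1.0): the player who always passes

# Toy scores and outcomes for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
outcomes = [True, True, False, True, False]
points = risk_curve_points(scores, outcomes)
```

Each point corresponds to one level of risk tolerance: moving along the curve toward the top right means attempting more of the passes that the model considers doubtful.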
This has been a preliminary analysis of a relatively small data set representing only 6 games between the same two teams (Montreal and Ottawa). Only even strength play time is included. It is unclear how well the results will generalize to other games or to NHL hockey as a whole. However, the main insights that we can get from this analysis are:
- Passing ability does not vary enough from player-to-player or team-to-team to see an effect in this data. The passing situation is much more important than the individuals involved. With data from more games, it may be possible to determine who are the most skilled passers and receivers on a team or in the league.
- Passing the puck to a teammate who is near the net or in the slot is very difficult, but passing between defensemen in the defensive zone is relatively easy. This makes sense because the area in front of the net is where a player has the best scoring opportunity, so it is heavily defended.
- Passing into the slot from the wings appears to have a better success rate than passing into the slot from behind the net.
- There appears to be a sweet spot for passing distance between 10% and 40% of the length of the rink. Longer and shorter passes are more likely to fail.
- Contrary to popular wisdom that says passes should be hard, we do not see a strong benefit to high speed passing in our data.
The most important factor toward understanding passing statistics is to include the important context provided by the positions of all players on the ice at the time the pass is made. My analysis has been totally blind to this. The present data set provides glimpses of where individual players are when they are the focus of one of the tracked events. This could be used in principle to model and estimate where players are between the figurative radar blips. It would not be straightforward to model speeds and trajectories from these blips given the fast pace of hockey. However, it may be possible to do this with the current data set. Ultimately, it would be helpful to know the positions of each player for each frame of video. This would provide useful information for understanding the eligible pass receivers on the offensive team and the potential pass interferers on the defensive team. I am not sure if the current machine vision technology is capable of providing this information, but I think it is likely to exist in the future.
Currently, we are limited to data from just six games played within a short time frame between two teams. There are a number of things that may emerge from having more data from more games and more teams. We could determine who are the best passers and how much variation there is in passing and pass receiving skill within the NHL. We could compare passing rates and passing success rates between different teams and see how this leads to better plays and more goals. We could see how teams and individual players improve at passing with time and experience. We could also see how well our conclusions listed above generalize to other games and other teams.
I welcome feedback. Please let me know if I have made any mistakes or missed anything.