Like every major sport, ice hockey has a long history of statistical analysis of its rules, players, teams, and rinks. The goal is to collect data that tells an interesting story about the game, whether through fun statistics that link certain events or through predictions of a team’s odds of winning games and competitions. Because so many factors are in play, ice hockey is known as a sport that demands large amounts of both skill and luck. The puck is central to the luck factor: “where the puck lands” is a common saying about the unpredictable results of puck placement.
This study analyzes several aspects of the sport’s skill and chance to answer questions often asked in the ice hockey community, in betting circles, and within the organizations that make up the National Hockey League (NHL) of the US and Canada.
1. Where are most shots shot from in the rink?
Knowing where shot attempts and successful shots on goal come from helps coaches and players identify the high-danger areas the puck must be kept out of on their own side of the rink, and the areas forwards should build their strategy around on the opposing side. It also reflects the state of the game as a whole: any past or future change in the density of shots in a given area could signal a change in tactics and the meta-game.
2. Is Corsi or Fenwick more reliable as a predictive Advanced Stat?
Corsi and Fenwick are two very similar advanced stats used by hockey analysts to compare individual and team skill. The logic behind their importance is simple: more shots mean more chances to score, and thus more opportunities to win. The difference is that Fenwick excludes blocked shots. Is this exclusion large enough to make one stat more reliable than the other in predicting goals and wins?
3. What statistics indicate more Offensive Pressure, and thus higher win odds?
Offensive pressure describes how consistently a team’s forwards can push and pressure the opponent’s defense until it breaks, leading to shots and goals. A 2014 study by Joshua Weissbock on forecasting success from NHL stats concluded that, across a season, a team’s overall forward activity predicts success better (0.605) than its defensive game (0.395). Understanding which basic and advanced forward stats relate to offensive pressure will show what actions lead to more of it, and to a higher probability of success throughout the season.
4. Based on current stats, which team is most likely to win the 2023 season?
Using the findings of the previous explorations, a basic prediction can be made of which team is likely to win the most games and perhaps the Stanley Cup. This applies the conclusions of the earlier analyses and tests their validity: once the current season ends in the spring, its results can be compared with the predictions.
This dataset has each shot event since 2009, detailing the characteristics of the shot itself and those involved.
xCord: The coordinate of the shot across the x-axis of the rink.
yCord: The coordinate of the shot across the y-axis of the rink.
shotWasOnGoal: A binary flag where 1 means the shot was a shot on goal (SOG): a shot that would have gone into the net had the goaltender not stopped it, goals included.
homeTeamCode | awayTeamCode | season | isPlayoffGame | game_id | homeTeamWon | id | time | event | xCord | yCord | shotWasOnGoal |
---|---|---|---|---|---|---|---|---|---|---|---|
NSH | SJS | 2022 | 0 | 20001 | 1 | 8 | 23 | SHOT | 44 | 8 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 11 | 36 | MISS | 44 | 27 | 0 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 15 | 59 | SHOT | -33 | 8 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 16 | 61 | GOAL | -74 | -5 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 18 | 72 | SHOT | -81 | 15 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 20 | 97 | MISS | -68 | 9 | 0 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 26 | 162 | SHOT | 72 | 2 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 30 | 209 | MISS | 51 | 12 | 0 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 34 | 249 | SHOT | -40 | 22 | 1 |
NSH | SJS | 2022 | 0 | 20001 | 1 | 37 | 259 | SHOT | 49 | 12 | 1 |
This dataset has each skater’s (goalies excluded) stats by season and by situation, based on the NHL’s public API and organized by MoneyPuck.com. The data used here covers the 2022 season and combines MoneyPuck.com’s public dataset with the NHL’s 2022-2023 skater points per game (PPG), goals per game (GPG), points per 60 minutes (P/60), and goals per 60 minutes (G/60) of the skater’s time on the ice.
onIce_fenwickPercentage: The FF% of a player while on the ice.
offIce_fenwickPercentage: The FF% of a player while off the ice.
onIce_corsiPercentage: The CF% of a player while on the ice.
offIce_corsiPercentage: The CF% of a player while off the ice.
I_F_shotsOnGoal: The individual credit to a player for a shot on goal (SOG).
I_F_shotAttempts: The individual credit to a player for a shot attempt.
I_F_takeaways: The number of takeaways a player has had.
I_F_hits: The number of hits a player has made, including those that were penalized.
I_F_faceOffsWon: The number of faceoffs won by the player on either side of the rink.
P/60: the average points made by a player every 60 minutes of on-ice play time.
G/60: the average goals scored by a player every 60 minutes of on-ice play time.
name | playerId | season | team | position | situation | games_played | icetime | shifts | gameScore | onIce_corsiPercentage | offIce_corsiPercentage | onIce_fenwickPercentage | offIce_fenwickPercentage | I_F_shotsOnGoal | I_F_shotAttempts | I_F_faceOffsWon | I_F_hits | I_F_takeaways | G/60 | A/60 | A1/60 | A2/60 | P/60 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A.J. Greer | 8478421 | 2022 | BOS | L | 5on5 | 61 | 32588 | 806 | 12.16 | 0.47 | 0.51 | 0.49 | 0.51 | 65 | 106 | 2 | 101 | 16 | 0.44 | 0.77 | 0.44 | 0.33 | 1.22 |
Aaron Ekblad | 8477932 | 2022 | FLA | D | 5on5 | 71 | 69744 | 1496 | 53.95 | 0.55 | 0.53 | 0.53 | 0.52 | 132 | 235 | 0 | 53 | 24 | 0.21 | 0.57 | 0.42 | 0.16 | 0.78 |
Aatu Raty | 8482691 | 2022 | VAN | C | 5on5 | 15 | 7066 | 177 | 3.37 | 0.50 | 0.51 | 0.52 | 0.53 | 15 | 25 | 42 | 19 | 4 | 1.02 | 0.51 | 0.51 | 0.00 | 1.53 |
Adam Beckman | 8481550 | 2022 | MIN | L | 5on5 | 9 | 5299 | 115 | 1.69 | 0.54 | 0.48 | 0.52 | 0.49 | 12 | 25 | 0 | 3 | 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Adam Boqvist | 8480871 | 2022 | CBJ | D | 5on5 | 46 | 41194 | 899 | 19.90 | 0.48 | 0.44 | 0.48 | 0.43 | 44 | 87 | 0 | 21 | 10 | 0.44 | 0.71 | 0.35 | 0.35 | 1.15 |
Adam Erne | 8477454 | 2022 | DET | L | 5on5 | 61 | 42003 | 908 | 6.46 | 0.42 | 0.48 | 0.42 | 0.48 | 48 | 107 | 10 | 155 | 11 | 0.52 | 0.69 | 0.34 | 0.34 | 1.20 |
Adam Fox | 8479323 | 2022 | NYR | D | 5on5 | 82 | 88198 | 1722 | 78.53 | 0.54 | 0.47 | 0.54 | 0.48 | 101 | 207 | 0 | 19 | 65 | 0.34 | 1.09 | 0.50 | 0.59 | 1.42 |
Adam Ginning | 8480874 | 2022 | PHI | D | 5on5 | 1 | 934 | 18 | 0.40 | 0.63 | 0.52 | 0.63 | 0.54 | 0 | 1 | 0 | 1 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Adam Henrique | 8474641 | 2022 | ANA | C | 5on5 | 62 | 47265 | 1069 | 32.10 | 0.46 | 0.42 | 0.45 | 0.41 | 94 | 150 | 131 | 26 | 14 | 1.16 | 0.78 | 0.31 | 0.47 | 1.94 |
Adam Larsson | 8476457 | 2022 | SEA | D | 5on5 | 82 | 97656 | 1890 | 55.48 | 0.55 | 0.51 | 0.56 | 0.52 | 133 | 282 | 0 | 196 | 29 | 0.22 | 0.82 | 0.45 | 0.37 | 1.05 |
This dataset covers each team’s games from the 2008 season through the 2022-23 season. It is another MoneyPuck.com dataset and includes only 5v5 play.
home_or_away: Whether the team is the home or away team for their game.
corsiPercentage: The team’s CF% for the game.
fenwickPercentage: The team’s FF% for the game.
final: Whether the team won or lost their game, judged by 5v5 goals (a 5v5 tie is labelled a loss).
shotAttemptsDiff: The difference in shot attempts for and against the team.
shotsOnGoalDiff | corsiPercentage | fenwickPercentage | shotsOnGoalFor | shotAttemptsFor | goalsFor | goalsAgainst |
---|---|---|---|---|---|---|
17 | 0.6429 | 0.6364 | 28 | 45 | 1 | 1 |
13 | 0.6145 | 0.6167 | 26 | 51 | 1 | 1 |
-4 | 0.4706 | 0.4412 | 22 | 40 | 4 | 2 |
-2 | 0.4615 | 0.4286 | 18 | 30 | 3 | 2 |
-2 | 0.4583 | 0.4510 | 17 | 33 | 1 | 1 |
3 | 0.6226 | 0.6471 | 12 | 33 | 1 | 0 |
6 | 0.6308 | 0.6327 | 18 | 41 | 0 | 0 |
-11 | 0.3883 | 0.3793 | 27 | 40 | 4 | 3 |
6 | 0.5952 | 0.5625 | 24 | 50 | 0 | 1 |
2 | 0.5441 | 0.5294 | 19 | 37 | 2 | 1 |
In the NHL, the standard analysis of team performance is comparing their goals for (GF) and their goals against (GA). More goals scored in a season than allowed generally means a better team compared to a team with fewer scored goals and more goals allowed.
This comparison can also help viewers decide the “kind of team” they are watching. Teams that score few goals and allow few are considered “boring”, as not much happens in their games. “Bad” teams score few goals and allow many, “good” teams score many and allow few, and “fun” teams are high in both. While this has no meaningful use in determining game-to-game performance or the outcome of a single game, it can be a general indicator of a team’s performance over the season.
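The four team types can be sketched as a simple two-way split. This is only an illustration: the four teams and their totals are made up, and splitting "high" from "low" at the median of each stat is an assumption, not a rule taken from the dashboard.

```r
# Hypothetical season totals for four made-up teams (not real NHL data).
team_totals <- data.frame(
  team         = c("A", "B", "C", "D"),
  goalsFor     = c(300, 210, 295, 205),
  goalsAgainst = c(220, 215, 290, 285)
)

# Assumption: "high" and "low" are split at the median of each stat.
high_for     <- team_totals$goalsFor > median(team_totals$goalsFor)
high_against <- team_totals$goalsAgainst > median(team_totals$goalsAgainst)

# good: many scored, few allowed; fun: both high;
# bad: few scored, many allowed; boring: both low.
team_totals$kind <- ifelse(high_for & !high_against, "good",
                    ifelse(high_for & high_against, "fun",
                    ifelse(!high_for & high_against, "bad", "boring")))

print(team_totals)
```

Any other cut-off, such as the league mean, would work the same way; only the two logical vectors change.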
When determining the performance of individual players, however, advanced stats like Corsi (CF%) or Fenwick (FF%) are used instead of goals, assists, or saves. CF% is the share of all shot attempts in a game at equal strength (5v5) that were taken by the player’s team, and a player’s season CF% is the average over their games. FF% is calculated the same way but excludes blocked shots from the tally, the logic being that a blocked shot reflects a defensive advantage rather than the player’s ability to push through the opponent’s defense.
These stats can also be calculated at the team level for when a given player is on the ice and when they are off it. On-ice CF%/FF% is the team’s stat while the player is playing, and off-ice CF%/FF% is the team’s stat while they are not. Comparing a player’s on-ice and off-ice stats gives a glimpse into the effect they have on the ice. The average for a player is 0.50. An on-ice percentage higher than the off-ice percentage suggests the player makes a positive difference, while the opposite suggests the player is a detriment or that some other factor is at work.
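As a minimal sketch of the two calculations, the helper below computes the share of attempts for one side. The function name `attempt_share` and the single-game counts are hypothetical; the on-ice and off-ice values at the end are the FF% figures from the Adam Fox row of the skater sample table.

```r
# Share of shot attempts taken by one side, given counts for and against.
attempt_share <- function(made, allowed) made / (made + allowed)

# Hypothetical single-game 5v5 counts (illustrative only).
sog_for <- 28; missed_for <- 10; blocked_for <- 7
sog_against <- 15; missed_against <- 6; blocked_against <- 4

# Corsi counts every shot attempt: on goal + missed + blocked.
cf_pct <- attempt_share(sog_for + missed_for + blocked_for,
                        sog_against + missed_against + blocked_against)

# Fenwick drops blocked attempts from both sides.
ff_pct <- attempt_share(sog_for + missed_for,
                        sog_against + missed_against)

# On-ice minus off-ice FF%: a positive gap suggests the team does better
# with the player on the ice.
on_off_gap <- 0.54 - 0.48

print(c(cf_pct = cf_pct, ff_pct = ff_pct, on_off_gap = on_off_gap))
```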
While it is expected that the highest density of shots on goal lies directly in front of the net, where the defense has the least time to react and accuracy is easiest, the rest of the shots form a W shape, with wing shots more common than center shots. This shows a tendency to rely on the wings for shooting. The dense zone in front of the goal also extends to the edges and even partly behind the goal, displaying the full range a goalie must be able to defend.
This pattern could have a number of causes, but given that the data ranges from 2008-2023, it most likely reflects the positioning of the three forwards. Basic tactics include keeping a player on each wing for passing or scoring opportunities, and this shows in the number of SOGs from the wings. It also means the wings are vital to defend compared to the center, which is largely covered by the goalie and by players passing between the wings.
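One way to put numbers on "in front of the goal" and "wing versus center" is to compute each shot's distance from the net and a rough lane label. The sketch below uses the ten rows from the shot sample table. It assumes coordinates are in feet from center ice, that the goal lines sit at x = ±89, and that each shot targets the net on its own half of the rink; the 10 ft lane cut-off is a hypothetical choice.

```r
# The ten sample rows from the shot table.
shots <- data.frame(
  event = c("SHOT", "MISS", "SHOT", "GOAL", "SHOT",
            "MISS", "SHOT", "MISS", "SHOT", "SHOT"),
  xCord = c(44, 44, -33, -74, -81, -68, 72, 51, -40, 49),
  yCord = c(8, 27, 8, -5, 15, 9, 2, 12, 22, 12)
)

# Assumption: goal lines at x = +/-89 ft; fold both ends onto one net.
goal_x <- 89
shots$distance <- sqrt((goal_x - abs(shots$xCord))^2 + shots$yCord^2)

# Hypothetical lane split: |y| under 10 ft is "center", otherwise "wing".
shots$lane <- ifelse(abs(shots$yCord) < 10, "center", "wing")

print(shots)
```

On the full dataset, tabulating `lane` or plotting a histogram of `distance` would give a numeric companion to the density map.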
CF% is the share of all shot attempts in a game at equal strength (5v5) that were taken by the player’s team, and a player’s season CF% is the average over their games. FF% is calculated the same way but excludes blocked shots from the tally, the logic being that a blocked shot reflects a defensive advantage rather than the player’s ability to push through the opponent’s defense.
These stats can also be calculated at the team level for when a given player is on the ice and when they are off it. On-ice CF%/FF% is the team’s stat while the player is playing, and off-ice CF%/FF% is the team’s stat while they are not. Comparing a player’s on-ice and off-ice stats gives a glimpse into the effect they have on the ice. The average for a player is 0.50. An on-ice percentage higher than the off-ice percentage suggests the player makes a positive difference, while the opposite suggests the player is a detriment or that some other factor is at work.
Both stats are known as Advanced stats, used to compare the relative strength of a team’s offense and defense together.
When the player-level CF% and FF% are compared, the two are understandably highly correlated, since they are the same calculation with Fenwick excluding only one type of shot.
When a team’s CF% and FF% are compared with the final outcome of a game, however, they show a small but important difference. CF% is slightly inversely related to winning: a team’s average CF% is higher in its losses than in its wins. For FF%, the average is the same whether the game is a win or a loss, and the 1st and 2nd quartiles are higher for wins than for losses.
For the purpose of judging whether a team can win a game against another, FF% is therefore a slightly better stat than CF%, though the size of the difference between the teams is not a good indicator of a win. This could have a number of causes, such as a difference in the accuracy and precision of each team’s forwards, a difference in defensive play, or simple puck luck.
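The boxplot comparison can also be expressed as a small table of medians by outcome. The sketch below uses only the ten team-game rows from the sample table, so it shows the mechanics and is far too small to confirm or contradict the finding. The win/loss label follows the rule in the dashboard's `combos` chunk, where a 5v5 tie counts as a loss; base R is used here so the example runs without extra packages.

```r
# The ten team-game rows from the sample table (5v5 only).
games <- data.frame(
  corsiPercentage   = c(0.6429, 0.6145, 0.4706, 0.4615, 0.4583,
                        0.6226, 0.6308, 0.3883, 0.5952, 0.5441),
  fenwickPercentage = c(0.6364, 0.6167, 0.4412, 0.4286, 0.4510,
                        0.6471, 0.6327, 0.3793, 0.5625, 0.5294),
  goalsFor          = c(1, 1, 4, 3, 1, 1, 0, 4, 0, 2),
  goalsAgainst      = c(1, 1, 2, 2, 1, 0, 0, 3, 1, 1)
)

# Same labelling rule as the dashboard source: a 5v5 tie counts as a loss.
games$final <- ifelse(games$goalsFor > games$goalsAgainst, "win", "loss")

# Median CF% and FF% for wins and for losses.
medians <- aggregate(cbind(corsiPercentage, fenwickPercentage) ~ final,
                     data = games, FUN = median)
print(medians)
```

Swapping `median` for `quantile` would return the quartiles the boxplots display.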
Offensive pressure is not a single statistic but a value determined by the overall performance of a shift in a game. Unlike goals scored or shots attempted, the offensive pressure of a shift can be good even with no goals or shots to show for it: it can keep the puck away from the opponent, add stress that leads to opponent mistakes, and more.
So what statistic is used for offensive pressure? In hockey, points per 60 minutes of play (P/60) is considered the best representation of the pressure a player applies while on the ice, because more activity, more possession, and more opportunities mean more possible points.
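For reference, a per-60 rate is a count divided by ice time in hours. The sketch below assumes the `icetime` column is recorded in seconds; the helper name `per_60` is hypothetical, the ice time comes from the A.J. Greer row of the skater sample table, and the point total of 11 is an illustrative value that is not shown in that table.

```r
# Rate per 60 minutes of ice time; icetime is assumed to be in seconds.
per_60 <- function(count, icetime_seconds) count / (icetime_seconds / 3600)

# 32588 s of 5on5 ice time (from the sample table) and a hypothetical
# total of 11 points.
p60 <- per_60(11, 32588)
print(round(p60, 2))
```

With these inputs the result rounds to 1.22, the P/60 listed in that row.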
Most of these statistics were expected to rise with P/60, since more opportunities for a player to record the stat also mean more opportunities to earn points. However, the difference in shot attempts between the home and away teams was actually inversely related to the game’s outcome by a slight margin. Hits also showed no positive correlation with a player’s points; if anything, fewer hits meant a slight increase in points.
These results make sense: opportunity time for one stat is opportunity time for another, and the more hits a player makes, the more penalties they risk, which means less play time and no pressure added on the ice. The shot attempts differential, however, goes against the basic idea that shooting more means more chances for goals and thus more points. The slight inverse relationship may reflect the aftermath of most shot attempts: the opponent often gains possession of the puck, and offensive pressure turns against the shooting team. The difference in shot attempts during a game therefore cannot be used to predict the outcome of that game.
Fenwick does correlate with P/60, as a higher percentage means more shot attempts in general, but the variation is severe and the correlation has no major effect on P/60. This tracks with Fenwick’s record on winning odds and predictive ability. A shot attempt of any type means giving up the puck on the shot, and the mix of goals and misses limits how many points a player can accrue through this statistic.
Corsi is in essentially the same boat as Fenwick in correlating with P/60, with no major differences. In fact, it has the same issue as Fenwick does in correlating with P/60 and, as such, offensive pressure.
SOG has a good correlation with P/60, as shots on goal are shots more likely to go in the net given that only the defense is in the way of it being a goal, and thus SOG tend to be goals more often.
Shot attempts have a correlation with P/60 similar to that of SOG, and for the same reasons, much as Corsi mirrors Fenwick.
Takeaways have a positive but gradually weakening correlation with P/60, with a large number of takeaways yielding diminishing returns. In terms of offensive pressure, no stat could be reasoned to be more offensive than taking possession of the puck away from the opposing team. More takeaways mean more pressure, and the graph reflects that.
Hits have a surprising result, given the cultural importance of hitting in the sport. Fewer hits correlate with more P/60, with more hits leading to a slightly lower P/60. In game terms, more hits lead to more penalties, which cut into scoring opportunities and take the player off the ice, where they cannot add offensive pressure.
Faceoffs won have a slight correlation with P/60, but cannot compare to the other stats given the large cluster of data at the left of the graph, where the defensive positions sit. So although won faceoffs mean more opportunities for puck possession, the relation is too small to have a real effect on the game or on the season as a whole.
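The graphical comparisons above could be backed by correlation coefficients. The sketch below computes a Spearman rank correlation between each count stat and P/60 using only the ten rows from the skater sample table, so the numbers it prints say nothing reliable about the league. The short column names are stand-ins for `I_F_shotsOnGoal`, `I_F_shotAttempts`, `I_F_faceOffsWon`, `I_F_hits`, and `I_F_takeaways`, and choosing Spearman over Pearson is an assumption made because the relationships described are not all linear.

```r
# The ten sample rows from the skater table (5on5, 2022 season).
skaters <- data.frame(
  sog       = c(65, 132, 15, 12, 44, 48, 101, 0, 94, 133),
  attempts  = c(106, 235, 25, 25, 87, 107, 207, 1, 150, 282),
  faceoffs  = c(2, 0, 42, 0, 0, 10, 0, 0, 131, 0),
  hits      = c(101, 53, 19, 3, 21, 155, 19, 1, 26, 196),
  takeaways = c(16, 24, 4, 2, 10, 11, 65, 0, 14, 29),
  p60       = c(1.22, 0.78, 1.53, 0.00, 1.15, 1.20, 1.42, 0.00, 1.94, 1.05)
)

# Rank correlation of each stat with P/60.
predictors <- c("sog", "attempts", "faceoffs", "hits", "takeaways")
rho <- sapply(predictors, function(col) {
  cor(skaters[[col]], skaters$p60, method = "spearman")
})

print(round(sort(rho, decreasing = TRUE), 2))
```

Because raw counts grow with ice time, converting them to per-60 rates before correlating would be a reasonable refinement.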
Understanding the common locations of puck shots allows teams to strategize their defense around defending against those shots, especially for the goalie watching for where and when the puck may be shot.
For the offense of a team, the location density can be compared with their own data, leading to different strategies of where to force a puck down the line, and how to set up a shot against their opponents.
Fenwick is a better stat than Corsi for predicting the success of both players and teams over a season. The average rate of wins correlates more with Fenwick than with Corsi.
The difference in shot attempts between the home and away teams was actually inversely related to the game’s outcome by a slight margin. Hits also showed no positive correlation with a player’s points; if anything, fewer hits meant a slight increase in points.
The slight inverse relationship of the shot attempts differential may reflect the aftermath of most shot attempts: the opponent often gains possession of the puck, and offensive pressure turns against the shooting team. The difference in shot attempts during a game therefore cannot be used to predict the outcome of that game.
Other stats, like Takeaways, SOG, Fenwick/Corsi, and Shot Attempts had a positive correlation with a player’s P/60, and therefore offensive pressure.
The data collected was across the 2022-2023 season for individual and team stats. The available graphical analysis may not have exposed additional correlations between stats, or other advanced stats may have an unexpected effect on the outcome of a game, or season.
Assumptions like the situation always being 5v5 were important in understanding the effect of certain stats in equal play, but uneven situations may lead to some stats having more or less of an impact on a game.
In the future, larger studies covering all situations and all available seasons could lead to a more comprehensive understanding of the questions asked.
---
title: "What Makes an NHL Team?"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "#4292c6"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
<style>
.chart-title { /* chart_title */
font-size: 22px;
}
body{ /* Normal */
font-size: 18px;
}
</style>
<head>
<base target="_blank">
</head>
```{r setup, include=FALSE}
library(tidyverse)
library(pacman)
library(plotly)
library(dplyr)
library(knitr)
events <- read_csv("shots_2007-2022.csv")
sktr22 <- read_csv("skaters22-23.csv")
teams <- read_csv("all_teams2008-2023.csv")
teams22 <- read_csv("teams22.csv")
#https://rstudio-pubs-static.s3.amazonaws.com/257443_6639015f2f144de7af35ce4615902dfd.html
#and wikipedia
arenas <- read_csv("arenas.csv")
nhl22_1 <- read_csv("sktr22pgp1.csv")
nhl22_2 <- read_csv("sktr22pgp2.csv")
nhl22_3 <- read_csv("sktr22pgp3.csv")
nhl22_4 <- read_csv("sktr22pgp4.csv")
nhl22_5 <- read_csv("sktr22pgp5.csv")
nhl22_6 <- read_csv("sktr22pgp6.csv")
nhl22_7 <- read_csv("sktr22pgp7.csv")
nhl22_8 <- read_csv("sktr22pgp8.csv")
nhl22_9 <- read_csv("sktr22pgp9.csv")
nhl22_10 <- read_csv("sktr22pgp10.csv")
nhl22 <- rbind(nhl22_1, nhl22_2, nhl22_3, nhl22_4, nhl22_5, nhl22_6, nhl22_7, nhl22_8, nhl22_9, nhl22_10)
```
```{r combos}
sktr22 <- subset(sktr22, sktr22$situation == "5on5")
nhl22 <- rename(nhl22, name = Player)
nhl22gp <- select(nhl22, c("name", `G/60`:`P/60`))
sktr22_gp <- merge(sktr22, nhl22gp, by = "name")
teams22 <- subset(teams22, teams22$situation == "5on5")
# Add shot differentials in one mutate() call on the piped data.
teams22 <- teams22 %>%
  mutate(shotAttemptsDiff = shotAttemptsFor - shotAttemptsAgainst,
         shotsOnGoalDiff = shotsOnGoalFor - shotsOnGoalAgainst)
tmgm <- subset(teams, teams$situation == "5on5")
# Label each game from its 5v5 goal counts; a 5v5 tie is labelled a loss.
tmgm <- tmgm %>%
  mutate(
    final = case_when(
      goalsFor > goalsAgainst ~ "win",
      goalsFor <= goalsAgainst ~ "loss"
    ),
    shotAttemptsDiff = shotAttemptsFor - shotAttemptsAgainst,
    shotsOnGoalDiff = shotsOnGoalFor - shotsOnGoalAgainst
  )
tmgm_hm <- subset(tmgm, tmgm$home_or_away == "HOME")
```
NHL Data
===
Column {data-width=500}
------------------------
### Abstract
#### Visualizing NHL Game Data to Know What Makes a Good Hockey Team
Like every major sport, ice hockey has a long history of statistical analysis of its rules, players, teams, and rinks. The goal is to collect data that tells an interesting story about the game, whether through fun statistics that link certain events or through predictions of a team's odds of winning games and competitions. Because so many factors are in play, ice hockey is known as a sport that demands large amounts of both skill and luck. The puck is central to the luck factor: "where the puck lands" is a common saying about the unpredictable results of puck placement.
This study analyzes several aspects of the sport's skill and chance to answer questions often asked in the ice hockey community, in betting circles, and within the organizations that make up the National Hockey League (NHL) of the US and Canada.
1. Where are most shots shot from in the rink?
Knowing where shot attempts and successful shots on goal come from helps coaches and players identify the high-danger areas the puck must be kept out of on their own side of the rink, and the areas forwards should build their strategy around on the opposing side. It also reflects the state of the game as a whole: any past or future change in the density of shots in a given area could signal a change in tactics and the meta-game.
2. Is Corsi or Fenwick more reliable as a predictive Advanced Stat?
Corsi and Fenwick are two very similar advanced stats used by hockey analysts to compare individual and team skill. The logic behind their importance is simple: more shots mean more chances to score, and thus more opportunities to win. The difference is that Fenwick excludes blocked shots. Is this exclusion large enough to make one stat more reliable than the other in predicting goals and wins?
3. What statistics indicate more Offensive Pressure, and thus higher win odds?
Offensive pressure describes how consistently a team's forwards can push and pressure the opponent's defense until it breaks, leading to shots and goals. A 2014 study by Joshua Weissbock on forecasting success from NHL stats concluded that, across a season, a team's overall forward activity predicts success better (0.605) than its defensive game (0.395). Understanding which basic and advanced forward stats relate to offensive pressure will show what actions lead to more of it, and to a higher probability of success throughout the season.
4. Based on current stats, which team is most likely to win the 2023 season?
Using the findings of the previous explorations, a basic prediction can be made of which team is likely to win the most games and perhaps the Stanley Cup. This applies the conclusions of the earlier analyses and tests their validity: once the current season ends in the spring, its results can be compared with the predictions.
Column {.tabset .tabset-fade data-width=500}
------------------------
### Shot Data
This dataset has each shot event since 2009, detailing the characteristics of the shot itself and those involved.
1. xCord: The coordinate of the shot across the x-axis of the rink.
2. yCord: The coordinate of the shot across the y-axis of the rink.
3. shotWasOnGoal: A binary flag where 1 means the shot was a shot on goal (SOG): a shot that would have gone into the net had the goaltender not stopped it, goals included.
```{r events-k}
kable(events[1:10,c(2:9, 15, 24, 25, 122)])
```
### Skater Level Data
This dataset has each skater's (goalies excluded) stats by season and by situation, based on the NHL's public API and organized by MoneyPuck.com. The data used here covers the 2022 season and combines MoneyPuck.com's public dataset with the NHL's 2022-2023 skater points per game (PPG), goals per game (GPG), points per 60 minutes (P/60), and goals per 60 minutes (G/60) of the skater's time on the ice.
1. onIce_fenwickPercentage: The FF% of a player while on the ice.
2. offIce_fenwickPercentage: The FF% of a player while off the ice.
3. onIce_corsiPercentage: The CF% of a player while on the ice.
4. offIce_corsiPercentage: The CF% of a player while off the ice.
5. I_F_shotsOnGoal: The individual credit to a player for a shot on goal (SOG).
6. I_F_shotAttempts: The individual credit to a player for a shot attempt.
7. I_F_takeaways: The number of takeaways a player has had.
8. I_F_hits: The number of hits a player has made, including those that were penalized.
9. I_F_faceOffsWon: The number of faceoffs won by the player on either side of the rink.
10. P/60: the average points made by a player every 60 minutes of on-ice play time.
11. G/60: the average goals scored by a player every 60 minutes of on-ice play time.
```{r sktr-k}
kable(sktr22_gp[1:10, c(1:10, 13:16, 30, 33, 46:48, 155:159)])
```
### Team Level Game Data
This dataset covers each team's games from the 2008 season through the 2022-23 season. It is another MoneyPuck.com dataset and includes only 5v5 play.
1. home_or_away: Whether the team is the home or away team for their game.
2. corsiPercentage: The team's CF% for the game.
3. fenwickPercentage: The team's FF% for the game.
4. final: Whether the team won or lost their game, judged by 5v5 goals (a 5v5 tie is labelled a loss).
5. shotAttemptsDiff: The difference in shot attempts for and against the team.
```{r teams-k}
kable(tmgm[1:10,c(114, 12:13, 25, 28:29, 77)])
```
Team Statistics
===
Column {data-width=400}
-----------------------------
### Standard NHL Comparisons
In the NHL, the standard analysis of team performance is comparing their goals for (GF) and their goals against (GA). More goals scored in a season than allowed generally means a better team compared to a team with fewer scored goals and more goals allowed.
##### Team Performance
This comparison can also help viewers decide the "kind of team" they are watching. Teams that score few goals and allow few are considered "boring", as not much happens in their games. "Bad" teams score few goals and allow many, "good" teams score many and allow few, and "fun" teams are high in both. While this has no meaningful use in determining game-to-game performance or the outcome of a single game, it can be a general indicator of a team's performance over the season.
##### Player Ice Contribution
When determining the performance of individual players, however, advanced stats like Corsi (CF%) or Fenwick (FF%) are used instead of goals, assists, or saves. CF% is the share of all shot attempts in a game at equal strength (5v5) that were taken by the player's team, and a player's season CF% is the average over their games. FF% is calculated the same way but excludes blocked shots from the tally, the logic being that a blocked shot reflects a defensive advantage rather than the player's ability to push through the opponent's defense.
These stats can also be calculated at the team level for when a given player is on the ice and when they are off it. On-ice CF%/FF% is the team's stat while the player is playing, and off-ice CF%/FF% is the team's stat while they are not. Comparing a player's on-ice and off-ice stats gives a glimpse into the effect they have on the ice. The average for a player is 0.50. An on-ice percentage higher than the off-ice percentage suggests the player makes a positive difference, while the opposite suggests the player is a detriment or that some other factor is at work.
Column {.tabset .tabset-fade data-width=600}
------------------------------
### Team Rink Locations
##### Map of America and Canada
```{r team-map}
library(usmap)
#US states
USmap <- map_data("state")
state_data <- USmap %>%
filter(region != "district of columbia") %>%
group_by(region) %>%
summarise(long = mean(long), lat = mean(lat)) %>%
arrange(region)
state_data$region.abb <- state.abb[-c(2, 11)] # drop Alaska & Hawaii
state <- ggplot(USmap, aes(x = long, y = lat)) +
geom_polygon(aes(group = group, fill = region), color = "black") +
geom_text(aes(label = region.abb),
data = state_data, fontface = "bold") +
theme_void() +
theme(legend.position = "none", panel.background = element_rect(fill = "#08519c"))
#--------------------------------------------------------------
USCN <- c("Canada")
USCNmap <- map_data("world", USCN)
region.data <- USCNmap %>%
group_by(region) %>%
summarise(long = mean(long), lat = mean(lat))
ctry <- ggplot(USCNmap, aes(x = long, y = lat)) +
geom_polygon(aes(group = group), fill = "#6baed6", color = "white") +
geom_polygon(aes(group = group, x = long, y = lat, text = paste0(region)), data = USmap, color = "white", fill = "#bdd7e7") +
geom_point(data = arenas, aes(x = long, y = lat, text = paste0(arenas$name, "\n", arenas$arena, "\n", "'22 Season: ", arenas$`2022`)), color = "red") +
theme_void() +
theme(legend.position = "none", panel.background = element_rect(fill = "#08519c")) +
coord_cartesian(xlim = c(-132, -61), ylim = c(22, 55))
ggplotly(ctry, tooltip = "text")
```
### Team Performance: Goals For/Against
```{r team-comp}
t2 <- ggplot(teams22, aes(x = goalsFor, y = goalsAgainst)) +
geom_point(aes(text = paste0(name, "\n", "'22 Season Pts: ", teams22$pt_total), color = teams22$pt_total)) +
scale_color_continuous(low = "red", high = "green") +
xlab("Goals For") +
ylab("Goals Against") +
labs(color = "2022-23 Season Points")
ggplotly(t2, tooltip = "text")
```
### Player Ice Contribution: On/Off-Ice FF%
```{r sktr-infl}
t <- ggplot(sktr22_gp, aes(x = onIce_fenwickPercentage, y = offIce_fenwickPercentage, color = team)) +
geom_point(aes(text = paste0("Position: ", position, "\n", name, "\n", team))) +
xlab("On-Ice FF%") +
ylab("Off-Ice FF%")
ggplotly(t, tooltip = "text")
```
Shot Position
===
Column {data-width=550}
------------------------
### Shot Density in the Rink
```{r shotpos}
evt_sog <- subset(events, shotWasOnGoal == 1)
ggplot(evt_sog, aes(x = xCord, y = yCord)) +
geom_point(size = 0.25, alpha = 0.01) +
geom_density_2d_filled(alpha = 0.75, show.legend = FALSE) +
scale_fill_brewer(palette = "Spectral", direction = -1) +
scale_y_continuous(limits = c(-60, 60)) +
theme_void()
```
Column {data-width=450}
------------------------
### Analysis
While it is expected that the highest density of shots on goal sits directly in front of the net, where the defense has the least time to react and the shooter has the easiest angle, the remaining shots form a W shape in which wing shots are more common than center shots. This suggests a tendency to rely on the wings for shooting. The dense zone in front of the net also extends to the edges and even partly behind the goal, showing the full range a goalie must be able to defend.
This pattern could have a number of causes, but given that the data spans 2008-2023, it most likely reflects the positioning of the three forwards. Basic tactics include keeping a player on each wing for passing or scoring opportunities, and this shows up in the number of SOG from the wings. It also means the wings are a vital area to defend compared to the center, which is largely covered by the goalie and by the players passing between the wings.
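One way to put a number on the wing-versus-center pattern is to bin the shots by their lateral coordinate and compare shares. The snippet below is only a sketch on made-up coordinates: `yCord` matches the column plotted above, but the 10-unit lane boundaries and the lane labels are assumptions chosen for illustration, not values taken from the data.

```r
# Toy shot coordinates; `yCord` follows the `events` data, values are made up.
shots <- data.frame(yCord = c(-30, -22, -18, -4, 0, 3, 15, 20, 25, 33))

# Hypothetical lane boundaries, measured from the lengthwise midline of the rink.
shots$lane <- cut(shots$yCord,
                  breaks = c(-Inf, -10, 10, Inf),
                  labels = c("low-y wing", "center", "high-y wing"))

# Share of shots on goal taken from each lane.
lane_share <- prop.table(table(shots$lane))
round(lane_share, 2)
```

Applied to `evt_sog`, the same binning would show whether the two wing lanes together really outweigh the center lane once the area right in front of the net is set aside.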
Corsi or Fenwick?
===
Column {.tabset .tabset-fade data-width=650}
------------------------------
### Corsi-Fenwick Correlation
```{r corsi-fenwick}
ggplot(sktr22_gp, aes(x = onIce_corsiPercentage, y = onIce_fenwickPercentage)) +
geom_point(alpha = 0.1) +
geom_smooth(color = "green", se = FALSE) +
xlab("On-Ice Corsi %") +
ylab("On-Ice Fenwick %")
```
### Home Team Corsi Predictions
```{r corsi}
ggplot(tmgm) +
geom_boxplot(aes(y = corsiPercentage, x = final, fill = final), outlier.alpha = 0, show.legend = FALSE) +
ylab("Team Corsi % by Game") +
coord_cartesian(ylim = c(0.25, 0.75)) # zoom only; ylim() would drop games before the quartiles are computed
```
### Home Team Fenwick Predictions
```{r fenwick}
ggplot(tmgm) +
geom_boxplot(aes(y = fenwickPercentage, x = final, fill = final), outlier.alpha = 0, show.legend = FALSE) +
ylab("Team Fenwick % by Game") +
coord_cartesian(ylim = c(0.25, 0.75)) # zoom only; ylim() would drop games before the quartiles are computed
```
Column {.tabset .tabset-fade data-width=350}
-----------------------------
### Using Corsi and Fenwick
CF% is the share of shot attempts a team makes out of all shot attempts by both teams in a game at equal strength (5v5), and a player's season CF% is the average over their games. FF% is calculated the same way but excludes blocked shots from the tally, the logic being that a blocked shot reflects a defensive advantage rather than the shooter's ability to push through the opponent's defense.
These stats can also be calculated at the team level for when a given player is on or off the ice. On-ice CF%/FF% is the team's stat while the player is on the ice, and off-ice CF%/FF% is the team's stat while the player is off it. Comparing the two gives a glimpse of the effect an individual player has on the ice. The average for a player is 0.50. An on-ice percentage higher than the off-ice percentage suggests the player makes a positive difference, while the opposite suggests the player is a detriment or that some other factor is at work.
Both are known as advanced stats and are used to compare the relative strength of a team's offense and defense together.
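As a rough illustration of the two definitions, the sketch below computes CF% and FF% for a single game. The shot counts are invented for the example and do not come from the dataset.

```r
# Hypothetical 5v5 shot-attempt counts for one team in one game (made-up numbers).
shots_for     <- c(on_goal = 30, missed = 12, blocked = 14)
shots_against <- c(on_goal = 26, missed = 10, blocked = 8)

# Corsi counts every shot attempt: on goal, missed, and blocked.
corsi_for     <- sum(shots_for)
corsi_against <- sum(shots_against)
cf_pct <- corsi_for / (corsi_for + corsi_against)

# Fenwick drops blocked attempts and keeps only the unblocked ones.
fenwick_for     <- sum(shots_for[c("on_goal", "missed")])
fenwick_against <- sum(shots_against[c("on_goal", "missed")])
ff_pct <- fenwick_for / (fenwick_for + fenwick_against)

round(c(CF = cf_pct, FF = ff_pct), 3)
```

In this toy game the team has more of its attempts blocked than the opponent does, so its FF% comes out lower than its CF%.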
### Analysis
When the player-level CF% and FF% are compared, the two are understandably highly correlated, since they are the same calculation with Fenwick excluding only one type of shot.
When a team's CF%/FF% is compared to the final outcome of a game, however, the two show a small but important difference. CF% is slightly negatively associated with winning: the median team CF% is higher in losses than in wins. For FF%, the median is the same whether the game is a win or a loss, but the lower and upper quartiles are higher for wins than for losses.
This means that, for judging whether a team can win a game against another team, FF% is a slightly better stat than CF%, though the difference in value between the two teams is not a good indicator of a win on its own. This can be due to a number of reasons, such as a difference in the accuracy and precision of each team's forwards, a difference in either team's defensive game, or even simple puck luck.
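The boxplot comparison can also be read off as numbers. The sketch below computes the quartiles of each stat by game outcome on a small invented data frame. The column names mirror those used in the `tmgm` plots, while the values and the helper `quartiles_by_outcome()` are hypothetical.

```r
# Toy game-level data; column names follow the `tmgm` plots, values are made up.
games <- data.frame(
  final             = c("win", "win", "win", "win", "loss", "loss", "loss", "loss"),
  corsiPercentage   = c(0.48, 0.52, 0.50, 0.46, 0.51, 0.55, 0.49, 0.53),
  fenwickPercentage = c(0.50, 0.54, 0.51, 0.47, 0.49, 0.53, 0.48, 0.52)
)

# Lower quartile, median, and upper quartile of one stat for each game outcome.
quartiles_by_outcome <- function(stat) {
  t(sapply(split(games[[stat]], games$final),
           quantile, probs = c(0.25, 0.5, 0.75)))
}

quartiles_by_outcome("corsiPercentage")
quartiles_by_outcome("fenwickPercentage")
```

Each call returns one row per outcome, so the win and loss rows can be compared column by column in the same way the boxes are compared by eye.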
Factors of Offensive Pressure
===
Column {.tabset .tabset-fade data-width=650}
------------------------------
### Fenwick
```{r fenpress}
ggplot(sktr22_gp, aes(x = onIce_fenwickPercentage, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
xlab("On-Ice Fenwick %") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
### Corsi
```{r corsipress}
ggplot(sktr22_gp, aes(x = onIce_corsiPercentage, y = `P/60`)) +
geom_point(alpha = 0.25, color = "darkred") +
xlab("On-Ice Corsi %") +
ylab("Points per Play Hour") +
ylim(0, 4) +
xlim(0, 1)
```
### Shots on Goal
```{r sog}
ggplot(sktr22_gp, aes(x = I_F_shotsOnGoal, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
geom_smooth(color = "darkred", se = FALSE) +
xlab("Individual's Shots On Goal") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
### Shot Attempts
```{r sa}
ggplot(sktr22_gp, aes(x = I_F_shotAttempts, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
geom_smooth(color = "darkred", se = FALSE) +
xlab("Individual's Shot Attempts") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
### SA Difference
```{r sadiff}
# Split home-team games by outcome to get each group's mean shot-attempt differential
tmgm_hm_w <- subset(tmgm_hm, final == "win")
tmgm_hm_l <- subset(tmgm_hm, final == "loss")
ggplot(tmgm_hm, aes(x = shotAttemptsDiff, fill = final)) +
geom_histogram(binwidth = 1) +
# A constant xintercept outside aes() draws one line instead of one per row
geom_vline(xintercept = mean(tmgm_hm_w$shotAttemptsDiff), color = "blue") +
geom_vline(xintercept = mean(tmgm_hm_l$shotAttemptsDiff), color = "red") +
xlab("Difference in Shot Attempts (by Home Team)") +
labs(fill = "Game Outcome")
```
### Takeaways
```{r takeaway}
ggplot(sktr22_gp, aes(x = I_F_takeaways, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
geom_smooth(color = "darkred", se = FALSE) +
xlab("Individual's Takeaways") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
### Hits
```{r hits}
ggplot(sktr22_gp, aes(x = I_F_hits, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
geom_smooth(color = "darkred", se = FALSE) +
xlab("Individual's Hits Made") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
### Face-offs
```{r faceoffs}
ggplot(sktr22_gp, aes(x = I_F_faceOffsWon, y = `P/60`)) +
geom_point(alpha = 0.25, color = "navy") +
xlab("Individual's Face Offs Won") +
ylab("Points per Play Hour") +
ylim(0, 4)
```
Column {.tabset .tabset-fade data-width=350}
------------------------------
### Why Offensive Pressure?
Offensive pressure is not a single statistic but a quality determined by the overall performance of a shift in a game. Unlike goals scored or shots attempted, the offensive pressure of a shift can be good even with no goals or shots to show for it. Offensive pressure can deny the opponent possession of the puck, add stress that may lead to more mistakes by the opponent, and more.
So what statistic is used for offensive pressure? In hockey, points per play hour (points per 60 minutes of ice time, or P/60) is considered to best represent the pressure a player applies while on the ice, because more activity, more possession, and more opportunities mean more possible points.
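For readers unfamiliar with rate stats, here is a minimal sketch of how a points-per-60 figure is typically derived. The skaters, point totals, and ice times are invented, and the column names `points` and `toi_min` are assumptions for this example, not columns from `sktr22_gp`.

```r
# Hypothetical skaters; names, points, and ice time are made up for illustration.
skaters <- data.frame(
  name    = c("Skater A", "Skater B", "Skater C"),
  points  = c(60, 25, 8),
  toi_min = c(1500, 1000, 600)  # total time on ice, in minutes
)

# P/60: points scaled to a full 60 minutes of ice time.
skaters$p60 <- skaters$points / skaters$toi_min * 60

# Rank skaters from highest to lowest scoring rate.
skaters[order(-skaters$p60), c("name", "p60")]
```

Scaling by ice time is what lets a depth player with limited minutes be compared against a top-line player on the same axis.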
### Analysis
Most of these statistics were expected to rise with P/60, since a player with more opportunities will both record more of the stat and earn more points. However, the difference in shot attempts between the home and away teams was actually slightly negatively associated with the game's outcome. Additionally, hits showed no positive correlation with the points a player made; if anything, fewer hits meant a slight increase in points.
These results make sense, both because opportunity time for one stat is opportunity time for another, and because more hits mean more possible penalties from those hits, and thus less play time and no pressure being added on the ice. The shot-attempt differential result, however, goes against the basic idea that shooting more means more chances for goals, and thus more points. The slight negative association may show the aftermath of most shot attempts: the opponents often gain possession of the puck, and offensive pressure turns toward the shooting team. The difference in shot attempts during a game therefore cannot be used to predict the outcome of that game.
##### On Fenwick
Fenwick does correlate with P/60, as a higher percentage means more shot attempts in general, but the variation is large and the relationship has no major effect on P/60. This tracks with Fenwick's record on winning odds and predictive ability. A shot attempt of any type means giving the puck away on the shot, and the variation between goals and misses limits the points a player can accrue through this statistic.
##### On Corsi
Corsi is in essentially the same boat as Fenwick, with no major differences: it has the same issue in correlating with P/60 and, as such, with offensive pressure.
##### On Shots On Goal
SOG has a good correlation with P/60. A shot on goal has only the defense left in the way of it being a goal, so SOG become goals more often than other attempts.
##### On Shot Attempts
Shot attempts have a correlation with P/60 similar to that of SOG, and for the same reasons. This mirrors the Corsi/Fenwick relationship.
##### On Takeaways
Takeaways have a positive but flattening relationship with P/60, with high takeaway counts showing diminishing returns in P/60. In terms of offensive pressure, no stat could be reasoned to be more offensive than taking possession of the puck away from the opposing team. More takeaways mean more pressure, and the graph reflects that.
##### On Hits
Hits have a surprising result, given the cultural importance of hitting in the sport. More hits correlate with a slightly lower P/60. Game-wise, more hits lead to more penalties, which cut the opportunities for points and take the player off the ice, where they can no longer add offensive pressure.
##### On Face Offs Won
Face offs won have a slight correlation with P/60, but cannot compare to other stats given the large cluster of data on the left of the graph, where the defensive positions sit. So although won face offs mean more opportunities for puck possession, the relationship is too small to have a real effect on a game or on the season as a whole.
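The visual comparisons above could be backed by a rank correlation for each stat. The sketch below uses Spearman's rho on a tiny invented data frame. The column names follow `sktr22_gp`, but every value is made up, so the output says nothing about the real season.

```r
# Toy skater data; column names follow `sktr22_gp`, values are made up.
toy <- data.frame(
  `P/60`          = c(0.5, 1.0, 1.4, 1.9, 2.3, 2.8),
  I_F_shotsOnGoal = c(40, 85, 120, 150, 210, 260),
  I_F_takeaways   = c(10, 22, 30, 35, 38, 40),
  I_F_hits        = c(180, 150, 90, 110, 60, 40),
  check.names = FALSE
)

stats <- setdiff(names(toy), "P/60")

# Spearman's rank correlation does not assume a straight-line relationship,
# which suits curves that flatten out, such as the takeaways plot.
rho <- sapply(stats, function(s) cor(toy[[s]], toy[["P/60"]], method = "spearman"))
round(sort(rho, decreasing = TRUE), 2)
```

Running the same loop over the real skater table would give one comparable number per panel, which makes it easier to rank the stats than reading the smoothed curves alone.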
Conclusion & Background
===
Column {data-width=550}
-------------------------
### Applications in Hockey
##### Shot Positions
Understanding the common shot locations lets teams build their defense around those areas, especially the goalie, who must watch for where and when the puck may be shot.
For a team's offense, the location density can be compared with the team's own data, leading to different strategies for where to force the puck down the line and how to set up a shot against opponents.
##### Using Fenwick
Fenwick is a slightly better stat than Corsi for predicting the success of both players and teams over a season. Wins correlate more with Fenwick than with Corsi.
##### The Important Stats in a Game
The difference in shot attempts between the home and away teams was slightly negatively associated with the game's outcome. Additionally, hits showed no positive correlation with the points a player made, and fewer hits meant a slight increase in points.
The negative association for the shot-attempt differential may show the aftermath of most shot attempts: the opponents often gain possession of the puck, and offensive pressure turns toward the shooting team. The difference in shot attempts during a game therefore cannot be used to predict the outcome of that game.
Other stats, like Takeaways, SOG, Fenwick/Corsi, and Shot Attempts had a positive correlation with a player's P/60, and therefore offensive pressure.
##### Limitations and Future Work
The data covers the 2022-2023 season for individual and team stats. The graphical analysis used here may have missed additional correlations between stats, and other advanced stats may have an unexpected effect on the outcome of a game or season.
Assuming 5v5 play throughout was important for understanding the effect of certain stats at equal strength, but uneven situations may give some stats more or less impact on a game.
In the future, larger studies across all situations and all available seasons could lead to a more comprehensive understanding of the questions asked.
Column {data-width=450}
-------------------------
### The Author
[Stephen Boerger's LinkedIn](https://www.linkedin.com/in/stephen-boerger-04104b294/)
Stephen Boerger created this data analysis project as the final for his analytics class, and it represents his skills as of Fall 2023. He chose hockey because he recently became a fan, and discovering which stats affect which parts of the game and the season has taught him more about the sport he is growing to love.
### Resource Citations
[MoneyPuck.com Hockey Data](https://moneypuck.com/data.htm)
[Wikipedia NHL Arenas](https://en.wikipedia.org/wiki/List_of_National_Hockey_League_arenas)
[NHL Open API Data](https://www.nhl.com/stats/skaters?report=scoringRates&reportType=season&seasonFrom=20222023&seasonTo=20222023&gameType=2&filter=gamesPlayed,gte,1&sort=pointsPer605v5,goalsPer605v5&page=9&pageSize=100)
[Map data for Canada](https://github.com/joellecayen/canadianmaps/blob/main/R/canadianmaps.R)
[2014 Study on Success in the NHL by Weissbock](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.841.8005&rep=rep1&type=pdf)