NHL Data

Abstract

Visualizing NHL Game Data to Know What Makes a Good Hockey Team

Like every sport, Ice Hockey has a long history of statistical analysis. The game’s rules, players, teams, and rinks all generate data that can tell an interesting story about the game, whether through fun statistics that link certain events or through predictions of a team’s odds of winning games and competitions. Because so many factors are in play, Ice Hockey is known as a sport that demands both a great deal of skill and a great deal of luck. The puck is central to that luck factor: “where the puck lands” is a common saying about the unpredictable results of puck placement.

In this study, multiple aspects of the sport’s skill and chance will be analyzed in order to answer questions often asked in the Ice Hockey Community, in betting circles, and within the inside organizations that make up the US and Canada’s National Hockey League (NHL).

  1. Where are most shots shot from in the rink?

Understanding the relative chance of successful shots on goal position and shot attempt positions helps coaches and players understand the high danger areas the puck must stay out of on their side of the rink, and where the forwards must plan strategy around on the opposing team’s side. Additionally, it will reflect the state of the game as a whole, as any changes from the past or in the future of the density of shots in any area could be a sign of a change in sport tactics and meta-game.

  2. Is Corsi or Fenwick more reliable as a predictive Advanced Stat?

Corsi and Fenwick are two very similar advanced stats used by hockey analysts to compare individual and team skill in the game. The logic behind their importance is simple: more shots means more chances to score a goal, and thus more opportunities to win. The difference is that the Fenwick stat excludes blocked shots from the dataset. Is this exclusion drastic enough to make a difference in reliability between Corsi and Fenwick in predicting goals and wins?

  3. What statistics indicate more Offensive Pressure, and thus higher win odds?

Offensive Pressure describes how consistently a team’s Forwards can push and pressure the opponent’s defense until it breaks, leading to shots and goals. A 2014 study by Joshua Weissbock on forecasting success from NHL hockey stats concluded that a team’s overall Forward activity is a stronger predictor of success across the season (0.605) than its defensive game (0.395). Understanding which basic and advanced stats of the forwards relate to offensive pressure will show which actions lead to better offensive pressure, and a higher probability of success throughout the season.

  4. Based on current stats, which team is most likely to win the 2023 season?

Using the conclusions of the previous explorations, a basic prediction can be made of which team could win the most games and perhaps the Stanley Cup. This serves as a test of those conclusions: when the current season comes to an end in the Spring, its results can be compared to the predictions.

Shot Data

This dataset has each shot event since 2009, detailing the characteristics of the shot itself and those involved.

  1. xCord: The coordinate of the shot across the x-axis of the rink.

  2. yCord: The coordinate of the shot across the y-axis of the rink.

  3. shotWasOnGoal: a binary where 1 means the shot was a Shot On Goal, or SOG, a shot that if not blocked or grabbed, would have hit the goal net.

homeTeamCode awayTeamCode season isPlayoffGame game_id homeTeamWon id time event xCord yCord shotWasOnGoal
NSH SJS 2022 0 20001 1 8 23 SHOT 44 8 1
NSH SJS 2022 0 20001 1 11 36 MISS 44 27 0
NSH SJS 2022 0 20001 1 15 59 SHOT -33 8 1
NSH SJS 2022 0 20001 1 16 61 GOAL -74 -5 1
NSH SJS 2022 0 20001 1 18 72 SHOT -81 15 1
NSH SJS 2022 0 20001 1 20 97 MISS -68 9 0
NSH SJS 2022 0 20001 1 26 162 SHOT 72 2 1
NSH SJS 2022 0 20001 1 30 209 MISS 51 12 0
NSH SJS 2022 0 20001 1 34 249 SHOT -40 22 1
NSH SJS 2022 0 20001 1 37 259 SHOT 49 12 1

Skater Level Data

This dataset has each skater’s (excluding goalies) stats by season, and by situation, based on the NHL’s public API and organized by MoneyPuck.com. The data used here was of the 2022 season and combined MoneyPuck.com’s public dataset and the NHL’s 2022-2023 season skaters’ points per game (PPG), goals per game (GPG) and points per hour (P/60) and goals per hour (G/60) by the skater’s time on the ice.

  1. onIce_fenwickPercentage: The FF% of a player while on the ice.

  2. offIce_fenwickPercentage: The FF% of a player while off the ice.

  3. onIce_corsiPercentage: The CF% of a player while on the ice.

  4. offIce_corsiPercentage: The CF% of a player while off the ice.

  5. I_F_shotsOnGoal: The individual credit to a player for a shot on goal (SOG).

  6. I_F_shotAttempts: The individual credit to a player for a shot attempt.

  7. I_F_takeaways: The amount of takeaways a player has had.

  8. I_F_hits: The amount of hits a player has had, including hits that drew a penalty.

  9. I_F_faceOffsWon: The amount of faceoffs won on either side of the rink by the player.

  10. P/60: the average points made by a player every 60 minutes of on-ice play time.

  11. G/60: the average goals scored by a player every 60 minutes of on-ice play time.

name playerId season team position situation games_played icetime shifts gameScore onIce_corsiPercentage offIce_corsiPercentage onIce_fenwickPercentage offIce_fenwickPercentage I_F_shotsOnGoal I_F_shotAttempts I_F_faceOffsWon I_F_hits I_F_takeaways G/60 A/60 A1/60 A2/60 P/60
A.J. Greer 8478421 2022 BOS L 5on5 61 32588 806 12.16 0.47 0.51 0.49 0.51 65 106 2 101 16 0.44 0.77 0.44 0.33 1.22
Aaron Ekblad 8477932 2022 FLA D 5on5 71 69744 1496 53.95 0.55 0.53 0.53 0.52 132 235 0 53 24 0.21 0.57 0.42 0.16 0.78
Aatu Raty 8482691 2022 VAN C 5on5 15 7066 177 3.37 0.50 0.51 0.52 0.53 15 25 42 19 4 1.02 0.51 0.51 0.00 1.53
Adam Beckman 8481550 2022 MIN L 5on5 9 5299 115 1.69 0.54 0.48 0.52 0.49 12 25 0 3 2 0.00 0.00 0.00 0.00 0.00
Adam Boqvist 8480871 2022 CBJ D 5on5 46 41194 899 19.90 0.48 0.44 0.48 0.43 44 87 0 21 10 0.44 0.71 0.35 0.35 1.15
Adam Erne 8477454 2022 DET L 5on5 61 42003 908 6.46 0.42 0.48 0.42 0.48 48 107 10 155 11 0.52 0.69 0.34 0.34 1.20
Adam Fox 8479323 2022 NYR D 5on5 82 88198 1722 78.53 0.54 0.47 0.54 0.48 101 207 0 19 65 0.34 1.09 0.50 0.59 1.42
Adam Ginning 8480874 2022 PHI D 5on5 1 934 18 0.40 0.63 0.52 0.63 0.54 0 1 0 1 0 0.00 0.00 0.00 0.00 0.00
Adam Henrique 8474641 2022 ANA C 5on5 62 47265 1069 32.10 0.46 0.42 0.45 0.41 94 150 131 26 14 1.16 0.78 0.31 0.47 1.94
Adam Larsson 8476457 2022 SEA D 5on5 82 97656 1890 55.48 0.55 0.51 0.56 0.52 133 282 0 196 29 0.22 0.82 0.45 0.37 1.05

Team Level Game Data

This dataset contains each team’s games played from the 2008 season through the 2022-23 season. It is another dataset from MoneyPuck.com, and only includes data from 5v5 play.

  1. home_or_away: Whether the team is the home or away team for their game.

  2. corsiPercentage: The team’s CF% for the game.

  3. fenwickPercentage: The team’s FF% for the game.

  4. final: Whether the team won or lost their game.

  5. shotAttemptsDiff: The difference in shot attempts for and against the team.

shotsOnGoalDiff corsiPercentage fenwickPercentage shotsOnGoalFor shotAttemptsFor goalsFor goalsAgainst
17 0.6429 0.6364 28 45 1 1
13 0.6145 0.6167 26 51 1 1
-4 0.4706 0.4412 22 40 4 2
-2 0.4615 0.4286 18 30 3 2
-2 0.4583 0.4510 17 33 1 1
3 0.6226 0.6471 12 33 1 0
6 0.6308 0.6327 18 41 0 0
-11 0.3883 0.3793 27 40 4 3
6 0.5952 0.5625 24 50 0 1
2 0.5441 0.5294 19 37 2 1

Team Statistics

Standard NHL Comparisons

In the NHL, the standard analysis of team performance is comparing their goals for (GF) and their goals against (GA). More goals scored in a season than allowed generally means a better team compared to a team with fewer scored goals and more goals allowed.

Team Performance

This comparison can also help viewers decide the “kind of team” they are watching, as in what a team is like to watch. Teams with few goals scored and few goals allowed are considered “boring”, as not much happens in their games. “Bad” teams score few goals and allow many. “Good” teams score many goals and allow few, and “fun” teams have both stats high. While it has no meaningful use in determining game-to-game performance or the outcome of a single game, it can be a general indicator of a team’s season performance.

Player Ice Contribution

When determining the performance of individual players, however, advanced stats like Corsi (CF%) or Fenwick (FF%) are used instead of goals, assists, or saves. CF% is a team’s shot attempts divided by the total shot attempts by both teams in a game at equal strength (5v5), and the season CF% of a player is the average of their games. FF% is calculated the same way as CF%, but excludes blocked shots from the tally, with the logic being that blocked shots are an indicator of defensive advantage rather than the ability of the player to push through the opponent’s defense.

Additionally, these stats can be calculated on a team level for when the player is on the ice or off the ice. On-ice CF%/FF% is the team’s stat while the player is on the ice, and off-ice CF%/FF% is the team’s stat while the player is off it. Comparing the on-ice and off-ice stats of an individual player can give a glimpse into the effect they have on the game. The average for a player is 0.50. A higher on-ice percentage than off-ice percentage means the player makes a positive difference on the ice, while the opposite suggests the player is a detriment, or that some other factor is at work.

Team Rink Locations

Map of America and Canada

Team Performance: Goals For/Against

Player Ice Contribution: On/Off-Ice FF%

Shot Position

Shot Density in the Rink

Analysis

As expected, the highest density of shots on goal is directly in front of the goal itself, where the defense has the shortest time to react and the shot is easiest to place accurately. The rest form a W shape, with wing shots more common than center shots, showing a tendency to rely on the wings for making shots. Additionally, the dense zone in front of the goal extends to the edges and even partly behind the goal, displaying the full range a goalie must be able to defend.

This pattern could be due to a number of things, but given that the data ranges from 2008-2023, it is most likely due to the positioning of the 3 forwards themselves. Basic tactics include keeping a player on each wing of the rink for passing or scoring opportunities, and this reveals itself in the amount of SOGs from the wings. It also means that the wings are a vital area to defend compared to the center, which can be covered mostly by the goalie and passing players between the wings.

Corsi or Fenwick?

Corsi-Fenwick Correlation

Home Team Corsi Predictions

Home Team Fenwick Predictions

Using Corsi and Fenwick

CF% is a team’s shot attempts divided by the total shot attempts by both teams in a game at equal strength (5v5), and the season CF% of a player is the average of their games. FF% is calculated the same way as CF%, but excludes blocked shots from the tally, with the logic being that blocked shots are an indicator of defensive advantage rather than the ability of the player to push through the opponent’s defense.

Additionally, these stats can be calculated on a team level for when the player is on the ice or off the ice. On-ice CF%/FF% is the team’s stat while the player is on the ice, and off-ice CF%/FF% is the team’s stat while the player is off it. Comparing the on-ice and off-ice stats of an individual player can give a glimpse into the effect they have on the game. The average for a player is 0.50. A higher on-ice percentage than off-ice percentage means the player makes a positive difference on the ice, while the opposite suggests the player is a detriment, or that some other factor is at work.

Both stats are known as Advanced stats, used to compare the relative strength of a team’s offense and defense together.

Analysis

When comparing the CF% and FF% player statistic together, both are understandably highly correlated with each other, given that they are the same calculation, with Fenwick only excluding a type of shot.

When the team’s CF%/FF% is compared to the final outcome of a game, however, CF% and FF% show a small but important difference. CF% is very slightly inversely related to the win odds, with the average CF% of a team being higher when they lose than when they win. With FF%, the average is even whether the game is a win or a loss, and the 1st and 2nd quartiles are higher for winning games than losing games.

This means, for the purpose of concluding if a team can win a game against another team, FF% is a slightly better stat to compare than CF%, though the distinct value difference between the teams is no good indicator of a win. This can be due to a number of reasons, like a difference in accuracy and precision of a team’s forwards, the difference in defensive game of either team, or even simple puck luck.
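As a rough illustration of the comparison described above, the sketch below groups a handful of games by outcome and takes the median CF% and FF% of each group. The ten rows are the sample rows of the team-level table shown earlier, not the full dataset, so the numbers it prints are not a result. The `games` and `by_outcome` names are illustrative, and the win/loss rule (more goals for than against in the 5v5 rows) is an assumption about how `final` is defined.

```r
# Sample rows from the team-level game table (5v5 only); illustrative, not the full data
games <- data.frame(
  corsiPercentage   = c(0.6429, 0.6145, 0.4706, 0.4615, 0.4583,
                        0.6226, 0.6308, 0.3883, 0.5952, 0.5441),
  fenwickPercentage = c(0.6364, 0.6167, 0.4412, 0.4286, 0.4510,
                        0.6471, 0.6327, 0.3793, 0.5625, 0.5294),
  goalsFor          = c(1, 1, 4, 3, 1, 1, 0, 4, 0, 2),
  goalsAgainst      = c(1, 1, 2, 2, 1, 0, 0, 3, 1, 1)
)

# Assumed outcome rule: a game counts as a win only with more goals for than against
games$final <- ifelse(games$goalsFor > games$goalsAgainst, "win", "loss")

# Median CF% and FF% for wins and for losses
by_outcome <- aggregate(
  cbind(corsiPercentage, fenwickPercentage) ~ final,
  data = games,
  FUN = median
)
print(by_outcome)
```

Running the same grouping over every 5v5 game, rather than ten sample rows, would be needed before reading anything into the medians.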

Factors of Offensive Pressure

Fenwick

Corsi

Shots on Goal

Shot Attempts

SA Difference

Takeaways

Hits

Face-offs

Why Offensive Pressure?

Offensive Pressure is no single statistic, but a value determined by the overall performance of a shift in a game. Unlike goals scored or shots attempted, the offensive pressure of a shift can be good despite a lack of goals or shots to show for it. Offensive pressure can still keep the puck out of the opponent’s possession, add stress that may lead to more mistakes by the opponent, and more.

So what statistic is used for offensive pressure? In hockey, the points per play hour, or P/60 statistic is considered to best represent the pressure a player puts in the rink when on the ice. This is because more activity, more possession, and more opportunities means more possible points made.

Analysis

Most of these statistics were expected to rise with P/60, since more opportunities for a player mean more chances both to record the stat and to earn points. However, the difference in shot attempts between the home and away teams was actually inversely proportional to the game’s outcome by a slight margin. Additionally, Hits had no positive correlation with the points a player made; if anything, fewer hits meant a slight increase in points.

These results make sense, both because opportunity time for one stat means opportunity time for another, and because the more hits a player makes, the more possible penalties from those hits, and thus less play time and no pressure being added on the ice. The shot attempts differential result, however, goes against the basic thought that shooting more means more chances for goals, and thus more points. The slight inverse proportionality may show the aftermath of most shot attempts: the opponents often gain possession of the puck, and offensive pressure turns towards the subject team. Thus the difference in shot attempts during a game cannot be used to predict the outcome of said game.
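One way to put numbers on the relationships discussed above is a rank correlation between each counting stat and P/60. The sketch below is an assumption about method rather than the procedure behind the plots, and it uses only the ten sample skaters from the skater table, so its output illustrates the calculation and nothing more. The short column names are illustrative.

```r
# Ten sample skaters from the skater-level table; illustrative, not the full data
skaters <- data.frame(
  hits      = c(101, 53, 19, 3, 21, 155, 19, 1, 26, 196),
  takeaways = c(16, 24, 4, 2, 10, 11, 65, 0, 14, 29),
  sog       = c(65, 132, 15, 12, 44, 48, 101, 0, 94, 133),
  p60       = c(1.22, 0.78, 1.53, 0.00, 1.15, 1.20, 1.42, 0.00, 1.94, 1.05)
)

# Spearman (rank) correlation of each counting stat with P/60
stats <- c("hits", "takeaways", "sog")
rho <- sapply(skaters[stats], function(x) cor(x, skaters$p60, method = "spearman"))
print(round(rho, 2))
```

Raw counts grow with ice time, so converting each stat to a per-60 rate before correlating may give a fairer comparison.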

On Fenwick

Fenwick certainly correlates with P/60, as a higher percentage means more shot attempts in general, but the amount of variation is severe, and the correlation does not have a major effect on P/60. This tracks given Fenwick’s record with winning odds and predictive abilities. Shot Attempts, no matter the type, do mean giving the puck away on a shot, and the variation between goals and misses limits the points a player can accrue through the statistic.

On Corsi

Corsi is in essentially the same boat as Fenwick in correlating with P/60, with no major differences. In fact, it has the same issue as Fenwick does in correlating with P/60 and, as such, offensive pressure.

On Shots On Goal

SOG has a good correlation with P/60, as shots on goal are shots more likely to go in the net given that only the defense is in the way of it being a goal, and thus SOG tend to be goals more often.

On Shot Attempts

Shot Attempts, like SOG, have a similar correlation with P/60. This, like the Corsi/Fenwick relation, is for the same reasons as SOG.

On Takeaways

Takeaways have a positive but slowly degrading correlation with P/60, with a significant amount of takeaways leading to diminishing returns in P/60. In terms of offensive pressure, no stat could be reasoned to be more offensive than taking possession of the puck away from the opposing team. More takeaways means more pressure, and the graph reflects that.

On Hits

Hits have a surprising result, given the importance hitting has in the sport culturally. Fewer hits correlate with more P/60, with more hits leading to a slightly lower P/60. Game-wise, more hits lead to more penalties, and that cuts opportunities for points to be made, and removes the player from adding offensive pressure from the ice.

On Face Offs Won

Face offs won have a slight correlation with P/60, but cannot compare to other stats given the huge range of data on the left of the graph, where the defensive positions fall. So despite the slight correlation with P/60 and offensive pressure, as won face offs mean more opportunities for puck possession, the relation is too small to have a real effect on the game, or on the season as a whole.

Conclusion & Background

Applications in Hockey

Shot Positions

Understanding the common locations of puck shots allows teams to strategize their defense around defending against those shots, especially for the goalie watching for where and when the puck may be shot.

For the offense of a team, the location density can be compared with their own data, leading to different strategies of where to force a puck down the line, and how to set up a shot against their opponents.

Using Fenwick

Fenwick is a better stat than Corsi for predicting the success of both players and teams over a season. The average rate of wins correlates more with Fenwick than with Corsi.

The Important Stats in a Game

The difference in shot attempts between the home and away teams was actually inversely proportional to the game’s outcome by a slight margin. Additionally, Hits had no positive correlation with the points a player made; if anything, fewer hits meant a slight increase in points.

The shot attempts differential however, has a slight inverse proportionality that may show the aftermath of most shot attempts: that the opponents often gain possession of the puck and offensive pressure turns towards the subject team. Thus using the difference in shot attempts during a game cannot be used to predict the outcome of said game.

Other stats, like Takeaways, SOG, Fenwick/Corsi, and Shot Attempts had a positive correlation with a player’s P/60, and therefore offensive pressure.

Limitations and Future Work

The data collected was across the 2022-2023 season for individual and team stats. The available graphical analysis may not have exposed additional correlations between stats, or other advanced stats may have an unexpected effect on the outcome of a game, or season.

Assumptions like the situation always being 5v5 were important in understanding the effect of certain stats in equal play, but uneven situations may lead to some stats having more or less of an impact on a game.

In the future, larger studies on all situations and available seasonal data could lead to a more comprehensive understanding of the questions asked.

The Author

Stephen Boerger’s LinkedIn

Stephen Boerger created this Data Analysis Project for the Analytics Class Final, and as such presents his skills as of the Fall of 2023. Hockey was chosen because of his recent entry into being a fan of the sport, and discovering what stats affect what parts of the game and the season has helped teach him more about the sport he is growing to love.

---
title: "What Makes an NHL Team?"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "#4292c6"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

<style>
.chart-title { /* chart_title */
  font-size: 22px;
  }
body{ /* Normal */ 
    font-size: 18px;
    }
</style>

<head>
    <base target="_blank">
</head>



```{r setup, include=FALSE}

library(tidyverse)
library(pacman)
library(plotly)
library(dplyr)
library(knitr)

events <- read_csv("shots_2007-2022.csv")

sktr22 <- read_csv("skaters22-23.csv")

teams <- read_csv("all_teams2008-2023.csv")
teams22 <- read_csv("teams22.csv")

#https://rstudio-pubs-static.s3.amazonaws.com/257443_6639015f2f144de7af35ce4615902dfd.html
#and wikipedia
arenas <- read_csv("arenas.csv")

nhl22_1 <- read_csv("sktr22pgp1.csv")
nhl22_2 <- read_csv("sktr22pgp2.csv")
nhl22_3 <- read_csv("sktr22pgp3.csv")
nhl22_4 <- read_csv("sktr22pgp4.csv")
nhl22_5 <- read_csv("sktr22pgp5.csv")
nhl22_6 <- read_csv("sktr22pgp6.csv")
nhl22_7 <- read_csv("sktr22pgp7.csv")
nhl22_8 <- read_csv("sktr22pgp8.csv")
nhl22_9 <- read_csv("sktr22pgp9.csv")
nhl22_10 <- read_csv("sktr22pgp10.csv")

nhl22 <- rbind(nhl22_1, nhl22_2, nhl22_3, nhl22_4, nhl22_5, nhl22_6, nhl22_7, nhl22_8, nhl22_9, nhl22_10)

```

```{r combos}
# Keep only even-strength (5-on-5) skater rows
sktr22 <- subset(sktr22, situation == "5on5")

# Align the NHL export's player column with MoneyPuck's `name` column
nhl22 <- rename(nhl22, name = Player)
nhl22gp <- select(nhl22, c("name", `G/60`:`P/60`))

# Join on player name; skaters who share a name would be matched to each other's rows
sktr22_gp <- merge(sktr22, nhl22gp, by = "name")

# Season-level team stats at 5-on-5, with shot differentials
teams22 <- subset(teams22, situation == "5on5")
teams22 <- mutate(teams22,
                  shotAttemptsDiff = shotAttemptsFor - shotAttemptsAgainst,
                  shotsOnGoalDiff = shotsOnGoalFor - shotsOnGoalAgainst)

# Game-level team stats at 5-on-5; `final` is derived from the goals in these rows,
# and a tie in goalsFor/goalsAgainst is labelled "loss"
tmgm <- subset(teams, situation == "5on5")
tmgm <- mutate(tmgm,
               final = case_when(
                 goalsFor > goalsAgainst ~ "win",
                 goalsFor <= goalsAgainst ~ "loss"
               ),
               shotAttemptsDiff = shotAttemptsFor - shotAttemptsAgainst,
               shotsOnGoalDiff = shotsOnGoalFor - shotsOnGoalAgainst)

tmgm_hm <- subset(tmgm, tmgm$home_or_away == "HOME")
```


NHL Data
===

Column {data-width=500}
------------------------
### Abstract

#### Visualizing NHL Game Data to Know What Makes a Good Hockey Team
Like every sport, Ice Hockey has a long history of statistical analysis. The game's rules, players, teams, and rinks all generate data that can tell an interesting story about the game, whether through fun statistics that link certain events or through predictions of a team's odds of winning games and competitions. Because so many factors are in play, Ice Hockey is known as a sport that demands both a great deal of skill and a great deal of luck. The puck is central to that luck factor: "where the puck lands" is a common saying about the unpredictable results of puck placement.

In this study, multiple aspects of the sport's skill and chance will be analyzed in order to answer questions often asked in the Ice Hockey Community, in betting circles, and within the inside organizations that make up the US and Canada's National Hockey League (NHL).

1. Where are most shots shot from in the rink?

Understanding the relative chance of successful shots on goal position and shot attempt positions helps coaches and players understand the high danger areas the puck must stay out of on their side of the rink, and where the forwards must plan strategy around on the opposing team's side. Additionally, it will reflect the state of the game as a whole, as any changes from the past or in the future of the density of shots in any area could be a sign of a change in sport tactics and meta-game.

2. Is Corsi or Fenwick more reliable as a predictive Advanced Stat?

Corsi and Fenwick are two very similar advanced stats used by hockey analysts to compare individual and team skill in the game. The logic behind their importance is simple: more shots means more chances to score a goal, and thus more opportunities to win. The difference is that the Fenwick stat excludes blocked shots from the dataset. Is this exclusion drastic enough to make a difference in reliability between Corsi and Fenwick in predicting goals and wins?

3. What statistics indicate more Offensive Pressure, and thus higher win odds?

Offensive Pressure describes how consistently a team's Forwards can push and pressure the opponent's defense until it breaks, leading to shots and goals. A 2014 study by Joshua Weissbock on forecasting success from NHL hockey stats concluded that a team's overall Forward activity is a stronger predictor of success across the season (0.605) than its defensive game (0.395). Understanding which basic and advanced stats of the forwards relate to offensive pressure will show which actions lead to better offensive pressure, and a higher probability of success throughout the season.

4. Based on current stats, which team is most likely to win the 2023 season?

Using the conclusions of the previous explorations, a basic prediction can be made of which team could win the most games and perhaps the Stanley Cup. This serves as a test of those conclusions: when the current season comes to an end in the Spring, its results can be compared to the predictions.



Column {.tabset .tabset-fade data-width=500}
------------------------

### Shot Data

This dataset has each shot event since 2009, detailing the characteristics of the shot itself and those involved.

1. xCord: The coordinate of the shot across the x-axis of the rink.

2. yCord: The coordinate of the shot across the y-axis of the rink.

3. shotWasOnGoal: a binary where 1 means the shot was a Shot On Goal, or SOG, a shot that if not blocked or grabbed, would have hit the goal net.

```{r events-k}
kable(events[1:10,c(2:9, 15, 24, 25, 122)])
```

### Skater Level Data

This dataset has each skater's (excluding goalies) stats by season, and by situation, based on the NHL's public API and organized by MoneyPuck.com. The data used here was of the 2022 season and combined MoneyPuck.com's public dataset and the NHL's 2022-2023 season skaters' points per game (PPG), goals per game (GPG) and points per hour (P/60) and goals per hour (G/60) by the skater's time on the ice.

1. onIce_fenwickPercentage: The FF% of a player while on the ice.

2. offIce_fenwickPercentage: The FF% of a player while off the ice.

3. onIce_corsiPercentage: The CF% of a player while on the ice.

4. offIce_corsiPercentage: The CF% of a player while off the ice.

5. I_F_shotsOnGoal: The individual credit to a player for a shot on goal (SOG).

6. I_F_shotAttempts: The individual credit to a player for a shot attempt.

7. I_F_takeaways: The amount of takeaways a player has had.

8. I_F_hits: The amount of hits a player has had, including hits that drew a penalty.

9. I_F_faceOffsWon: The amount of faceoffs won on either side of the rink by the player.

10. P/60: the average points made by a player every 60 minutes of on-ice play time.

11. G/60: the average goals scored by a player every 60 minutes of on-ice play time.
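The per-60 rates in items 10 and 11 can be reproduced from a raw count and a skater's ice time. The sketch below assumes `icetime` is recorded in seconds, which the sample values suggest but the dataset description above does not state. `per60` is an illustrative helper, and the goal count in the usage lines is hypothetical.

```r
# Convert a raw count (goals, points, ...) into a rate per 60 minutes of ice time.
# Assumption: icetime_seconds is measured in seconds.
per60 <- function(count, icetime_seconds) {
  hours <- icetime_seconds / 3600
  # No rate can be computed for a skater with no ice time
  ifelse(hours > 0, count / hours, NA_real_)
}

# Hypothetical skater: 4 goals in 32588 seconds of 5on5 play
g60 <- per60(4, 32588)
print(round(g60, 2))
```

Because the helper is vectorised, it can be applied to whole columns, for example `per60(goals, icetime)` inside a `mutate()` call, where `goals` stands in for whichever count column is available.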

```{r sktr-k}
kable(sktr22_gp[1:10, c(1:10, 13:16, 30, 33, 46:48, 155:159)])
```

### Team Level Game Data

This dataset contains each team's games played from the 2008 season through the 2022-23 season. It is another dataset from MoneyPuck.com, and only includes data from 5v5 play.

1. home_or_away: Whether the team is the home or away team for their game.

2. corsiPercentage: The team's CF% for the game.

3. fenwickPercentage: The team's FF% for the game.

4. final: Whether the team won or lost their game.

5. shotAttemptsDiff: The difference in shot attempts for and against the team.

```{r teams-k}
kable(tmgm[1:10,c(114, 12:13, 25, 28:29, 77)])
```

Team Statistics
===

Column {data-width=400}
-----------------------------

### Standard NHL Comparisons

In the NHL, the standard analysis of team performance is comparing their goals for (GF) and their goals against (GA). More goals scored in a season than allowed generally means a better team compared to a team with fewer scored goals and more goals allowed.

##### Team Performance

This comparison can also help viewers decide the "kind of team" they are watching, as in what a team is like to watch. Teams with few goals scored and few goals allowed are considered "boring", as not much happens in their games. "Bad" teams score few goals and allow many. "Good" teams score many goals and allow few, and "fun" teams have both stats high. While it has no meaningful use in determining game-to-game performance or the outcome of a single game, it can be a general indicator of a team's season performance.
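The four labels above can be assigned mechanically once "high" and "low" are defined. The sketch below splits teams at the median of each stat, which is an assumption made here for illustration; the text does not fix a threshold. `classify_team` and the toy season totals are hypothetical.

```r
# Label each team by whether its goals for / against sit above the median.
# Assumption: "high" means strictly above the median of the teams supplied.
classify_team <- function(goals_for, goals_against) {
  high_for     <- goals_for > median(goals_for)
  high_against <- goals_against > median(goals_against)
  ifelse(high_for & !high_against, "good",
  ifelse(high_for & high_against,  "fun",
  ifelse(!high_for & high_against, "bad", "boring")))
}

# Hypothetical season totals for four made-up teams
toy <- data.frame(
  team         = c("A", "B", "C", "D"),
  goalsFor     = c(300, 290, 210, 200),
  goalsAgainst = c(220, 300, 310, 215)
)
toy$style <- classify_team(toy$goalsFor, toy$goalsAgainst)
print(toy)
```

A median split forces roughly half the league onto each side of every axis, so teams near the middle get a label that overstates how distinctive they are.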

##### Player Ice Contribution

When determining the performance of individual players, however, advanced stats like Corsi (CF%) or Fenwick (FF%) are used instead of goals, assists, or saves. CF% is a team's shot attempts divided by the total shot attempts by both teams in a game at equal strength (5v5), and the season CF% of a player is the average of their games. FF% is calculated the same way as CF%, but excludes blocked shots from the tally, with the logic being that blocked shots are an indicator of defensive advantage rather than the ability of the player to push through the opponent's defense.
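A small worked example may make the two definitions concrete. The counts below are hypothetical, and `shot_share` is an illustrative helper; the only difference between the two stats is whether blocked attempts are counted.

```r
# Share of events belonging to one team out of both teams' combined total
shot_share <- function(for_count, against_count) {
  for_count / (for_count + against_count)
}

# Hypothetical 5v5 shot attempts in one game, split by how each attempt ended.
# "blocked" means the attempt was blocked by the other team.
team     <- c(on_goal = 28, missed = 10, blocked = 7)
opponent <- c(on_goal = 20, missed = 9,  blocked = 11)

# Corsi counts every attempt; Fenwick drops the blocked ones
unblocked <- c("on_goal", "missed")
cf_pct <- shot_share(sum(team), sum(opponent))
ff_pct <- shot_share(sum(team[unblocked]), sum(opponent[unblocked]))

print(round(c(CF = cf_pct, FF = ff_pct), 3))
```

In this made-up game the team's FF% is higher than its CF% because more of the opponent's attempts were blocked than the team's own.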

Additionally, these stats can be calculated on a team level for when the player is on the ice or off the ice. On-ice CF%/FF% is the team's stat while the player is on the ice, and off-ice CF%/FF% is the team's stat while the player is off it. Comparing the on-ice and off-ice stats of an individual player can give a glimpse into the effect they have on the game. The average for a player is 0.50. A higher on-ice percentage than off-ice percentage means the player makes a positive difference on the ice, while the opposite suggests the player is a detriment, or that some other factor is at work.
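The on-ice versus off-ice comparison reduces to a simple difference per skater. The sketch below uses three rows as they appear in the sample skater table; `ff_rel` is an illustrative name for the difference, not a column in the dataset.

```r
# Three skaters from the sample skater table (5on5, 2022 season)
skaters <- data.frame(
  name                     = c("Adam Fox", "Adam Erne", "A.J. Greer"),
  onIce_fenwickPercentage  = c(0.54, 0.42, 0.49),
  offIce_fenwickPercentage = c(0.48, 0.48, 0.51)
)

# Positive values: the team's FF% is higher with the skater on the ice than off it
skaters$ff_rel <- skaters$onIce_fenwickPercentage - skaters$offIce_fenwickPercentage

# Sort from largest positive difference to largest negative difference
print(skaters[order(-skaters$ff_rel), c("name", "ff_rel")])
```

The difference says nothing about why a skater's shifts go well or badly; linemates, opponents faced, and where shifts start can all move it.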



Column {.tabset .tabset-fade data-width=600}
------------------------------

### Team Rink Locations

##### Map of America and Canada

```{r team-map}
library(usmap)

#US states
USmap <- map_data("state")

state_data <- USmap %>%
  filter(region != "district of columbia") %>% 
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat)) %>% 
  arrange(region)

state_data$region.abb <- state.abb[-c(2, 11)] # drop Alaska & Hawaii

state <- ggplot(USmap, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = region), color = "black") +
    geom_text(aes(label = region.abb), 
              data = state_data, fontface = "bold") +
  theme_void() + 
  theme(legend.position = "none", panel.background = element_rect(fill = "#08519c"))
#--------------------------------------------------------------
USCN <- c("Canada")
USCNmap <- map_data("world", USCN)

region.data <- USCNmap %>%
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat))

ctry <- ggplot(USCNmap, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "#6baed6", color = "white") +
  geom_polygon(aes(group = group, x = long, y = lat, text = paste0(region)), data = USmap, color = "white", fill = "#bdd7e7") +
  geom_point(data = arenas, aes(x = long, y = lat, text = paste0(name, "\n", arena, "\n", "'22 Season: ", `2022`)), color = "red") +
  theme_void() +
  theme(legend.position = "none", panel.background = element_rect(fill = "#08519c")) +
  coord_cartesian(xlim = c(-132, -61), ylim = c(22, 55))

ggplotly(ctry, tooltip = "text")

```

### Team Performance: Goals For/Against

```{r team-comp}
t2 <- ggplot(teams22, aes(x = goalsFor, y = goalsAgainst)) +
  geom_point(aes(text = paste0(name, "\n", "'22 Season Pts: ", pt_total), color = pt_total)) +
  scale_color_continuous(low = "red", high = "green") +
  xlab("Goals For") +
  ylab("Goals Against") +
  labs(color = "2022-23 Season Points")

ggplotly(t2, tooltip = "text")
```

### Player Ice Contribution: On/Off-Ice FF%

```{r sktr-infl}
t <- ggplot(sktr22_gp, aes(x = onIce_fenwickPercentage, y = offIce_fenwickPercentage, color = team)) +
  geom_point(aes(text = paste0("Position: ", position, "\n", name, "\n", team))) +
  xlab("On-Ice FF%") +
  ylab("Off-Ice FF%")

ggplotly(t, tooltip = "text")
```


Shot Position
===

Column {data-width=550}
------------------------

### Shot Density in the Rink

```{r shotpos}
evt_sog <- subset(events, shotWasOnGoal == 1)

ggplot(evt_sog, aes(x = xCord, y = yCord)) +
  geom_point(size = 0.25, alpha = 0.01) +
  geom_density_2d_filled(alpha = 0.75, show.legend = FALSE) +
  scale_fill_brewer(palette = "Spectral", direction = -1) +
  scale_y_continuous(limits = c(-60, 60)) +
  theme_void()
```

Column {data-width=450}
------------------------

### Analysis

While it's expected that the highest density of shots on goal sits directly in front of the goal itself, where the defense has the least time to react and accuracy is easiest, the remaining shots form a W shape in which wing shots are more common than center shots, showing a tendency to rely on the wings for shooting. Additionally, the dense zone in front of the goal extends to the edges and even partly behind the goal, displaying the full range a goalie must be able to defend.

This pattern could have a number of causes, but given that the data ranges from 2008 to 2023, it is most likely due to the positioning of the three forwards themselves. Basic tactics include keeping a player on each wing of the rink for passing or scoring opportunities, and this reveals itself in the number of SOGs from the wings. It also means that the wings are a vital area to defend compared to the center, which is covered mostly by the goalie and by players passing between the wings.
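
The densest region can also be located numerically rather than by eye. The following is a minimal sketch on simulated coordinates, not the `events` data: `MASS::kde2d()` is the two-dimensional kernel density estimator that ggplot2 documents as the basis of its 2D density layers, and the cluster centred at (80, 0) is an assumption made purely for illustration.

```r
library(MASS)  # ships with standard R installations; provides kde2d()

set.seed(42)
# Simulated shot coordinates (hypothetical, not the MoneyPuck events data):
# a tight cluster in front of a net assumed at x = 80, y = 0, plus scattered shots.
sim_shots <- data.frame(
  xCord = c(rnorm(1500, mean = 80, sd = 4), runif(500, 25, 89)),
  yCord = c(rnorm(1500, mean = 0,  sd = 4), runif(500, -40, 40))
)

# Estimate the density on a 100 x 100 grid
dens <- kde2d(sim_shots$xCord, sim_shots$yCord, n = 100)

# Locate the grid cell with the highest estimated density
# (rows of dens$z correspond to x values, columns to y values)
peak   <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_x <- dens$x[peak["row"]]
peak_y <- dens$y[peak["col"]]
c(x = peak_x, y = peak_y)
```

On the real data, the same call on `evt_sog$xCord` and `evt_sog$yCord` would give the coordinates of the hottest cell in the plot, assuming those columns contain no missing values.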



Corsi or Fenwick?
===

Column {.tabset .tabset-fade data-width=650}
------------------------------

### Corsi-Fenwick Correlation

```{r corsi-fenwick}
ggplot(sktr22_gp, aes(x = onIce_corsiPercentage, y = onIce_fenwickPercentage)) +
  geom_point(alpha = 0.1) +
  geom_smooth(color = "green", se = FALSE) +
  xlab("On-Ice Corsi %") +
  ylab("On-Ice Fenwick %")
```

### Home Team Corsi Predictions

```{r corsi}
ggplot(tmgm) +
  geom_boxplot(aes(y = corsiPercentage, x = final, fill = final), outlier.alpha = 0, show.legend = FALSE) +
  ylab("Team Corsi % by Game") +
  ylim(0.25, 0.75)
```

### Home Team Fenwick Predictions

```{r fenwick}
ggplot(tmgm) +
  geom_boxplot(aes(y = fenwickPercentage, x = final, fill = final), outlier.alpha = 0, show.legend = FALSE) +
  ylab("Team Fenwick % by Game") +
  ylim(0.25, 0.75)
```

Column {.tabset .tabset-fade data-width=350}
-----------------------------

### Using Corsi and Fenwick

CF% is a team's shot attempts divided by the total shot attempts by both teams in a game at equal strength (5v5), and a player's season CF% is the average over their games. FF% is calculated the same way as CF%, but excludes blocked shots from the tally, the logic being that blocked shots are an indicator of defensive advantage rather than the ability of the player to push through the opponent's defense.
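
As a small worked example of the two calculations, the sketch below uses invented shot counts for a single game; the vector names are hypothetical and do not come from the MoneyPuck data.

```r
# Hypothetical 5v5 shot counts for one game (invented for illustration)
team     <- c(on_goal = 30, missed = 12, blocked = 14)  # attempts by the team
opponent <- c(on_goal = 26, missed = 10, blocked = 8)   # attempts by the opponent

# Corsi counts every attempt; Fenwick drops the blocked attempts
corsi_for       <- sum(team)
corsi_against   <- sum(opponent)
fenwick_for     <- sum(team[c("on_goal", "missed")])
fenwick_against <- sum(opponent[c("on_goal", "missed")])

# Each percentage is the team's share of all attempts by both teams
cf_pct <- corsi_for / (corsi_for + corsi_against)
ff_pct <- fenwick_for / (fenwick_for + fenwick_against)

round(c(CF = cf_pct, FF = ff_pct), 3)
```

In this made-up game the team's CF% (0.56) is higher than its FF% (about 0.538) because a larger share of its own attempts were blocked.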

Additionally, these stats can be calculated on a team level for when the player is on the ice, or off the ice. On-ice CF%/FF% is the team's stat while the player is on the ice, and off-ice CF%/FF% is the team's stat while the player is on the bench. Comparing the on-ice and off-ice stats of an individual player can give a glimpse into the effect they have on the game. The average for a player is 0.50. A higher on-ice percentage than off-ice percentage suggests the player makes a difference on the ice, while the opposite suggests the player is a detriment, or that some other factor is at play.
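
The on-ice versus off-ice comparison can be reduced to a single number per player. This is a sketch with made-up skaters: the two percentage columns mirror the names used in the dashboard's plots, while `rel_ff` is a hypothetical helper column introduced here.

```r
# Hypothetical skaters (invented values); percentage columns are named
# like the ones plotted from `sktr22_gp`
skaters <- data.frame(
  name = c("Skater A", "Skater B", "Skater C"),
  onIce_fenwickPercentage  = c(0.56, 0.48, 0.51),
  offIce_fenwickPercentage = c(0.50, 0.52, 0.51)
)

# Positive values: the team holds a larger share of unblocked attempts
# with the skater on the ice than without them
skaters$rel_ff <- skaters$onIce_fenwickPercentage - skaters$offIce_fenwickPercentage

# Rank skaters from most to least positive on-ice effect
skaters[order(-skaters$rel_ff), c("name", "rel_ff")]
```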

Both stats are known as Advanced stats, used to compare the relative strength of a team's offense and defense together.

### Analysis

When the player-level CF% and FF% are compared, the two are understandably highly correlated, given that they are the same calculation, with Fenwick only excluding one type of shot.

When the team's CF%/FF% is compared to the final outcome of a game, however, CF% and FF% show a small but important difference. CF% is just slightly inversely proportional to the win odds, with the average CF% of a team being higher when they lose than when they win. With FF%, the average is even whether the game is a win or a loss, and the 1st and 2nd quartiles are higher for winning games than losing games.

This means that, for the purpose of judging whether a team can win a game against another team, FF% is a slightly better stat to compare than CF%, though the size of the difference between the teams is not a good indicator of a win. This can be due to a number of reasons, like a difference in the accuracy and precision of a team's forwards, a difference in the defensive game of either team, or even simple puck luck.
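
The boxplot comparison can be backed by a per-outcome summary. The sketch below uses a tiny invented data frame whose column names follow those plotted from `tmgm`; the values are not real results. It reports the median, which is the centre line a boxplot draws.

```r
# Hypothetical per-game team data (invented values)
games <- data.frame(
  final             = c("win", "win", "win", "loss", "loss", "loss"),
  corsiPercentage   = c(0.48, 0.52, 0.50, 0.53, 0.49, 0.54),
  fenwickPercentage = c(0.51, 0.53, 0.50, 0.50, 0.48, 0.52)
)

# Median of each stat within each game outcome
by_outcome <- aggregate(
  cbind(corsiPercentage, fenwickPercentage) ~ final,
  data = games,
  FUN  = median
)
by_outcome
```

Swapping `median` for `mean` or `quantile` gives the other summaries discussed above.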

Factors of Offensive Pressure
===

Column {.tabset .tabset-fade data-width=650}
------------------------------

### Fenwick

```{r fenpress}
ggplot(sktr22_gp, aes(x = onIce_fenwickPercentage, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  xlab("On-Ice Fenwick %") +
  ylab("Points per Play Hour") +
  ylim(0, 4)
```

### Corsi

```{r corsipress}
ggplot(sktr22_gp, aes(x = onIce_corsiPercentage, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "darkred") +
  xlab("On-Ice Corsi %") +
  ylab("Points per Play Hour") +
  ylim(0, 4) +
  xlim(0, 1)
```

### Shots on Goal

```{r sog}
ggplot(sktr22_gp, aes(x = I_F_shotsOnGoal, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  geom_smooth(color = "darkred", se = FALSE) +
  xlab("Individual's Shots On Goal") +
  ylab("Points per Play Hour") +
  ylim(0, 4)
```

### Shot Attempts

```{r sa}
ggplot(sktr22_gp, aes(x = I_F_shotAttempts, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  geom_smooth(color = "darkred", se = FALSE) +
  xlab("Individual's Shot Attempts") +
  ylab("Points per Play Hour") +
  ylim(0, 4)

```

### SA Difference

```{r sadiff}
# Split home-team games by outcome to get each group's mean differential
tmgm_hm_w <- subset(tmgm_hm, final == "win")
tmgm_hm_l <- subset(tmgm_hm, final == "loss")

ggplot(tmgm_hm, aes(x = shotAttemptsDiff, fill = final)) +
  geom_histogram(binwidth = 1) +
  # One vertical line per outcome: blue = mean in wins, red = mean in losses
  geom_vline(xintercept = mean(tmgm_hm_w$shotAttemptsDiff), color = "blue") +
  geom_vline(xintercept = mean(tmgm_hm_l$shotAttemptsDiff), color = "red") +
  xlab("Difference in Shot Attempts (by Home Team)") +
  labs(fill = "Game Outcome")

```

### Takeaways

```{r takeaway}
ggplot(sktr22_gp, aes(x = I_F_takeaways, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  geom_smooth(color = "darkred", se = FALSE) +
  xlab("Individual's Takeaways") +
  ylab("Points per Play Hour") +
  ylim(0, 4)
```

### Hits

```{r hits}
ggplot(sktr22_gp, aes(x = I_F_hits, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  geom_smooth(color = "darkred", se = FALSE) +
  xlab("Individual's Hits Made") +
  ylab("Points per Play Hour") +
  ylim(0, 4)
```

### Face-offs

```{r faceoffs}
ggplot(sktr22_gp, aes(x = I_F_faceOffsWon, y = `P/60`)) +
  geom_point(alpha = 0.25, color = "navy") +
  xlab("Individual's Face Offs Won") +
  ylab("Points per Play Hour") +
  ylim(0, 4)
```


Column {.tabset .tabset-fade data-width=350}
------------------------------

### Why Offensive Pressure?

Offensive pressure is not a single statistic, but a value determined by the overall performance of a shift in a game. Unlike goals scored or shots attempted, the offensive pressure of a shift can be good despite a lack of goals or shots to show for it: it can deny the opponent possession of the puck, add stress that may lead to more mistakes by the opponent, and more.

So what statistic is used for offensive pressure? In hockey, the points per play hour, or P/60 statistic is considered to best represent the pressure a player puts in the rink when on the ice. This is because more activity, more possession, and more opportunities means more possible points made.
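
For reference, P/60 is simply points (goals plus assists) scaled to a rate per 60 minutes of ice time. A minimal sketch with invented season totals; the column names here are hypothetical and not those of the source data.

```r
# Hypothetical season totals (invented for illustration)
skaters <- data.frame(
  name        = c("Skater A", "Skater B"),
  goals       = c(30, 5),
  assists     = c(45, 10),
  toi_minutes = c(1500, 600)   # total time on ice, in minutes
)

# Points are goals plus assists; P/60 rescales them to 60 minutes of play
skaters$points <- skaters$goals + skaters$assists
skaters$p60    <- skaters$points / skaters$toi_minutes * 60

skaters[, c("name", "points", "p60")]
```

Because the rate divides by ice time, a skater with fewer total points can still be compared fairly with one who plays far more minutes.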

### Analysis

Most of these statistics were expected to increase with P/60, since more opportunities for a player mean both more of the stat and more points. However, the difference in shot attempts between the home and away teams was actually inversely proportional to the game's outcome by a slight margin. Additionally, Hits had no positive correlation with the points a player made; if anything, fewer hits meant a slight increase in points.

These results make sense, both because opportunity time for one stat means opportunity time for another, and because the more hits a player makes, the more possible penalties from those hits, and thus less play time and no pressure being added on the ice. The shot attempts differential result, however, runs against the basic intuition that shooting more means more chances for goals, and thus more points. The slight inverse proportionality may show the aftermath of most shot attempts: that the opponents often gain possession of the puck and offensive pressure turns towards the subject team. Thus the difference in shot attempts during a game cannot be used to predict the outcome of said game.
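
The shot-attempt comparison can be checked numerically as well as visually. A minimal sketch on invented home-team games, with column names following the `tmgm_hm` data used in the histogram; `t.test()` from base R runs Welch's two-sample test by default, and with six made-up games its result is only illustrative.

```r
# Hypothetical home-team games (invented values)
home_games <- data.frame(
  final            = c("win", "win", "win", "loss", "loss", "loss"),
  shotAttemptsDiff = c(-4, 2, -1, 6, 3, -3)
)

# Mean shot-attempt differential for each outcome
# (the quantities marked by the vertical lines in the histogram)
mean_diff <- tapply(home_games$shotAttemptsDiff, home_games$final, mean)
mean_diff

# Welch two-sample t-test: is the gap between the means larger than chance?
welch <- t.test(shotAttemptsDiff ~ final, data = home_games)
welch$p.value
```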

##### On Fenwick

Fenwick certainly correlates with P/60, as a higher percentage means more shot attempts in general, but the amount of variation is severe, and the correlation does not have a major effect on P/60. This tracks given Fenwick's record with winning odds and predictive abilities. Shot attempts, no matter the type, do mean giving the puck away on a shot, and the variation of goals and misses means the points a player can accrue are limited within the statistic.

##### On Corsi

Corsi is in essentially the same boat as Fenwick in correlating with P/60, with no major differences. In fact, it has the same issue as Fenwick does in correlating with P/60 and, as such, offensive pressure.

##### On Shots On Goal

SOG has a good correlation with P/60: a shot on goal has only the defense in the way of it being a goal, and thus SOG tend to become goals more often.

##### On Shot Attempts

Shot attempts show a correlation with P/60 similar to that of SOG, and for the same reasons, much as Corsi mirrors Fenwick.

##### On Takeaways

Takeaways have a positive but slowly degrading correlation with P/60, with a significant amount of takeaways leading to diminishing returns in P/60. In terms of offensive pressure, no stat could be reasoned to be more offensive than taking possession of the puck away from the opposing team. More takeaways means more pressure, and the graph reflects that.

##### On Hits

Hits have a surprising result, given the importance hitting has in the sport culturally. Fewer hits correlate with more P/60, with more hits leading to a slightly lower P/60. Game-wise, more hits lead to more penalties, and that cuts opportunities for points to be made, and removes the player from adding offensive pressure from the ice.

##### On Face Offs Won

Face offs won have a slight correlation with P/60, but cannot compare to other stats given the huge range of data on the left of the graph where the defense positions play. So despite the slight correlation with P/60 and offensive pressure, as won face offs mean more opportunities for puck possession, the relation is too small to have a real effect in the game, and the season as a whole.
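
The relationships read off the scatter plots can be summarized as one correlation coefficient per stat. The sketch below uses simulated skaters, not `sktr22_gp`: the column names follow the dashboard's data, but the values are generated so that P/60 rises with shots on goal and is unrelated to hits by construction. Spearman's rank correlation is chosen here as an assumption, since it does not require a straight-line relationship.

```r
set.seed(1)
n <- 200

# Simulated skaters (hypothetical values)
sim <- data.frame(
  I_F_shotsOnGoal = rpois(n, 120),
  I_F_hits        = rpois(n, 80)
)
# By construction: P/60 depends on shots on goal plus noise, never on hits
sim$`P/60` <- -1.5 + 0.03 * sim$I_F_shotsOnGoal + rnorm(n, sd = 0.3)

# Spearman rank correlation of each stat with P/60
stats <- c("I_F_shotsOnGoal", "I_F_hits")
rho <- sapply(stats, function(s) cor(sim[[s]], sim[["P/60"]], method = "spearman"))
round(rho, 2)
```

Extending `stats` with the other columns plotted above (takeaways, face offs won, shot attempts) would put all of the panels on one comparable scale.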

Conclusion & Background
===

Column {data-width=550}
-------------------------

### Applications in Hockey

##### Shot Positions

Understanding the common locations of puck shots allows teams to strategize their defense around defending against those shots, especially for the goalie watching for where and when the puck may be shot.

For the offense of a team, the location density can be compared with their own data, leading to different strategies of where to force a puck down the line, and how to set up a shot against their opponents.

##### Using Fenwick

Fenwick is a better stat than Corsi for predicting the success of both players and teams over a season. The average rate of wins correlates more with Fenwick than with Corsi.

##### The Important Stats in a Game

The difference in shot attempts between the home and away teams was actually inversely proportional to the game's outcome by a slight margin. Additionally, Hits had no positive correlation with the points a player made; if anything, fewer hits meant a slight increase in points.

The shot attempts differential, however, has a slight inverse proportionality that may show the aftermath of most shot attempts: that the opponents often gain possession of the puck and offensive pressure turns towards the subject team. Thus the difference in shot attempts during a game cannot be used to predict the outcome of said game.

Other stats, like Takeaways, SOG, Fenwick/Corsi, and Shot Attempts had a positive correlation with a player's P/60, and therefore offensive pressure.

##### Limitations and Future Work

The data collected was across the 2022-2023 season for individual and team stats. The available graphical analysis may not have exposed additional correlations between stats, or other advanced stats may have an unexpected effect on the outcome of a game, or season.

Assumptions like the situation always being 5v5 were important in understanding the effect of certain stats in equal play, but uneven situations may lead to some stats having more or less of an impact on a game.

In the future, larger studies on all situations and available seasonal data could lead to a more comprehensive understanding of the questions asked.

Column {data-width=450}
-------------------------

### The Author

[Stephen Boerger's LinkedIn](https://www.linkedin.com/in/stephen-boerger-04104b294/)

Stephen Boerger created this Data Analysis Project for the Analytics Class Final, and as such presents his skills as of the Fall of 2023. Hockey was chosen because of his recent entry into being a fan of the sport, and discovering what stats affect what parts of the game and the season has helped teach him more about the sport he is growing to love.

### Resource Citations

[MoneyPuck.com Hockey Data](https://moneypuck.com/data.htm)

[Wikipedia NHL Arenas](https://en.wikipedia.org/wiki/List_of_National_Hockey_League_arenas)

[NHL Open API Data](https://www.nhl.com/stats/skaters?report=scoringRates&reportType=season&seasonFrom=20222023&seasonTo=20222023&gameType=2&filter=gamesPlayed,gte,1&sort=pointsPer605v5,goalsPer605v5&page=9&pageSize=100)

[Map data for Canada](https://github.com/joellecayen/canadianmaps/blob/main/R/canadianmaps.R)

[2014 Study on Success in the NHL by Weissbock](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.841.8005&rep=rep1&type=pdf)