NHL Heatmap 2018-2019

Hockey Analytics

By Mark Dodd

Introduction

As frivolous as sports may seem, there is no denying the passion and love that they elicit in people. It is a passion that unites people in celebration and agony, bringing them together in joy and sorrow. This passion has created an entire sub-industry: sports analytics.

Sports are a results-based entertainment industry in which a winner is ultimately, and quantifiably, established, and this generates a wealth of data to be analyzed. This analysis is not fully embraced by everyone who discusses, manages, or plays the games, but that doesn’t mean it hasn’t had an impact. Decades-old “laws” regarding sports betting have changed; how teams pick players has changed; how we watch and enjoy the game has changed. Sports data analysis was immortalized in the movie Moneyball, starring Brad Pitt, based on the true story of how the Oakland Athletics revolutionized baseball.

My focus for this project will be applying data analysis to the professional hockey domain. Specifically, the goal of this project is to create a heatmap visualization that can be used to gain insights on how teams and players play.

Data

Our data is a public Kaggle dataset called NHL Game Data (see references). The dataset was created using the NHL API, which is documented at https://gitlab.com/dword4/nhlapi. In addition to the Kaggle dataset, we polled the API directly to gather specific information required for our analysis.

The Kaggle NHL dataset can be visualized with the following entity relationship diagram.

[Figure: entity relationship diagram of the Kaggle NHL dataset]

In [1]:
import pandas as pd
import numpy as np

import ipywidgets as widgets
from ipywidgets import interact

import plotly.graph_objects as go
import plotly.offline as py
py.init_notebook_mode(connected=False)

“When You Put the Puck on the Net, Good Things Happen”

  • Purpose
  • Data Wrangling
  • Results

Purpose

We wanted to investigate where players shoot from and where goals come from.

Shot Location

Where a player shoots the puck from is an important part of hockey strategy. There are fans and coaches who will always tell you to "Shooooooot".

But which locations offer the best chances of scoring?

Data Wrangling

For this visualization we focused on a single season of data, but it involved a significant amount of data wrangling. We had to perform the following:

  • Draw an NHL rink
  • Get all shots and goals and their locations
  • Merge with teams table to bring in team information
  • Merge with player event table to allow us to merge with the player table to import player data

Create the NHL Rink

The following code cell will create a function that returns a list of plotly shapes to use as one half of the NHL ice surface.

In the code we "flip" the shot and goal positions around center ice so that all of the data is concentrated on half the ice surface.
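The flip described above can be sketched in a few lines. This is a minimal illustration, assuming the NHL coordinates span -100 to 100 along the ice and -42 to 42 across it, and that the plotted half rink spans -250 to 250 across and 0 to 580 along. The function name `to_half_rink` is hypothetical and not part of the notebook.

```python
def to_half_rink(nhl_x, nhl_y):
    """Map an NHL (x, y) location onto the half-rink plot coordinates.

    Assumed ranges: -100 <= nhl_x <= 100 (length), -42 <= nhl_y <= 42 (width).
    """
    # Taking the absolute value folds both ends of the ice onto one half.
    plot_y = abs(nhl_x) * 580 / 100
    # The across-ice axis is mirrored and rescaled from 84 units to 500.
    plot_x = (42 - nhl_y) * 500 / 84 - 250
    return plot_x, plot_y


# A shot from either end of the ice lands on the same spot of the half rink.
print(to_half_rink(80, 10))
print(to_half_rink(-80, 10))
```

Folding the data this way doubles the number of points available for each location on the plotted half of the ice.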

Data Wrangling - NHL Shooting and Scoring Data

We will focus on the 2018-2019 regular season and playoffs, and assume that the distributions are similar for other seasons.

In [10]:
# set up filenames
path = "../data/"

teams = pd.read_csv(path + "team_info.csv")
game_plays = pd.read_csv(path + "game_plays.csv")
game_player = pd.read_csv(path + "game_plays_players.csv")
player_info = pd.read_csv(path + "player_info.csv")
In [11]:
# filter for 2018-2019 season games (regular season and playoffs)
# game id is of format ssss-tt-nnnn where:
#     ssss = first year of season (ie. 2018 for 2018-2019 season)
#     tt = two digits for type of game (02 for regular season, 03 for playoffs)
#     nnnn = four digits for the game number as there are 31 * 82 / 2 = 1271 regular season games + playoffs
#
# also take this opportunity to just filter for shots and goals
f = ((game_plays.game_id >= 2018_00_0000) & 
     (game_plays.game_id < 2019_00_0000) & 
     (game_plays.event.isin(['Shot', 'Missed Shot', 'Goal'])))
plays_2018 = game_plays[f].copy().reset_index(drop = True)

# filter the game_player dataframe for 2018 season
f = ((game_player.game_id >= 2018_00_0000) & 
     (game_player.game_id < 2019_00_0000))
game_player_2018 = game_player[f].copy().reset_index(drop = True)
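The game id layout described in the comments above can be unpacked with integer arithmetic. The following is a small sketch of that idea; the helper name `decode_game_id` is hypothetical, and the two ids are illustrative values that follow the ssssttnnnn format.

```python
def decode_game_id(game_id):
    """Split a ssssttnnnn game id into (season, game_type, game_num)."""
    season = game_id // 1_000_000          # first four digits
    game_type = game_id // 10_000 % 100    # next two digits
    game_num = game_id % 10_000            # last four digits
    return season, game_type, game_num


print(decode_game_id(2018020001))  # (2018, 2, 1)
print(decode_game_id(2018030417))  # (2018, 3, 417)
```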

Data Wrangling - Cleaning

As mentioned in the introduction to this section, we needed to do a lot of cleaning to build a dataframe that is in the right form for the intended visualization. We will be using four tables that will need to be merged and cleaned.

In [12]:
# break up the game id into season, game type, and game number
plays_2018['season'] = plays_2018.game_id // 1_000_000
plays_2018['game_type'] = plays_2018.game_id // 10_000 - plays_2018['season'] * 100
plays_2018['game_num'] = plays_2018.game_id - (plays_2018.season*100 + plays_2018.game_type) * 10_000

# get the player id for the "for event"
p_filter = game_player_2018.playerType.isin(['Shooter', 'Scorer'])
plays_2018 = pd.merge(left = plays_2018, 
                      right = game_player_2018[p_filter][['play_id', 'player_id']], 
                      on = 'play_id')

# convert the player id into a player name
plays_2018 = pd.merge(left = plays_2018,
                      right = player_info[['player_id', 'firstName', 'lastName', 'primaryPosition']],
                      on = 'player_id')
plays_2018['fullName'] = plays_2018['lastName'] + ', ' + plays_2018['firstName']

# a small function to convert a team id into a team name and drop the id column from the dataframe
def id_to_team_name(df, teams, df_id, team_type):
    new = pd.merge(left = df, 
                   right = teams[['team_id', 'teamName']], 
                   left_on = [df_id], 
                   right_on=['team_id'])
    new = new.drop(columns = ['team_id', df_id])
    new = new.rename(columns={'teamName': team_type})

    return new

# replace id's with team names
team_ids = ['team_id_for', 'team_id_against']
team_names = ['team_for', 'team_against']
for team_id, team_name in zip(team_ids, team_names):
    plays_2018 = id_to_team_name(plays_2018, teams, team_id, team_name)

# convert columns to categorical to reduce memory use
cat_cols = ['event', 'periodType', 'rink_side', 'primaryPosition']
for c in cat_cols:
    plays_2018[c] = plays_2018[c].astype('category')

# rescale the x, y coordinates into plotly coordinates
#   NHL data oriented so that X is the long direction, and Y is across the ice
#      -100 <= x <= 100
#      -42 <= y <= 42
#
#   the rink we build in plotly is oriented so that y is the long direction and x is across the ice
#      -250 <= x <= 250
#      0 <= y <= 580 (we are only doing half the ice)
#
# take the absolute value to flip everything to the same side
plays_2018['py_y'] = plays_2018['st_x'].abs() * 580 / 100 
plays_2018['py_x'] = (-plays_2018['st_y'] + 42) * 500/84 - 250
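# draw one random offset per row; a single scalar draw would shift every point by the same amount
plays_2018['jitter_y'] = plays_2018['py_y'] + np.random.normal(0, 2/3, size = len(plays_2018))
plays_2018['jitter_x'] = plays_2018['py_x'] + np.random.normal(0, 2/3, size = len(plays_2018))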

cols_to_drop = ['x', 'y', 'st_x', 'st_y', 'player_id', 'play_id', 'game_id',
                'firstName', 'lastName', 'goals_away', 'goals_home',
                'secondaryType', 'play_num', 'periodTime', 'periodTimeRemaining', 
                'dateTime', 'description', 'rink_side']
plays_2018 = plays_2018.drop(columns = cols_to_drop)
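Each `pd.merge` call above uses the default inner join, so any play without a matching player or team row is dropped without warning. One way to check for this, sketched here on toy data with made-up ids, is a left join with `indicator=True`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Toy stand-ins for the plays and player tables (ids and names are made up).
plays = pd.DataFrame({'play_id': ['a', 'b', 'c'], 'player_id': [1, 2, 99]})
players = pd.DataFrame({'player_id': [1, 2], 'lastName': ['Smith', 'Jones']})

# A left join keeps every play; the _merge column records whether a match was found.
checked = pd.merge(plays, players, on='player_id', how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

print(len(plays), len(checked), len(unmatched))
```

If `unmatched` is empty for the real tables, the inner joins lose nothing; otherwise the unmatched rows show which ids are missing from the lookup table.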

Draw The NHL Rink

The following code will create some functions that will be used to draw an NHL Rink in plotly. The final function will be used to build the layout in plotly.

In [2]:
def draw_shape(shape_, p1, p2, width = 1, color = None, fill = None):
        
    shape = dict(
        type = shape_, xref = 'x', yref = 'y',
        x0 = str(p1[0]), y0 = str(p1[1]),
        x1 = str(p2[0]), y1 = str(p2[1]),
        line = dict(
            width = width
        ))
    
    if color is not None:
        shape['line']['color'] = color
    
    if fill is not None:
        shape['fillcolor'] = fill
    
    return shape



def draw_arc(m1, m2, c1, c2, c3, c4, c5, c6, width = 1, color = None):
    # first we convert arguments into the path string
    m = " ".join(["M", str(m1), str(m2)])
    c = " ".join(['C', str(c1), str(c2) + ',', 
                       str(c3), str(c4) + ',',
                       str(c5), str(c6)])
    p = " ".join([m, c])
    shape = dict(
        type = 'path', xref = 'x', yref = 'y',
        path = p,
        line=dict(
            width = width
        ))
    
    # apply the optional line color
    if color is not None:
        shape['line']['color'] = color
    
    return shape

def draw_nhl_rink():
    
    # color constants to reduce repetition later
    _RED = 'rgba(255, 0, 0, 1)'
    _BLUE = 'rgba(0, 0, 255, 1)'
    _FACEOFF = 'rgba(10, 10, 100, 1)'
    
    # build a dictionary to store our rink shapes
    nhl_rink = {}
    nhl_rink["outer_rink"] = draw_shape('rect', (-250, 0), (250, 516.2))
    nhl_rink["outer_line"] = draw_shape('line', (200, 580), (-200, 580))
    nhl_rink["center_line"] = draw_shape('line', (-250, 0), (250, 0), color = _RED) 
    nhl_rink["end_line"] = draw_shape('line', (-250, 516.2), (250, 516.2), color = _RED) 
    nhl_rink['blue_line'] = draw_shape('rect', (250, 150.8), (-250, 156.8), color = _BLUE, fill = _BLUE)
    nhl_rink['center_dot'] = draw_shape('circle', (2.94, 2.8), (-2.94, -2.8), color = _BLUE, fill = _BLUE)
    nhl_rink['center_circle'] = draw_shape('circle', (88.2, 87), (-88.2, -87), color = _BLUE)
    nhl_rink['offside_dot1'] = draw_shape('circle', (135.5, 121.8), (123.5, 110.2), color = _RED, fill = _RED)
    nhl_rink['offside_dot2'] = draw_shape('circle', (-135.5, 121.8), (-123.5, 110.2), color = _RED, fill = _RED)
    nhl_rink['zone_dot1'] = draw_shape('circle', (135.5, 406), (123.5, 394.4), color = _RED, fill = _RED)
    nhl_rink['zone_dot2'] = draw_shape('circle', (-135.5, 406), (-123.5, 394.4), color = _RED, fill = _RED)
    nhl_rink['zone_circle1'] = draw_shape('circle', (217.6, 487.2), (41.2, 313.2), color = _RED)
    nhl_rink['zone_circle2'] = draw_shape('circle', (-217.6, 487.2), (-41.2, 313.2), color = _RED)
    nhl_rink['zone1_line1'] = draw_shape('line', (30.04, 416.4), (41.8, 416.4), color = _RED)                             
    nhl_rink['zone1_line2'] = draw_shape('line', (30.04, 384), (41.8, 384), color = _RED)
    nhl_rink['zone1_line3'] = draw_shape('line', (228.76, 416.4), (217, 416.4), color = _RED)                             
    nhl_rink['zone1_line4'] = draw_shape('line', (228.76, 384), (217, 384), color = _RED)
    nhl_rink['zone2_line1'] = draw_shape('line', (-30.04, 416.4), (-41.8, 416.4), color = _RED)                             
    nhl_rink['zone2_line2'] = draw_shape('line', (-30.04, 384), (-41.8, 384), color = _RED)
    nhl_rink['zone2_line3'] = draw_shape('line', (-228.76, 416.4), (-217, 416.4), color = _RED)                             
    nhl_rink['zone2_line4'] = draw_shape('line', (-228.76, 384), (-217, 384), color = _RED)
    nhl_rink['faceoff1_line1'] = draw_shape('line', (141.17, 423.4), (141.17, 377), color = _FACEOFF)
    nhl_rink['faceoff1_line2'] = draw_shape('line', (117.62, 423.4), (117.62, 377), color = _FACEOFF)
    nhl_rink['faceoff1_line3'] = draw_shape('line', (153, 406), (105.8, 406), color = _FACEOFF)
    nhl_rink['faceoff1_line4'] = draw_shape('line', (153, 394.4), (105.8, 394.4), color = _FACEOFF)
    nhl_rink['faceoff2_line1'] = draw_shape('line', (-141.17, 423.4), (-141.17, 377), color = _FACEOFF)
    nhl_rink['faceoff2_line2'] = draw_shape('line', (-117.62, 423.4), (-117.62, 377), color = _FACEOFF)
    nhl_rink['faceoff2_line3'] = draw_shape('line', (-153, 406), (-105.8, 406), color = _FACEOFF)
    nhl_rink['faceoff2_line4'] = draw_shape('line', (-153, 394.4), (-105.8, 394.4), color = _FACEOFF)
    nhl_rink['goal_line1'] = draw_shape('line', (64.7, 516.2), (82.3, 580))
    nhl_rink['goal_line2'] = draw_shape('line', (23.5, 516.2), (23.5, 493))
    nhl_rink['goal_line3'] = draw_shape('line', (-64.7, 516.2), (-82.3, 580))
    nhl_rink['goal_line4'] = draw_shape('line', (-23.5, 516.2), (-23.5, 493))
    nhl_rink['outer_arc1'] = draw_arc(200, 580, 217, 574, 247, 532, 250, 516.2)
    nhl_rink['outer_arc2'] = draw_arc(-200, 580, -217, 574, -247, 532, -250, 516.2)
    nhl_rink['goal_arc1'] = draw_arc(23.5, 493, 20, 480, -20, 480, -23.5, 493)
    nhl_rink['goal_arc2'] = draw_arc(17.6, 516.2, 15, 530, -15, 530, -17.6, 516.2)

    # convert rink shapes dictionary to a list of shapes to use with plotly
    rink_shapes = [nhl_rink[key] for key in nhl_rink]
    
    return rink_shapes
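`draw_arc` builds an SVG-style path string in which `M` moves to a start point and `C` draws a cubic Bézier curve defined by two control points and an end point. The sketch below, which is not part of the notebook, evaluates such a curve in plain Python to show that it starts and ends exactly at the first and last points given. The helper name `cubic_bezier` is hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Return the point at parameter t (0..1) on a cubic Bezier curve."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return x, y


# Points taken from the 'outer_arc1' call above.
start, ctrl1, ctrl2, end = (200, 580), (217, 574), (247, 532), (250, 516.2)

print(cubic_bezier(start, ctrl1, ctrl2, end, 0.0))  # start of the corner arc
print(cubic_bezier(start, ctrl1, ctrl2, end, 0.5))  # a point partway round the corner
print(cubic_bezier(start, ctrl1, ctrl2, end, 1.0))  # end of the corner arc
```

The two middle points only pull the curve toward them, which is why the rounded corner joins the straight board lines without a gap.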

Plot the Visualization

We will build a heatmap for shots and goals from the above dataframe.

In [13]:
# empty dataframe of x and y coordinates to be used when we want the scatterplot to be empty
empty = pd.DataFrame({'py_x': [0], 'py_y': [0]})

# default heatmap trace
heatmap_trace = go.Histogram2dContour(
    x = empty['py_x'],
    y = empty['py_y'],
    hoverinfo = 'skip',
    name = 'density', ncontours = 3,
    colorscale = 'Hot', reversescale = True, showscale = False,
    contours = dict(coloring='heatmap'),
)

# scatterplot
shot_goal_trace = go.Scatter(
    x = empty['py_x'],
    y = empty['py_y'],
    mode = 'markers',
    name = 'goals',
    marker = dict(
        size = 6,
        color = 'blue'
    )
)

# build the layout for plotly
layout = go.Layout(
    title='NHL Rink',
    showlegend=True,
    xaxis=dict(
        showgrid=False,
        range=[-300, 300],
        showticklabels = False
    ),
    yaxis=dict(
        showgrid=False,
        range=[-100, 600],
        showticklabels = False
    ),
    shapes = draw_nhl_rink(), # build the NHL Rink
    plot_bgcolor = 'rgba(0,0,0,0)',
    height = 700, 
    width = 600 
)

# create a plotly widget
fig_game = go.FigureWidget(data = [heatmap_trace, shot_goal_trace], layout = layout)

# helper function to filter dataframe based on the state of the input widgets
def shot_goals_df(display_type, season_filter):
    goal_filter = plays_2018['event'] == 'Goal'
    shot_filter = ~goal_filter
    
    if display_type == '':
        df = empty
    elif display_type == 'Shots':
        df = plays_2018[season_filter & shot_filter]
    elif display_type == 'Goals':
        df = plays_2018[season_filter & goal_filter]
    return df

# primary eventhandler function
def update_heat(game_type, display_type, team = 'Flames', player = 'McDavid, Connor'):
    season_filter = plays_2018['game_type'] == game_type
    goal_filter = plays_2018['event'] == 'Goal'
    shot_filter = ~goal_filter
    
    if radio_mode.value == 'Season':
        df = shot_goals_df(display_type, season_filter)
        fig_game.data[0].x = df['py_x']
        fig_game.data[0].y = df['py_y']
        fig_game.data[1].x = empty['py_x']
        fig_game.data[1].y = empty['py_y']
    elif radio_mode.value == 'Game':
        # pick a random game that exists for the selected game type;
        # game numbers repeat across game types, so the game type filter is applied as well
        game_nums = plays_2018.loc[season_filter, 'game_num'].unique()
        game_filter = season_filter & (plays_2018['game_num'] == np.random.choice(game_nums))
        df1 = plays_2018[game_filter & shot_filter]
        df2 = plays_2018[game_filter & goal_filter]
        fig_game.data[0].x = df1['py_x']
        fig_game.data[0].y = df1['py_y']
        fig_game.data[1].x = df2['jitter_x']
        fig_game.data[1].y = df2['jitter_y']
    elif radio_mode.value == 'Team':
        # force it to look at regular season so we don't need to check if a team was in the playoffs or not
        if display_type != '':
            season_filter = plays_2018['game_type'] == 2
            df = shot_goals_df(display_type, season_filter)
            df = df[df['team_for'] == team]
            fig_game.data[0].x = df['py_x']
            fig_game.data[0].y = df['py_y']
            fig_game.data[1].x = empty['py_x']
            fig_game.data[1].y = empty['py_y']
    elif radio_mode.value == 'Player':
        # force it to look at regular season so we don't need to check if a team was in the playoffs or not
        season_filter = plays_2018['game_type'] == 2
        df = plays_2018[season_filter & (plays_2018.fullName == player)]
        display(df)
        # build the goal mask from df itself so its index matches the filtered rows
        is_goal = df['event'] == 'Goal'
        fig_game.data[0].x = df[~is_goal]['py_x']
        fig_game.data[0].y = df[~is_goal]['py_y']
        fig_game.data[1].x = df[is_goal]['jitter_x']
        fig_game.data[1].y = df[is_goal]['jitter_y']
        
#############################################
# Create the widgets
#############################################
dropdown_goals = widgets.Dropdown(
    options = ["", "Goals", "Shots"],
    value = "",
    description = 'Display:',
)

# use a separate name so the teams dataframe loaded earlier is not overwritten
team_list = list(plays_2018.team_for.sort_values().unique())
team_list.insert(0, "")

dropdown_teams = widgets.Dropdown(
    options = team_list,
    value = "",
    description = 'Team:',
)

player_list = list(plays_2018.fullName.sort_values().unique())
player_list.insert(0, "")
dropdown_players = widgets.Dropdown(
    options = player_list,
    value = "",
    description = 'Player:',
)

radio_mode = widgets.RadioButtons(
    options = ['Game', 'Season', 'Team', 'Player'],
    description = 'Mode:',
    disabled = False
)

radio_playoff = widgets.RadioButtons(
    options = [('Regular', 2), ('Playoff', 3)],
    description = 'Game Type:',
    disabled = False
)

#############################################
# Define eventhandlers
#############################################
def dropdown_goals_eventhandler(change):
    update_heat(radio_playoff.value, change.new)

def dropdown_teams_eventhandler(change):
    update_heat(radio_playoff.value, dropdown_goals.value, change.new)

def dropdown_players_eventhandler(change):
    update_heat(radio_playoff.value, dropdown_goals.value, player = change.new)

def radio_playoff_eventhandler(change):
    update_heat(change.new, dropdown_goals.value)

def radio_mode_eventhandler(change):
    if change.new == 'Game':
        dropdown_goals.layout.visibility = 'hidden'
        dropdown_teams.layout.visibility = 'hidden'
        dropdown_players.layout.visibility = 'hidden'
        radio_playoff.layout.visibility = 'visible'
        update_heat(radio_playoff.value, dropdown_goals.value)
    elif change.new == 'Season':
        dropdown_goals.layout.visibility = 'visible'
        dropdown_teams.layout.visibility = 'hidden'
        dropdown_players.layout.visibility = 'hidden'
        radio_playoff.layout.visibility = 'visible'
    elif change.new == 'Team':
        dropdown_teams.layout.visibility = 'visible'
        dropdown_goals.layout.visibility = 'visible'
        dropdown_players.layout.visibility = 'hidden'
        radio_playoff.layout.visibility = 'hidden'
    elif change.new == 'Player':
        dropdown_teams.layout.visibility = 'hidden'
        dropdown_players.layout.visibility = 'visible'
        dropdown_goals.layout.visibility = 'hidden'
        radio_playoff.layout.visibility = 'hidden'
        
#############################################
# Register event handlers with widgets
#############################################
radio_mode.observe(radio_mode_eventhandler, names = 'value')
radio_playoff.observe(radio_playoff_eventhandler, names = 'value')
dropdown_goals.observe(dropdown_goals_eventhandler, names = 'value')
dropdown_teams.observe(dropdown_teams_eventhandler, names = 'value')
dropdown_players.observe(dropdown_players_eventhandler, names = 'value')
In [14]:
# display widgets
display(radio_mode)
display(radio_playoff)
display(dropdown_goals)
display(dropdown_teams)
display(dropdown_players)
fig_game
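The density layer in the figure is, at its core, a two-dimensional histogram of shot locations. The sketch below uses `np.histogram2d` on randomly generated points to illustrate the binning idea. The points, grid size, and variable names are assumptions made for illustration, and Plotly's `Histogram2dContour` performs its own binning and contouring internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up shot locations in the plot's coordinate system (not real NHL data).
x = rng.uniform(-250, 250, size=1000)
y = rng.uniform(0, 580, size=1000)

# Count how many points fall into each cell of a 10 x 12 grid over the half rink.
counts, x_edges, y_edges = np.histogram2d(
    x, y, bins=[10, 12], range=[[-250, 250], [0, 580]]
)

print(counts.shape)       # one count per grid cell
print(int(counts.sum()))  # every point falls in exactly one cell
```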

Conclusions

We had certain intuitions about where teams and players would shoot and score from, and it was extremely interesting to see those intuitions borne out by the data. It was especially interesting to visualize the individual playing styles of specific players.

The next steps for this project may include additional modifications to the heat map so that teams could filter for their opposition to gain insight into how the other team plays and what defensive strategies would be best to counter them.
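As a rough sketch of that next step, the `team_for` and `team_against` columns that survive the cleaning stage could drive an opposition filter. The example below uses a tiny made-up dataframe with the same column names. The helper `matchup_shots` is hypothetical and is only one possible way to frame the filter.

```python
import pandas as pd

# Tiny made-up sample using the column names produced by the cleaning step.
plays = pd.DataFrame({
    'event': ['Shot', 'Goal', 'Shot', 'Missed Shot'],
    'team_for': ['Flames', 'Oilers', 'Oilers', 'Flames'],
    'team_against': ['Oilers', 'Flames', 'Flames', 'Oilers'],
    'py_x': [10.0, -35.0, 60.0, 5.0],
    'py_y': [400.0, 480.0, 350.0, 450.0],
})


def matchup_shots(df, shooter, defender):
    """Return the shot attempts and goals by `shooter` in games against `defender`."""
    return df[(df['team_for'] == shooter) & (df['team_against'] == defender)]


faced = matchup_shots(plays, 'Oilers', 'Flames')
print(faced[['event', 'py_x', 'py_y']])
```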

REFERENCES

Ellis, M. (2019, June). NHL Game Data, Version 4 [Online]. Available at: https://www.kaggle.com/martinellis/nhl-game-data (Retrieved September 26, 2019)

REFERENCES for Python and libraries used:

Wes McKinney. Data Structures for Statistical Computing in Python, Proceedings of the 9th Python in Science Conference, 51-56 (2010) (pandas)

McKinney, W. (2017). Python for Data Analysis. Sebastopol: O'Reilly. (Pandas)

Plotly Technologies Inc, (2015). Plotly Python Open Source Graphing Library [Online] Available at: https://plot.ly/python/ (Accessed: 10 October 2019) (Plotly)

Pravendra (2016) 'NHL Shots Analysis Using Plotly Shapes', modern data, 24 November. Available at: https://moderndata.plot.ly/nhl-shots-analysis-using-plotly-shapes/ (Accessed: 8 October 2019) (NHL Rink)

Project Jupyter Revision (2017). ipywidgets User Guide [Online] Available at: https://ipywidgets.readthedocs.io/en/latest/ (Accessed: 10 October 2019) (ipywidgets)