The Hockey Database is a collection of historical statistics from men's professional hockey teams in North America.
Note that as of v1, this dataset is missing a few files, due to Kaggle restrictions on the number of individual files that can be uploaded. The missing files will be noted in the description below.
The dataset contains the following tables (all are csv):
Descriptions of the individual fields in each file can be found in the file's description.
The Hockey Databank project allows for free usage of its data, including the production of a commercial product based upon the data, subject to the terms outlined below.
1) In exchange for any usage of data, in whole or in part, you agree to display the following statement prominently and in its entirety on your end product:
"The information used herein was obtained free of charge from and is copyrighted by the Hockey Databank project. For more information about the Hockey Databank project please visit http://sports.groups.yahoo.com/group/hockey-databank"
2) Your usage of the data constitutes your acknowledgment, acceptance, and agreement that the Hockey Databank project makes no guarantees regarding the accuracy of the data supplied, and will not be held responsible for any consequences arising from the use of the information presented.
This dataset was downloaded from the hockey database at Open Source Sports. The original acknowledgments are as follows:
A variety of sources were consulted while constructing this database. These are listed below in no particular order.
Books:
Periodicals:
On-line sources:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Hockey Player (sample 2398) is a dataset for object detection tasks: it contains annotations of hockey players in 2,398 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The term “relative age effect” (RAE) describes a bias in which participation in sports (and other fields) is higher among people born at the beginning of the relevant selection period than would be expected from the distribution of births. In sports, RAEs may affect players' psychological experience as well as their performance. This article presents two studies. Study 1 aims to verify the prevalence of RAEs in minor hockey and to test their associations with players' physical self-concept and attitudes toward physical activities in general. Study 2 verifies the prevalence of the RAE and analyzes the performance of Canadian junior elite players as a function of their birth quartile. In Study 1, the sample was drawn from 404 minor hockey players playing at levels ranging from recreational to elite. Physical self-concept and attitudes toward different kinds of physical activities were assessed via questionnaires. Results showed that the RAE is prevalent in minor hockey at all competition levels. Minor differences in favor of Q1-born players were observed for physical self-concept, but not for attitudes. In Study 2, data analyses were conducted on the 2018–2019 Canadian Hockey League (CHL) database. Birth quartiles were compared on different components of performance using quantile regression on each variable. Results revealed that RAEs are prevalent in the CHL, with Q1 players tending to outperform Q4 players in games played and power-play points. No other significant differences were observed in anthropometric measures or other performance outcomes. RAEs are still prevalent in Canadian hockey. Building perceived competence and providing game-time exposure are examples of aspects that need to be addressed when trying to minimize RAEs in ice hockey.
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains comprehensive player statistics and contract details for the 2024-25 National Hockey League (NHL) season. It merges both on-ice performance metrics and contractual information, making it valuable for fans, analysts, sports journalists, and data scientists interested in exploring hockey performance, salary cap dynamics, and advanced analytics.
I wanted to learn how to scrape data from web pages into my R sessions to analyze things I otherwise wouldn't be able to analyze. I found an incredibly helpful tutorial on DataCamp.com, but I also decided that, in order to *really* learn it, I needed to pick my own dataset to work with. I am a huge hockey fan and I've wanted to play with some hockey data for a while, but I hadn't quite found what I was looking for here on Kaggle... so I decided to kill two birds with one stone and make this dataset.
Within, there are year-by-year skater stats from 30 leagues across the 38 most recent seasons. There's also a "dim" table for each player where I scraped their height, weight, birthdate, birthplace, and draft position (if available).
All data was gathered from EliteProspects.com using the rvest package in R. Special thanks to EliteProspects for maintaining the most complete world ice hockey database that I've seen online, the creators of rvest, and to Arvid Kingl for the incredibly helpful rvest tutorial that helped me get up and going on this project.
I'm mostly excited to build some cool visuals and models with the data. I want to answer questions like: at what age do NHL players peak? Is it different depending on what round they're drafted in? How well do we expect a player to fare in X league based on how he did in Y league the preceding season?
Hockey-player birthplaces from the "Master" table of the Hockey Databank database (August 2015 update), joined to the "Scoring" and "Goalies" tables (each summarized by playerID, for NHL/WHA players only) and then exported. The subset contains only players who played in the NHL or WHA, and is time-enabled on the range of each player's first and last NHL or WHA season (see the firstSeason and lastSeason fields).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset features the salaries of 874 NHL players for the 2016/2017 season. I have randomly split the players into training (612 players) and test (262 players) populations. There are 151 predictor columns (described in the column legend section; if you're not familiar with hockey, the meaning of some of these may be a bit cryptic!) as well as a leading column with each player's 2016/2017 annual salary. For the test population, the actual salaries have been split off into a separate .csv file.
The raw Excel sheet was acquired from http://www.hockeyabstract.com/
Can you build a model to predict NHL players' salaries? What are the best predictors of how much a player will make?
Acronym - Meaning
%FOT - Percentage of all on-ice faceoffs taken by this player.
+/- - Plus/minus
1G - First goals of a game
A/60 - Events Against per 60 minutes, defaults to Corsi, but can be set to another stat
A1 - First assists, primary assists
A2 - Second assists, secondary assists
BLK% - Percentage of all opposing shot attempts blocked by this player
Born - Birth date
C.Close - A player's shot attempt (Corsi) differential when the game was close
C.Down - A player's shot attempt (Corsi) differential when the team was trailing
C.Tied - A player's shot attempt (Corsi) differential when the team was tied
C.Up - A player's shot attempt (Corsi) differential when the team was in the lead
CA - Shot attempts allowed (Corsi, SAT) while this player was on the ice
Cap Hit - The player's cap hit
CBar - Crossbars hit
CF - The team's shot attempts (Corsi, SAT) while this player was on the ice
CF.QoC - A weighted average of the Corsi percentage of a player's opponents
CF.QoT - A weighted average of the Corsi percentage of a player's linemates
CHIP - Cap Hit of Injured Player, which is games lost to injury multiplied by cap hit per game
City - City of birth
Cntry - Country of birth
DAP - Disciplined aggression proxy, which is hits and takeaways divided by minor penalties
DFA - Dangerous Fenwick against, which is on-ice unblocked shot attempts weighted by shot quality
DFF - Dangerous Fenwick for, which is on-ice unblocked shot attempts weighted by shot quality
DFF.QoC - Quality of Competition metric based on Dangerous Fenwick, which is unblocked shot attempts weighted for shot quality
DftRd - Round in which the player was drafted
DftYr - Year drafted
Diff - Events for minus events against, defaults to Corsi, but can be set to another stat
Diff/60 - Events for minus events against, per 60 minutes, defaults to Corsi, but can be set to another stat
DPS - Defensive point shares, a catch-all stat that measures a player's defensive contributions in points in the standings
DSA - Dangerous shots allowed while this player was on the ice, which is rebounds plus rush shots
DSF - The team's dangerous shots while this player was on the ice, which is rebounds plus rush shots
DZF - Shifts this player has ended with a defensive zone faceoff
dzFOL - Faceoffs lost in the defensive zone
dzFOW - Faceoffs won in the defensive zone
dzGAPF - Team goals allowed after faceoffs taken in the defensive zone
dzGFPF - Team goals scored after faceoffs taken in the defensive zone
DZS - Shifts this player has started with a defensive zone faceoff
dzSAPF - Team shot attempts allowed after faceoffs taken in the defensive zone
dzSFPF - Team shot attempts taken after faceoffs taken in the defensive zone
E+/- - A player's expected +/-, based on his team and minutes played
ENG - Empty-net goals
Exp dzNGPF - Expected goal differential after faceoffs taken in the defensive zone, based on the number of them
Exp dzNSPF - Expected shot differential after faceoffs taken in the defensive zone, based on the number of them
Exp ozNGPF - Expected goal differential after faceoffs taken in the offensive zone, based on the number of them
Exp ozNSPF - Expected shot differential after faceoffs taken in the offensive zone, based on the number of them
F.Close - A player's unblocked shot attempt (Fenwick) differential when the game was close
F.Down - A player's unblocked shot attempt (Fenwick) differential when the team was trailing
F.Tied - A player's unblocked shot attempt (Fenwick) differential when the team was tied
F.Up - A player's unblocked shot attempt (Fenwick) differential when the team was in the lead. Not the best acronym.
F/60 - Events For per 60 minutes, defaults to Corsi, but can be set to another stat
FA - Unblocked shot attempts allowed (Fenwick, USAT) while this player was on the ice
FF - The team's unblocked shot attempts (Fenwick, USAT) while this player was on the ice
First Name - The player's first name
FO% - Faceoff winning percentage
FO%vsL - Faceoff winning percentage against left-handed opponents
FO%vsR - Faceoff winning percentage against right-handed opponents
FOL - The team's faceoff losses...
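Several legend entries are simple arithmetic on other columns. As a minimal sketch of those definitions, the functions below compute Diff, a per-60 rate, Corsi percentage, DAP and CHIP. The function names, the use of time on ice in minutes for the per-60 scaling, the handling of zero minor penalties, and the 82-game season used to turn an annual cap hit into a per-game figure are all assumptions for illustration, not details taken from the original spreadsheet.

```python
def diff(events_for, events_against):
    """Diff: events for minus events against (Corsi by default, i.e. CF - CA)."""
    return events_for - events_against


def per_60(count, toi_minutes):
    """Scale a raw count to a per-60-minutes rate, as in F/60, A/60 and Diff/60."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return count * 60.0 / toi_minutes


def corsi_pct(cf, ca):
    """Share of on-ice shot attempts taken by the player's team: CF / (CF + CA)."""
    total = cf + ca
    return cf / total if total else 0.0


def dap(hits, takeaways, minor_penalties):
    """Disciplined aggression proxy: (hits + takeaways) / minor penalties."""
    if minor_penalties == 0:
        return float("inf")  # assumption: no minors is treated as unbounded discipline
    return (hits + takeaways) / minor_penalties


def chip(games_lost, annual_cap_hit, games_in_season=82):
    """CHIP: games lost to injury times cap hit per game (assumes an 82-game season)."""
    return games_lost * annual_cap_hit / games_in_season
```

For example, a player on the ice for 500 shot attempts for and 450 against has `diff(500, 450) == 50`, and over 1,000 minutes that is `per_60(50, 1000) == 3.0` per 60.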
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic information of study participants.
Despite the traditional use of average values to determine physical demands, the intermittent and fluctuating nature of team sports may lead to underestimation of the most demanding scenarios. To date, investigations of the most demanding scenarios have reported only one maximal scenario per game, the greatest. However, the latest research on this subject has shown additional scenarios of equal or similar magnitude that most researchers have not considered. This repetition concept opened a new way of describing competition and training loads. Accordingly, the aims of this study were, first, to quantify and assess differences between playing positions in the most demanding scenarios of official matches and, second, to quantify and assess differences between playing positions in the repetition of scenarios of different intensities relative to the most demanding individual scenario. We monitored nine professional rink hockey players (7 exterior and 2 interior players) in 18 competitive matches using an electronic performance tracking system. The interior players are closest to the opponent’s goal, while the exterior players are farthest from it. Peak physical demand variables included total distance (m), distance covered at >18 km·h⁻¹ (m), and the number of accelerations (≥2 m·s⁻², count) and decelerations (≤−2 m·s⁻², count) in 30 s. The average of the top three individual most demanding scenarios was used to define a reference value for quantifying the distribution of scenario repetitions during matches. The results showed that peak demands in rink hockey are position-dependent, with more distance covered by exterior players and more accelerations performed by interior players. In addition, rink hockey matches include multiple scenario exposures that are close to the peak physical demands of a match. Using the results of this study, coaches can prepare tailored training plans for each position, focusing on distances covered or accelerations for exterior players.
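The reference-value procedure described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one sample per second of a single variable (for example, metres covered in each second), so a 30 s window is 30 samples; it picks the most demanding windows greedily, largest first, while skipping any window that overlaps one already chosen; it averages the top three as the reference; and it counts non-overlapping windows reaching hypothetical bands of 90%, 80% and 70% of that reference.

```python
def rolling_sums(samples, window):
    """Total of every contiguous run of `window` samples, one value per start index."""
    if window <= 0 or window > len(samples):
        return []
    total = sum(samples[:window])
    sums = [total]
    for i in range(window, len(samples)):
        total += samples[i] - samples[i - window]  # slide the window by one sample
        sums.append(total)
    return sums


def non_overlapping_peaks(sums, window, k=None):
    """Greedily take window starts from largest to smallest, skipping overlapping ones."""
    chosen = []
    for start in sorted(range(len(sums)), key=lambda i: sums[i], reverse=True):
        if all(abs(start - c) >= window for c in chosen):
            chosen.append(start)
            if k is not None and len(chosen) == k:
                break
    return chosen


def repetition_profile(samples, window=30, top=3, bands=(0.9, 0.8, 0.7)):
    """Reference = mean of the `top` peak windows; count windows reaching each band of it."""
    sums = rolling_sums(samples, window)
    peaks = non_overlapping_peaks(sums, window, top)
    if not peaks:
        raise ValueError("not enough samples for a single window")
    reference = sum(sums[i] for i in peaks) / len(peaks)
    windows = non_overlapping_peaks(sums, window)  # every non-overlapping window
    counts = {b: sum(1 for s in windows if sums[s] >= b * reference) for b in bands}
    return reference, counts
```

The greedy, non-overlapping selection is one simple design choice; the study may have defined windows and repetitions differently.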
The Kontinental Hockey League is now past its 13th season. While this is a rather modest number compared to the NHL and many other leagues, it can still provide us with enough data points to try and learn things about the league's players.
The data presented here includes 3 files, each of them containing data on all players in KHL history, or at least all players that the KHL website has data on.
The first one is player information: his size, whether he shoots left or right, and so on.
The second file contains performance statistics for every season in which a player has participated in at least one official match. The data may be divided into several parts: regular season, playoffs and off-season tournaments such as the Nadezhda Cup. There are two reasons behind this design: not all teams participate in the playoffs or off-season tournaments every year, and the data is stored that way on the KHL website. Moreover, for each player there are also combined statistics for all his KHL seasons, which follow the same style.
The third file is at the level of individual matches: every official match a player has ever played in, with the season indicated. However, there is a certain quirk in the data. Off-season matches are not considered official matches (which makes sense) and are not included in the match statistics, yet they are present in the season statistics as a separate line. As a result, a few players are present only in the player information and season statistics, and not in the match statistics.
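The quirk above can be checked directly. Assuming each file has been read into a list of dictionaries (for example with `csv.DictReader`) and that all three share a player identifier column, here hypothetically called `player_id`, a set difference finds the players who appear only in the player information and season statistics:

```python
def players_missing_from_matches(player_info, season_stats, match_stats, key="player_id"):
    """IDs present in player info and season stats but absent from match stats.

    `key` is a hypothetical column name; use whatever identifier the files share.
    """
    info_ids = {row[key] for row in player_info}
    season_ids = {row[key] for row in season_stats}
    match_ids = {row[key] for row in match_stats}
    return (info_ids & season_ids) - match_ids
```

Players returned here are candidates for having played only off-season matches, which can be confirmed by looking at their rows in the season statistics.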
All data belongs to the Kontinental Hockey League and was taken from their website, https://en.khl.ru/
All code used to collect data as well as process and (attempt to) analyse it is available on https://github.com/Dark-Hobbit/khl
At the moment, I see three main questions which this dataset might attempt to answer.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The IDMT-ISA-PUCKS dataset (IIPD) was designed to simulate challenging acoustic analysis conditions consistent with industrial manufacturing settings. The dataset contains audio recordings of multiple games of air hockey played with pucks made of different plastic materials. Data were collected by equipping the air hockey table with two sE8 microphones, each recording one side of the table, as seen in the image above, while a game was played. Additionally, there are recordings in which no game was being played and only background noise was recorded.
We recorded the games played with different pucks at three noise levels: Level 1 at room volume (vol_000), Level 2 with some background noise (vol_050 = 70 CBR) and Level 3 with loud background noise (vol_100 = 80 CBR). The background noise, which contains human voices, was played over four speakers placed at equal distances around the table.
The following materials were used for the four pucks:
Puck_A is the original factory puck (material unknown)
Puck_E from the 3D printer (material: ABS, print process: FDM)
Puck_G from the 3D printer (material: PA2200, print process: SLS)
Puck_I from the 3D printer (material: PA12, print process: MJF)
For each noise level and puck material, five three-minute games were played with different pucks of the specified material. Further, each game was played with different sets of players. The recordings were made via two sE8 microphones placed in the middle of the air-hockey table (about 10 cm above the surface).
Dataset total duration: 260 minutes (1 min per file)
Sampling rate: 44.1 kHz
Resolution: 32-bit
Stereo audio
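The specifications above are enough to estimate storage. Assuming the files are stored as uncompressed PCM (the container format is not stated) and ignoring header bytes, the arithmetic for one-minute stereo files at 44.1 kHz and 32 bits per sample is:

```python
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 4    # 32-bit resolution
CHANNELS = 2            # stereo
FILE_SECONDS = 60       # 1 min per file
TOTAL_MINUTES = 260     # stated total duration


def pcm_bytes(seconds, rate=SAMPLE_RATE_HZ, channels=CHANNELS, width=BYTES_PER_SAMPLE):
    """Raw PCM payload size in bytes (assumes no compression, excludes headers)."""
    return seconds * rate * channels * width


per_file = pcm_bytes(FILE_SECONDS)
n_files = TOTAL_MINUTES * 60 // FILE_SECONDS
total = per_file * n_files
print(f"{n_files} files, {per_file / 1e6:.1f} MB each, {total / 1e9:.1f} GB in total")
```

This gives about 21.2 MB of sample data per file and about 5.5 GB for the whole dataset; compressed or differently encoded files would be smaller.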
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Sports Leagues Dataset (SLD) contains statistical data on the major professional sports leagues in the United States: NFL (National Football League), NBA (National Basketball Association), NHL (National Hockey League) and MLB (Major League Baseball). It collects five topics (Player Expenses, Player Salaries, Players Performance, Team Salaries, Team Valuation) across two dimensions (Finance and Performance) for different seasons (2000-2007) from three data sources (Forbes, Spotrac and Sports Reference).
Please consider citing https://doi.org/10.5281/zenodo.3256432 if you found this dataset useful:
[1] André Albino Bastos, Matheus de Oliveira Salim, Wladmir Cardoso Brandão. (2019). SLD: The Sports Leagues Dataset (Version 1.0) [Data set]. Zenodo.
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
The dataset this week comes from Statistics Canada, the NHL team list endpoint, and the NHL API. The dataset was inspired by the blog Are Birth Dates Still Destiny for Canadian NHL Players? by JLaw (via https://universeodon.com/@jlaw/111522860812359901)!
In the first chapter of Malcolm Gladwell’s Outliers, he discusses how, in Canadian junior hockey, players are more likely to be born in the first quarter of the year.
Because these kids are older within their year, they make all the important teams at a young age, which gets them better resources for skill development and so on.
While it seems clear that more players are born in the first few months of the year, what isn’t explored is whether or not this would be expected. Maybe more people in Canada in general are born earlier in the year.
I will explore whether Gladwell’s result is expected as well as whether this is still true in today’s NHL for Canadian-born players.
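One way to test whether Gladwell's pattern is "expected" is a chi-square goodness-of-fit test comparing players' birth-quarter counts with the quarterly shares of all Canadian births. The sketch below assumes a January 1 selection cutoff (so calendar quarters are the relevant birth quarters) and takes the population shares as an input, for example derived from the Statistics Canada data; it computes only the test statistic and compares it with the standard 5% critical value for 3 degrees of freedom. It is an illustration, not the analysis from the blog post.

```python
from datetime import date

# Chi-square critical value for 3 degrees of freedom at alpha = 0.05.
CRITICAL_95_DF3 = 7.815


def birth_quarter(d: date) -> int:
    """Calendar quarter 1-4 of a birth date (assumes a January 1 cutoff)."""
    return (d.month - 1) // 3 + 1


def quarter_counts(birth_dates):
    """Number of players born in each of Q1..Q4."""
    counts = [0, 0, 0, 0]
    for d in birth_dates:
        counts[birth_quarter(d) - 1] += 1
    return counts


def chi_square_gof(observed, expected_shares):
    """Pearson statistic: sum of (O - E)^2 / E, with E = total * population share."""
    n = sum(observed)
    expected = [n * p for p in expected_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A statistic above `CRITICAL_95_DF3` suggests that the players' birth quarters differ from the general birth distribution more than chance alone would explain at the 5% level.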
The National Hockey League (NHL) is the top professional men’s hockey league in the world. The league records every shot players take along with contextual information about the shot such as its location, the player’s distance and angle to the goal when attempting the shot, as well as the outcome (blocked, missed, or goal). Using this information, the hockey analytics community have developed measures of shot quality known as expected goals. With this dataset, you can create your own expected goals model to predict the shot outcome given relevant features.
This dataset contains information about 160,573 shots during the 2021-2022 NHL season.
Build a logistic regression model to predict whether or not the shot will result in a goal based on the shot distance and angle.
Build a classification model to predict the outcome based on the spatial x,y coordinates of the shot.
Create a visualization displaying the joint frequency of shot locations. Do there appear to be any clear modes of frequently taken shots? Create a conditional version of this display by shot outcome. Does the distribution shape vary by shot outcome? (You can also perform a similar analysis by team).
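The first two tasks can be sketched as follows. The code assumes NHL-style rink coordinates in feet with centre ice at (0, 0) and the nets at x = ±89 (an assumption; check the dataset's own coordinate convention), derives shot distance and angle from x and y, and fits a plain logistic regression by batch gradient descent on standardized features. It is a dependency-free illustration, not a tuned expected-goals model; in practice a library implementation such as scikit-learn's would usually be used.

```python
import math

GOAL_X = 89.0  # assumed net location in feet; shots are mirrored toward x = +89


def shot_distance_angle(x, y):
    """Distance (ft) and angle (degrees, 0 = straight on) from (x, y) to the attacked net."""
    dx = GOAL_X - abs(x)
    distance = math.hypot(dx, y)
    angle = math.degrees(math.atan2(abs(y), dx))
    return distance, angle


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on standardized features; labels are 1 (goal) or 0."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(k)]
    z_rows = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for z, t in zip(z_rows, labels):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            for j in range(k):
                grad_w[j] += err * z[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, means, stds


def goal_probability(model, features):
    """Predicted probability that a shot with these features becomes a goal."""
    w, b, means, stds = model
    z = [(f - m) / s for f, m, s in zip(features, means, stds)]
    return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)
```

Typical use would be `fit_logistic([list(shot_distance_angle(x, y)) for x, y in coords], goals)`, where `coords` and `goals` are hypothetical lists built from the dataset's location and outcome columns. The same fitting function works for the second task if raw x and y are passed as the features instead.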
Morse D (2023). hockeyR: Collect and Clean Hockey Stats. R package version 1.3.1, https://github.com/danmorse314/hockeyR.
Data for all skaters, goalies, lines/defensive pairings, and teams are available for the current season going back to the 2008-2009 season.
The data was last updated at 2023-06-14 05:31 Eastern Time. Data is available summarized on the season level and on a game by game level going back to 2008-2009. Season level data is below.
All historical shot data is available to download. This includes 1,717,746 shots from the 2007-2008 to 2022-2023 seasons. Data for the 2023-2024 season will also be available and updated nightly on this page. Saved shots on goal, missed shots, and goals are included. Blocked shots are not included in these datasets. There are 124 attributes for each shot, including everything from the player and goalie involved in the shot to angles, distances, what happened before the shot, and how long players had been on the ice when the shot was taken. Each shot also has model scores for its probability of being a goal (xGoals) as well as other models, such as for the chance there will be a rebound after the shot, the probability the shot will miss the net, and whether the goalie will freeze the puck after the shot. The data has been collected from several sources, including the NHL and ESPN, and a good amount of data cleaning has been done on it. Arena-adjusted shot coordinates and distances are also calculated, using the approach War-On-Ice adopted from the method proposed by Schuckers and Curros.
There are two separate files which contain a detailed column description!
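Because each shot carries an xGoals score, a player's expected goals are simply the sum over that player's shots, and comparing them with actual goals gives a rough finishing measure. A minimal sketch follows; the column names `shooter`, `xGoal` and `goal` are hypothetical placeholders for whatever the column description files specify. Since blocked shots are not included, the totals cover unblocked attempts only.

```python
from collections import defaultdict


def goals_above_expected(shots, player_key="shooter", xg_key="xGoal", goal_key="goal"):
    """Per player: (actual goals, summed xGoals, goals minus xGoals).

    `shots` is an iterable of dicts, e.g. rows from csv.DictReader; the key names are
    hypothetical and should be replaced with the dataset's real column names.
    """
    goals = defaultdict(int)
    xg = defaultdict(float)
    for shot in shots:
        player = shot[player_key]
        goals[player] += int(float(shot[goal_key]))  # tolerates "1" or "1.0" strings
        xg[player] += float(shot[xg_key])
    return {p: (goals[p], xg[p], goals[p] - xg[p]) for p in xg}
```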
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Top 20 players based on possession scores (2016/2017).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Ice hockey is a sport that has gained much attention in recent times, particularly concerning the development of young players. In the domain of youth sport development, one significant factor that must be considered is the perceived competence of players. This variable is closely linked to positive psychological outcomes and sustained practice. However, little is known about how other important developmental factors, such as age, early sport specialization, player position and relative age, affect players’ perceived competence. Therefore, the objective of this study is to explore the relationships between these developmental factors, perceived ice hockey competence and a global measure of perceived sport competence.
Methods
Data were drawn from 971 players (mean age 14.78 ± 1.61 years) who completed online questionnaires, from which we conducted path analyses involving all variables.
Results
Younger players tended to display higher perceived competence scores than older players. Additionally, players who opted to specialize earlier also reported higher perceived competence. Furthermore, forwards and defensemen had differing perceptions of their competence, in line with their respective roles on the ice. The study also showed relative age effects: players born earlier relative to the selection period tended to perceive themselves more favorably in three components of perceived competence.
Discussion
Based on these findings, several recommendations are proposed for coaches and decision-makers to encourage the positive development of ice hockey players. The study highlights that ice hockey-specific competencies are influenced by various factors, such as early sport specialization, relative age effect, player age, and position.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The Professional Women’s Hockey League (PWHL) launched in 2024 as the premier professional women’s hockey league in North America. This dataset contains detailed player statistics for all skaters and goalies across the first two seasons — including preseason, regular season, and playoffs.
Seasons: 2024 (Season 1), 2025 (Season 2)
Players: All skaters & goalies (qualified and non-qualified)
Game Types: Preseason, Regular Season, Playoffs
pwhl_skater_stats_preseason.csv
pwhl_skater_stats_regular.csv
pwhl_skater_stats_playoffs.csv
pwhl_goalie_stats_preseason.csv
pwhl_goalie_stats_regular.csv
pwhl_goalie_stats_playoffs.csv
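Because the six files share a naming pattern, they can be stacked into one table tagged by role and game type, which is convenient for the trend and skater-versus-goalie comparisons listed below. This is a minimal standard-library sketch; it assumes the files sit together in one directory under exactly the names above, makes no assumption about their columns, and leaves all values as strings.

```python
import csv
from pathlib import Path

ROLES = ("skater", "goalie")
GAME_TYPES = ("preseason", "regular", "playoffs")


def load_pwhl_stats(data_dir):
    """Read every pwhl_<role>_stats_<game_type>.csv and tag each row with its origin."""
    rows = []
    for role in ROLES:
        for game_type in GAME_TYPES:
            path = Path(data_dir) / f"pwhl_{role}_stats_{game_type}.csv"
            if not path.exists():
                continue  # tolerate a missing file instead of failing
            with path.open(newline="", encoding="utf-8") as fh:
                for record in csv.DictReader(fh):
                    record["role"] = role
                    record["game_type"] = game_type
                    rows.append(record)
    return rows
```

Tagging rows rather than merging columns keeps skater and goalie stats side by side even though the two file families carry different fields.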
This dataset will be updated each season with new player statistics as they are published by the PWHL. Future updates will also standardize stat columns as the league evolves.
Performance trend analysis over time
Comparing skaters vs goalies across seasons
Building predictive models for player success or team performance
Raw data collected from player and game stats pages on www.thepwhl.com, aggregated by PlayHer.ai for analysis.
Special thanks to the Professional Women’s Hockey League for making player statistics publicly available through their official website (https://www.thepwhl.com).
This dataset was compiled and cleaned by PlayHer.ai with the goal of supporting research, analysis, and storytelling in women’s sports.
This dataset contains regular and "advanced" statistics for all NHL skaters from the 2004 through the 2018 season. Predictions for the 2018 Hart winner were derived from player performance up to that point (late November 2017). This dataset has been updated with complete 2018 season stats. Please note that the 2005 season is absent because it was cancelled due to a lockout, and that the 2013 season was shortened from 82 to 48 games due to another labor dispute.
Please note that this dataset does NOT contain goalies. In years in which a goalie won the Hart Trophy (2015, Carey Price), the Hart Trophy winner indicator was assigned to the highest-placed skater in the voting. This was done so that analysis could be done solely on skaters, as goalies are evaluated by a completely independent set of statistics. This only affects the 2015 season in this dataset.
Data from https://www.hockey-reference.com/, in particular the skater season statistics here: https://www.hockey-reference.com/leagues/NHL_2018_skaters.html and the Hart MVP voting here: https://www.hockey-reference.com/awards/hart.html
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Scraped from the hockeyindia.altiusrt.com website. The code is available on GitHub.
For now, the scraper is able to scrape the following data:
You can read the Github repo's README to know more about the data.