The lack of publicly available National Football League (NFL) data sources has been a major obstacle to modern, reproducible research in football analytics. While clean play-by-play data is available via open-source software packages in other sports (e.g. nhlscrapr for hockey, PitchF/x data for baseball, and Basketball Reference for basketball), equivalent datasets are not freely available to researchers interested in the statistical analysis of the NFL. To solve this issue, a group of Carnegie Mellon University statistical researchers, including Maksim Horowitz, Ron Yurko, and Sam Ventura, built and released nflscrapR, an R package that uses an API maintained by the NFL to scrape, clean, parse, and output clean datasets at the play, player, game, and season levels. Using the data output by the package, the trio went on to develop reproducible methods for building expected points and win probability models for the NFL. The outputs of these models are included in this dataset and can be accessed using the nflscrapR package.
The dataset made available on Kaggle contains all regular-season plays from the 2009-2016 NFL seasons: 356,768 rows and 100 columns. Each play is broken down in great detail, with information on game situation, players involved, results, and advanced metrics such as expected points and win probability values. Detailed information about the dataset, along with more NFL data, can be found at https://github.com/ryurko/nflscrapR-data.
This dataset was compiled by Ron Yurko, Sam Ventura, and myself. Special shout-out to Ron for improving our current expected points and win probability models and compiling this dataset. All three of us are proud founders of the Carnegie Mellon Sports Analytics Club.
This dataset is meant to both grow and bring together the sports analytics community by providing clean, easily accessible NFL data that has never before been available for free on this scale.
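As a starting point with the play-by-play table, the sketch below averages expected points added (EPA) per play for each offense and season using pandas. The column names `Season`, `posteam` (offensive team), and `EPA` are assumptions about the Kaggle CSV header and should be checked against the file; the toy rows stand in for the real 356,768-row table.

```python
import pandas as pd


def epa_per_play(pbp: pd.DataFrame) -> pd.DataFrame:
    """Mean expected points added (EPA) per play for each offense and season.

    Assumes columns named "Season", "posteam" and "EPA"; rename to match the real CSV.
    """
    # Rows without an offense or an EPA value (timeouts, end of quarter, ...) are dropped.
    plays = pbp.dropna(subset=["posteam", "EPA"])
    return (
        plays.groupby(["Season", "posteam"], as_index=False)["EPA"]
        .mean()
        .rename(columns={"EPA": "epa_per_play"})
        .sort_values(["Season", "epa_per_play"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Toy rows standing in for pd.read_csv(<path to the Kaggle play-by-play CSV>).
toy = pd.DataFrame(
    {
        "Season": [2016, 2016, 2016, 2016, 2016],
        "posteam": ["NE", "NE", "PIT", "PIT", None],
        "EPA": [0.8, -0.2, 0.1, -0.5, 0.0],
    }
)
print(epa_per_play(toy))
```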
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
American Football Player Detection is a dataset for object detection tasks; it contains annotations of American football players for 171 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Data on athletes' professional career lengths in the sports of baseball, basketball, and American football. The data was compiled from baseball-reference.com, pro-football-reference.com, and basketball-reference.com. The data is split into three different files, one for each sport, identified by the title: baseball_career_length.csv, basketball_career_length.csv, football_career_length.csv.
Dataset Features available in all files:
- name: The name of the athlete.
- start_year: The year that the athlete's professional career started.
- end_year: The last year of the athlete's professional career.
- hall_of_fame: True for athletes who have been admitted to the hall of fame, False otherwise.
- status: True if the athlete has finished their career, False otherwise.
- career_length: The total number of years the athlete was actively playing professionally.
- sport: The sport of the athlete.
Additional Dataset Features available for football_career_length.csv:
- position: The position that the athlete played in their sport. If they played multiple positions they are separated by a '-'.
Additional Dataset Features available for basketball_career_length.csv:
- position: The position that the athlete played in their sport. If they played multiple positions they are separated by a '-'.
- height: The height of the athlete in inches.
- weight: The weight of the athlete in pounds.
- birth_date: The date of the athlete's birth.
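A minimal pandas sketch for working across the three files, assuming they share the columns listed above. It restricts the career-length summary to finished careers (`status` is True), since active players' `career_length` is still growing, and splits the multi-position field on '-' as described. The toy frame stands in for `pd.concat` over the three `read_csv` calls; baseball rows would have no `position`, which the explode helper skips.

```python
import pandas as pd


def summarize_careers(players: pd.DataFrame) -> pd.DataFrame:
    """Mean career length of finished careers, by sport and Hall of Fame status."""
    finished = players[players["status"].eq(True)]  # eq() treats missing values as False
    return (
        finished.groupby(["sport", "hall_of_fame"], as_index=False)["career_length"]
        .mean()
        .rename(columns={"career_length": "mean_career_length"})
    )


def explode_positions(players: pd.DataFrame) -> pd.DataFrame:
    """One row per (player, position); multiple positions are joined by '-'."""
    out = players.dropna(subset=["position"]).copy()
    out["position"] = out["position"].str.split("-")
    return out.explode("position", ignore_index=True)


# In practice: pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "sport": ["football", "football", "basketball", "basketball"],
        "hall_of_fame": [True, False, False, False],
        "status": [True, True, True, False],
        "career_length": [15, 3, 8, 2],
        "position": ["QB", "WR-KR", "G-F", "C"],
    }
)
print(summarize_careers(toy))
print(explode_positions(toy)[["name", "position"]])
```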
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Dive into the ultimate treasure trove for football enthusiasts, data analysts, and gaming aficionados! The Football Manager Players Dataset is a comprehensive collection of player data extracted from a popular football management simulation game, offering an unparalleled look into the virtual world of football talent. This dataset includes detailed attributes for thousands of players across multiple leagues worldwide, making it a goldmine for analyzing player profiles, scouting virtual stars, and building predictive models for football strategies.
Whether you're a data scientist exploring sports analytics, a football fan curious about your favorite virtual players, or a game developer seeking inspiration, this dataset is your ticket to unlocking endless possibilities!
This dataset is a meticulously curated compilation of player statistics from five CSV files, merged into a single, unified dataset (merged_players.csv). It captures a diverse range of attributes for players from various clubs, nations, and leagues, including top-tier competitions like the English Premier Division, Argentina's Premier Division, and lower divisions across the globe.
- File: merged_players.csv (UTF-8 encoded for compatibility with special characters).
- Download merged_players.csv and load it into your favorite tool (Python/pandas, R, Excel, etc.).
- Explore columns such as Transfer Value, Position, and Media Description to start your analysis.
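Transfer Value is likely stored as text, so it needs converting before numeric analysis. The parser below is a sketch under an assumed format: amounts such as "£500K", "€1.5M", or ranges like "£1.1M - £11M" (reduced to their midpoint), with anything non-numeric such as "Not for Sale" mapped to None. These formats are hypothetical; inspect the real column first, e.g. after `pd.read_csv("merged_players.csv", encoding="utf-8")`.

```python
import re
from typing import Optional

import pandas as pd

# Hypothetical formats: "£500K", "€1.5M", "£1,500,000", "£1.1M - £11M".
_AMOUNT = re.compile(r"(\d+(?:\.\d+)?)\s*([KMB]?)", re.IGNORECASE)
_SCALE = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_transfer_value(raw: object) -> Optional[float]:
    """Convert a transfer value string to a number; ranges become their midpoint."""
    if not isinstance(raw, str):
        return None  # missing cell
    text = raw.replace(",", "")  # drop thousands separators
    amounts = [float(num) * _SCALE[unit.upper()] for num, unit in _AMOUNT.findall(text)]
    if not amounts:
        return None  # e.g. "Not for Sale"
    return sum(amounts) / len(amounts)


values = pd.Series(["£500K", "€1.5M", "£1.1M - £11M", "Not for Sale", None])
print([parse_transfer_value(v) for v in values])
```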
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Title: 70 Football Leagues Data (2019-2023)
Dataset Description: This dataset provides comprehensive data on 70 football leagues from various countries around the world. The dataset covers the period from 2019 to 2023, offering a rich collection of football-related information for data analysis, research, and visualization purposes.
Content: The dataset contains a wealth of football-related data, including match statistics, team information, player details, and league standings. The dataset covers a diverse range of leagues, encompassing top-tier competitions as well as lower divisions, allowing users to explore football data at various levels.
Key Features:
- Match Results
- Home Goals
- Away Goals
- Home Goals in First Half
- Away Goals in First Half
- Match Odds for 1X2 and O/U 2.5 Goals
- Total Goals in the Match
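Several targets and features can be derived directly from these fields. The sketch assumes decimal odds and uses hypothetical field names (the dataset's actual column names are not listed here): it derives the 1X2 result, the Over 2.5 outcome, second-half goals from the first-half split, and bookmaker-implied probabilities. The margin is removed by simple proportional normalization, which is only one of several possible methods.

```python
from dataclasses import dataclass


@dataclass
class Match:
    home_goals: int
    away_goals: int
    home_goals_ht: int  # first-half goals
    away_goals_ht: int
    odds_home: float  # decimal odds for 1, X and 2
    odds_draw: float
    odds_away: float


def result_1x2(m: Match) -> str:
    if m.home_goals > m.away_goals:
        return "1"
    if m.home_goals < m.away_goals:
        return "2"
    return "X"


def over_2_5(m: Match) -> bool:
    return m.home_goals + m.away_goals > 2.5


def second_half_goals(m: Match) -> tuple[int, int]:
    return m.home_goals - m.home_goals_ht, m.away_goals - m.away_goals_ht


def implied_probabilities(m: Match) -> dict[str, float]:
    """1X2 probabilities implied by the odds, normalized so they sum to 1."""
    raw = {"1": 1 / m.odds_home, "X": 1 / m.odds_draw, "2": 1 / m.odds_away}
    overround = sum(raw.values())  # above 1 because of the bookmaker's margin
    return {k: v / overround for k, v in raw.items()}


match = Match(2, 1, 1, 0, odds_home=2.10, odds_draw=3.40, odds_away=3.60)
print(result_1x2(match), over_2_5(match), second_half_goals(match))
print(implied_probabilities(match))
```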
Potential Use Cases:
- Statistical Analysis: Analyze match data, team performance, and player statistics to identify trends, patterns, and insights.
- Predictive Modeling: Utilize historical data to build predictive models for match outcomes, goal predictions, or player performance.
- Visualizations: Create visualizations, graphs, and charts to present key football data in an easily understandable format.
Data Source: The data for this dataset is collected from reliable sources, including official football websites, sports news portals, and reputable football data providers. The dataset is carefully curated and quality-checked to ensure accuracy and reliability.
Updates and Maintenance: The dataset will be periodically updated to include new seasons, leagues, and any necessary data corrections. User feedback and contributions are welcome to improve the dataset and keep it up-to-date.
Disclaimer: While utmost care has been taken to ensure the accuracy and reliability of the data, errors or inconsistencies may still exist. Users are encouraged to verify the data with official sources before making any critical decisions based on the dataset.
Acknowledgments: We would like to acknowledge the contributions of the data providers, football organizations, and sports enthusiasts whose efforts have made this dataset possible. Their dedication to collecting and sharing football data is greatly appreciated.
Note: Please be respectful of the data usage policy and terms of service of the dataset. Use the data responsibly and ensure compliance with any applicable legal requirements.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Summary
QASports is the first large sports-themed question answering dataset, with over 1.5 million questions and answers about 54k preprocessed wiki pages. Its documents come from the wikis of three of the most popular sports in the world: soccer, American football, and basketball. Each sport can be downloaded individually as a subset, with train, test, and validation splits, or all three can be downloaded together.
🎲 Complete dataset: https://osf.io/n7r23/ 🔧 Processing scripts:… See the full description on the dataset page: https://huggingface.co/datasets/PedroCJardim/QASports.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Summary
This dataset contains detailed information from every game listed on the NFL's official website, https://www.nfl.com/. It aims to provide a complete record of scores along with play-by-play data across all available seasons. This dataset was created with the hope of being a valuable resource for sports analysts and data scientists interested in American football statistics. The dataset was last updated on 02/10/2025.
Data Collection
The data was collected using a custom web scraper, which is openly available for review and further development. You can access the scraper code and documentation at the following GitHub repository: https://github.com/KeoniM/NFL_Scraper.git
Dataset Features for Scores
- Season: The NFL season the game belongs to.
- Week: Specific week of the NFL season.
- GameStatus: Current state or final status of the game.
- Day: Day of the week the game was played.
- Date: Exact date (month and day) of the game.
- AwayTeam: Name of the visiting team.
- AwayRecord: Season record of the away team at the time of the game.
- AwayScore: Total points scored by the away team.
- AwayWin: Boolean indicator if the away team won the game.
- HomeTeam: Name of the home team.
- HomeRecord: Season record of the home team at the time of the game.
- HomeScore: Total points scored by the home team.
- HomeWin: Boolean indicator if the home team won the game.
- AwaySeeding: Playoff seeding of the away team, if applicable.
- HomeSeeding: Playoff seeding of the home team, if applicable.
- PostSeason: Boolean indicating whether the game is a postseason match.
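Two quick checks on the score fields are sketched below: confirming that HomeWin/AwayWin agree with the final scores (a tie leaves both False), and turning a record into a win percentage with ties counted as half a win, as NFL standings do. The record format "W-L" or "W-L-T" (for example "10-6" or "8-7-1") is an assumption about how AwayRecord and HomeRecord are stored.

```python
import pandas as pd


def parse_record(record: str) -> tuple[int, int, int]:
    """Split an assumed "W-L" or "W-L-T" record into (wins, losses, ties)."""
    parts = [int(p) for p in record.split("-")]
    if len(parts) == 2:
        parts.append(0)  # no ties listed
    wins, losses, ties = parts
    return wins, losses, ties


def win_pct(record: str) -> float:
    """Win percentage with ties counted as half a win."""
    wins, losses, ties = parse_record(record)
    games = wins + losses + ties
    return (wins + 0.5 * ties) / games if games else 0.0


def win_flags_consistent(scores: pd.DataFrame) -> pd.Series:
    """True where HomeWin/AwayWin match the final scores."""
    home_won = scores["HomeScore"] > scores["AwayScore"]
    away_won = scores["AwayScore"] > scores["HomeScore"]
    return (scores["HomeWin"] == home_won) & (scores["AwayWin"] == away_won)


scores = pd.DataFrame(
    {
        "HomeScore": [24, 17, 20],
        "AwayScore": [17, 17, 27],
        "HomeWin": [True, False, True],  # third row is deliberately inconsistent
        "AwayWin": [False, False, False],
    }
)
print(win_flags_consistent(scores).tolist())
print(parse_record("10-6"), win_pct("8-7-1"))
```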
Dataset Features for Plays
- Season: The NFL season the play belongs to.
- Week: Specific week of the NFL season.
- Day: Day of the week on which the play was attempted.
- Date: Exact date (month and day) on which the play was attempted.
- AwayTeam: Name of the visiting team.
- HomeTeam: Name of the home team.
- Quarter: The quarter in which the play was attempted.
- DriveNumber: The number of the drive, within the quarter, in which the play was attempted.
- TeamWithPossession: Team in possession that attempted the play.
- IsScoringDrive: Whether the drive resulted in a score.
- PlayNumberInDrive: The play's number within its drive.
- IsScoringPlay: Whether the play resulted in a score.
- PlayOutcome: Short summary of the attempted play.
- PlayDescription: In-depth summary of the attempted play.
- PlayStart: Starting point on the field of the attempted play.
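Because DriveNumber is described as the drive number within a quarter, a single drive is only identified by combining it with the game and the quarter. The sketch below builds that composite key and computes each team's share of drives that scored. It assumes IsScoringDrive is stored as a boolean; if DriveNumber turns out to be numbered across the whole game, drop "Quarter" from the key.

```python
import pandas as pd

# Columns that together identify one drive, given that DriveNumber restarts each quarter.
DRIVE_KEY = ["Season", "Week", "HomeTeam", "AwayTeam", "Quarter", "DriveNumber"]


def scoring_drive_rate(plays: pd.DataFrame) -> pd.DataFrame:
    """Number of drives and share that scored, per season and team in possession."""
    # Collapse plays to one row per drive; every play in a drive should carry the same flag.
    drives = plays.groupby(DRIVE_KEY + ["TeamWithPossession"], as_index=False)[
        "IsScoringDrive"
    ].max()
    return drives.groupby(["Season", "TeamWithPossession"], as_index=False).agg(
        drives=("IsScoringDrive", "count"),
        scoring_rate=("IsScoringDrive", "mean"),
    )


plays = pd.DataFrame(
    {
        "Season": [2024] * 5,
        "Week": [1] * 5,
        "HomeTeam": ["KC"] * 5,
        "AwayTeam": ["BAL"] * 5,
        "Quarter": [1, 1, 1, 2, 2],
        "DriveNumber": [1, 1, 2, 1, 1],
        "TeamWithPossession": ["BAL", "BAL", "KC", "BAL", "BAL"],
        "IsScoringDrive": [True, True, False, False, False],
    }
)
print(scoring_drive_rate(plays))
```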
Follow My Data Cleaning Journey
If you're interested in following my process of refining and cleaning this dataset, check out my Google Colab notebook on GitHub, where I share ongoing updates and insights: https://github.com/KeoniM/NFL_Data_Cleaning.git. The notebook includes data-wrangling techniques, code snippets, and continuous improvements that make this dataset even more valuable for analysis.
Usage Notes
This dataset is intended for academic and research purposes. Users are encouraged to attribute the data to its source, https://www.nfl.com/, when using this dataset in their projects or publications.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous research has sought to quantify head impact exposure using wearable kinematic sensors. However, many sensors suffer from poor accuracy in estimating impact kinematics and count, motivating the need for independent quantification of impact exposure for comparison. Here, we equipped seven collegiate American football players with instrumented mouthguards and video recorded practices and games to compare video-based and sensor-based exposure rates and impact location distributions. Over 50 player-hours, we identified 271 helmet contact periods in video, while the instrumented mouthguard sensors recorded 2,032 discrete head impacts. Matching video and mouthguard real-time timestamps yielded 193 video-identified helmet contact periods and 217 sensor-recorded impacts. To compare impact locations, we binned matched impacts into frontal, rear, side, oblique, and top locations based on video observations and sensor kinematics. While both video-based and sensor-based methods found similar location distributions, our best method, using integrated linear and angular position, correctly predicted only 81 of 217 impacts. Finally, based on the activity timeline from video assessment, we developed a new exposure metric unique to American football that quantifies the number of cross-verified sensor impacts per player-play. We found significantly higher exposure during games (0.35, 95% CI: 0.29–0.42) than practices (0.20, 95% CI: 0.17–0.23) (p
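Written out, the per-play exposure metric is a ratio of counts; here $N_{\text{impacts}}$ is the number of sensor impacts cross-verified against video and $N_{\text{player-plays}}$ is the number of plays summed over each instrumented player's participation (this notation is ours, not the study's). The same arithmetic puts the best location method at roughly 37% agreement.

```latex
\text{exposure} = \frac{N_{\text{impacts}}}{N_{\text{player-plays}}},
\qquad
\text{location agreement} = \frac{81}{217} \approx 0.373
```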
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
I play fantasy football.
Contains fantasy data from 2016-2019
Have fun.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
By FiveThirtyEight
This repository contains a comprehensive database on the careers of NFL wide receivers, examining their performance over time to offer insights into physical changes and playing style over the years. With data stretching back to 1990, it reveals key changes in stats and ratings (including age_from/age_to, trypg_change, career_try, career_ranypa, career_wowy, and bcs_rating) that provide essential information for football fans looking to understand the history and evolution of this position in American football. This dataset is made available under the Creative Commons Attribution 4.0 International License as well as the MIT License, with the hope of facilitating more public understanding and transparency on this subject. We invite anyone who finds it useful to share their stories by contacting us at andrei.scheinkman@fivethirtyeight.com.
In order to get started using this dataset:
- Read through the columns of data to better understand what is being measured and how it relates to an individual player's performance.
- Explore the data by filtering it in different ways (such as looking only at highly rated players, or seeing how older players fared compared with younger ones).
- Look for patterns in how certain traits (such as age) affect a player's performance over time by creating graphs or other visualizations of these relationships.
- Use these findings to draw your own conclusions about trends in NFL wide receiver aging curves, or about team scouting strategies for players at different stages of their careers, from rookies to veterans retiring from professional football.
- Analyzing the performance of NFL wide receivers over time by comparing their age-from and age-to stats.
- Comparing the AV rating of NFL wide receivers to their total career receiving yards per attempt.
- Comparing the career wowy stats of NFL wide receivers to their total career targets in order to assess efficiency levels across different players
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: try-per-game-aging-curve.csv

| Column name | Description |
|:-----------------|:------------------------------------------------------------------------------------------------------------|
| age_from | Age when the career started. (Integer) |
| age_to | Age when the career ended. (Integer) |
| trypg_change | Change in the wide receiver's total receiving yards per game from the start to end of their career. (Float) |
File: advanced-historical.csv

| Column name | Description |
|:------------------|:-----------------------------------------------------------|
| player_name | Name of the NFL wide receiver. (String) |
| career_try | Total number of career targets. (Integer) |
| career_ranypa | Average number of receiving yards per attempt. (Float) |
| career_wowy | Average number of yards per target. (Float) |
| bcs_rating | Player's overall rating according to BCS system. (Integer) |
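An aging curve is usually read by chaining year-over-year changes. The sketch below does that for try-per-game-aging-curve.csv under an assumption that the file name suggests but the column descriptions above do not state: that each row covers one step between consecutive ages (age_to equals age_from + 1) and trypg_change is the change over that step. The toy values are illustrative, not from the dataset.

```python
import pandas as pd


def cumulative_aging_curve(steps: pd.DataFrame) -> pd.DataFrame:
    """Chain age-to-age changes into a curve relative to the youngest age.

    Assumes one-year, contiguous steps; raises if that assumption does not hold.
    """
    steps = steps.sort_values("age_from").reset_index(drop=True)
    if not (steps["age_to"] - steps["age_from"] == 1).all():
        raise ValueError("expected one-year steps between age_from and age_to")
    if not (steps["age_from"].iloc[1:].values == steps["age_to"].iloc[:-1].values).all():
        raise ValueError("expected contiguous ages")
    base_age = int(steps["age_from"].iloc[0])
    return pd.DataFrame(
        {
            "age": [base_age] + steps["age_to"].astype(int).tolist(),
            "cumulative_change": [0.0] + steps["trypg_change"].cumsum().tolist(),
        }
    )


toy = pd.DataFrame(
    {"age_from": [23, 22, 24], "age_to": [24, 23, 25], "trypg_change": [1.5, 2.0, -0.5]}
)
curve = cumulative_aging_curve(toy)
print(curve)
print("peak age:", int(curve.loc[curve["cumulative_change"].idxmax(), "age"]))
```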
If you use this dataset in your research, please credit the original authors, FiveThirtyEight.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Head impacts per play exposure metric.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous exposure studies.
License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This dataset gathers detailed information on the performance of football players and their market values, collected from two widely recognized sources in the sports world: Sofascore and Transfermarkt. The dataset combines on-field performance metrics with financial data related to the players' market valuation.
The performance data is derived from Sofascore, which provides detailed statistics on player performances in various competitions, including goals, assists, completed passes, tackles, and other performance indicators. Meanwhile, the players' market value information is sourced from Transfermarkt, a leading platform that tracks market fluctuations, the highest market values reached by players, and contract expiration dates.
This dataset is ideal for analyses involving the relationship between sports performance and market value, allowing insights into how on-field performance can impact players’ market value. It is useful for sports analysts, researchers, and enthusiasts looking to explore trends in football, observe the valuation of players over time, and make comparisons between leagues and competitions.
- market_value: History of the player's contract values.
- partidas_sofascore: Game dates, championships, and match IDs.
- performance_tm: Some player statistics collected from the Transfermarkt website.
- players_tm: Information related to the club, URL, and player ID on Transfermarkt.
- statistics_game: Game statistics, with totals and first- and second-half values.
- statistics_player: Individual player statistics.
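To relate performance to market value, one simple approach is to take each player's most recent value, average a performance metric across games, and compute a rank correlation. The sketch assumes the Sofascore and Transfermarkt tables have already been linked on a common `player_id` (players_tm carries the Transfermarkt ID, but the link between the two sources may need its own mapping), and the column names `date`, `value`, and `rating` are hypothetical. Spearman correlation is computed as the Pearson correlation of ranks, which avoids a SciPy dependency.

```python
import pandas as pd


def latest_value(market_value: pd.DataFrame) -> pd.DataFrame:
    """Most recent market value per player (hypothetical columns player_id, date, value)."""
    mv = market_value.assign(date=pd.to_datetime(market_value["date"]))
    mv = mv.sort_values("date").drop_duplicates("player_id", keep="last")
    return mv[["player_id", "value"]]


def spearman(a: pd.Series, b: pd.Series) -> float:
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return a.rank().corr(b.rank())


def performance_vs_value(stats: pd.DataFrame, market_value: pd.DataFrame, metric: str) -> float:
    per_player = stats.groupby("player_id", as_index=False)[metric].mean()
    merged = per_player.merge(latest_value(market_value), on="player_id", how="inner")
    return spearman(merged[metric], merged["value"])


stats = pd.DataFrame({"player_id": [1, 1, 2, 3, 3], "rating": [7.0, 7.4, 6.5, 7.8, 8.0]})
market_value = pd.DataFrame(
    {
        "player_id": [1, 1, 2, 3],
        "date": ["2023-01-15", "2024-06-30", "2024-06-30", "2024-06-30"],
        "value": [1.0e6, 2.0e6, 0.5e6, 5.0e6],
    }
)
print(performance_vs_value(stats, market_value, "rating"))
```

Using only the latest value is a simplification; since market_value is a history, matching each value to the performance window before it would be a natural refinement.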
The championships collected are:
- Campeonato Brasileiro Série A and B
- Copa do Brasil
- Copa Sudamericana
- Copa Libertadores
The data runs through 2024-10-12.
At this initial stage, data has been extracted from championships related to Brazil and South America. More data on other European and South American championships will be added soon.
This dataset presents a comprehensive overview of all football transfers completed during the Summer 2025 transfer window, just before the FIFA Club World Cup 2025. It was scraped from Transfermarkt (https://www.transfermarkt.fr), one of the most reliable and up-to-date sources for football transfer data.
It includes over 1,200 player moves across various leagues and countries, capturing transfers involving top clubs, rising talents, and strategic loans across the globe.
📦 Dataset Summary
- Total transfers: 1,208
- Transfer window: Summer 2025 (before FIFA Club World Cup 2025)
- Source: Scraped from Transfermarkt
- Coverage: Global (Europe, South America, Africa, Asia, North America...)
📑 Columns Description
- name: Player’s full name
- position: Playing position (e.g., Striker, Goalkeeper, Midfielder, etc.)
- age: Player's age at the time of transfer
- market_value: Estimated market value before the transfer (as listed by Transfermarkt)
- country_from: Origin country of the club the player is leaving
- league_from: Origin league
- club_from: Club the player is leaving
- country_to: Destination country
- league_to: Destination league
- club_to: Club the player is joining
- fee: Transfer fee (can be free, undisclosed, or in euros)
- loan: Boolean flag indicating whether the move is a loan (True/False)
📊 Use Cases
- Track transfer market trends across countries and leagues
- Analyze market value vs. transfer fee
- Explore position-based transfer patterns
- Study the most active clubs or leagues
- Build predictive models: Who is likely to transfer where, at what value?
- Visualize global player flows during a transfer window
🧠 Example Ideas for Analysis
- What positions are most frequently transferred?
- Which leagues spend the most per player?
- How often do transfers occur between specific countries?
- How many loan deals vs. permanent moves?
- How do transfer fees correlate with market values by age group?
📌 Notes
- All data was collected manually via web scraping and cleaned using pandas.
- Currency values in market_value and fee may need to be parsed into numbers for quantitative analysis.
- Some entries may include "undisclosed" or "free transfer" as values for fee.
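A strict parser for the fee column is sketched below. It assumes English-style strings such as "€12.50m", "€800k", "free transfer", and "undisclosed"; these formats are a guess, and values scraped from the French site may instead look like "12,50 mio. €". The design choice is to match the whole string, so any format the pattern does not recognize returns None instead of a silently wrong number; free transfers map to 0.0 and undisclosed fees to None.

```python
import re
from typing import Optional

# Hypothetical accepted formats: "€12.50m", "€800k", "€1.2bn", "€250000".
_FEE = re.compile(r"€?\s*(\d+(?:\.\d+)?)\s*(k|m|bn)?\s*€?")
_SCALE = {"": 1.0, "k": 1e3, "m": 1e6, "bn": 1e9}


def parse_fee(raw: object) -> Optional[float]:
    """Fee in euros; 0.0 for free transfers, None if undisclosed or unrecognized."""
    if not isinstance(raw, str):
        return None
    text = raw.strip().lower()
    if text in ("free", "free transfer"):
        return 0.0
    match = _FEE.fullmatch(text)
    if match is None:
        return None  # "undisclosed", "?", or an unexpected format
    number, unit = match.groups()
    return float(number) * _SCALE[unit or ""]


for fee in ["€12.50m", "€800k", "free transfer", "undisclosed", "12,50 mio. €"]:
    print(fee, "->", parse_fee(fee))
```

The same function can be applied to market_value, for example with `df["fee_eur"] = df["fee"].map(parse_fee)`, and any None values inspected before computing fee-to-value ratios.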
When a quarterback takes a snap and drops back to pass, what happens next may seem like chaos. As offensive players move in various patterns, the defense works together to prevent successful pass completions and then to quickly tackle receivers that do catch the ball. In this year’s Kaggle competition, your goal is to use data science to better understand the schemes and players that make for a successful defense against passing plays.
In American football, there are a plethora of defensive strategies and outcomes. The National Football League (NFL) has used previous Kaggle competitions to focus on offensive plays, but as the old proverb goes, “defense wins championships.” Though metrics for analyzing quarterbacks, running backs, and wide receivers are consistently a part of public discourse, techniques for analyzing the defensive side of the game lag behind. Identifying player, team, or strategic advantages on the defensive side of the ball would be a significant breakthrough for the game.
This competition uses the NFL’s Next Gen Stats data, which includes the position and speed of every player on the field during each play. You’ll employ player tracking data for all drop-back pass plays from the 2018 regular season. The goal of submissions is to identify unique and impactful approaches to measuring defensive performance on these plays. There are several different directions for participants to ‘tackle’ (ha), which may require varying levels of football savvy, data aptitude, and creativity. For example:
- What are the coverage schemes (man, zone, etc.) that the defense employs? Which coverage options tend to perform better?
- Which players are the best at closely tracking receivers as they try to get open?
- Which players are the best at closing on receivers when the ball is in the air?
- Which players are the best at defending pass plays when the ball arrives?
- Is there any way to use player tracking data to predict whether certain penalties (for example, defensive pass interference) will be called?
- Who are the NFL’s best players against the pass?
- How does a defense react to certain types of offensive plays?
- Is there anything about a player (for example, their height, weight, experience, speed, or position) that can be used to predict their performance on defense?

What does data tell us about defending the pass play? You are about to find out.
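One common entry point to the "closely tracking receivers" question is receiver separation: the distance from a receiver to the nearest defender in each tracking frame. The sketch assumes tracking rows with `frameId`, `nflId`, `x`, and `y` (field coordinates in yards) plus a hypothetical `side` column ("offense"/"defense") that you would derive beforehand by comparing each player's team to the team in possession; check the competition's data description for the actual schema.

```python
import numpy as np
import pandas as pd


def nearest_defender_distance(receivers_xy: np.ndarray, defenders_xy: np.ndarray) -> np.ndarray:
    """Distance from each receiver (R, 2) to the closest defender (D, 2)."""
    diffs = receivers_xy[:, None, :] - defenders_xy[None, :, :]  # shape (R, D, 2)
    return np.linalg.norm(diffs, axis=-1).min(axis=1)  # shape (R,)


def separation_by_frame(play: pd.DataFrame, receiver_id: int) -> pd.Series:
    """A receiver's separation from the nearest defender in every frame of one play."""
    out = {}
    for frame_id, frame in play.groupby("frameId"):
        rec = frame.loc[frame["nflId"] == receiver_id, ["x", "y"]].to_numpy()
        dfn = frame.loc[frame["side"] == "defense", ["x", "y"]].to_numpy()
        if len(rec) and len(dfn):
            out[frame_id] = float(nearest_defender_distance(rec, dfn)[0])
    return pd.Series(out, name="separation")


play = pd.DataFrame(
    {
        "frameId": [1, 1, 1, 2, 2, 2],
        "nflId": [1, 10, 11, 1, 10, 11],
        "side": ["offense", "defense", "defense"] * 2,
        "x": [10.0, 13.0, 10.0, 12.0, 12.0, 20.0],
        "y": [10.0, 14.0, 20.0, 10.0, 11.0, 20.0],
    }
)
print(separation_by_frame(play, receiver_id=1))
```

Separation at the frame the pass arrives, or its change while the ball is in the air, could then be aggregated per defender to compare players.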
Note: Are you a university participant? Students have the option to participate in a college-only Competition, where you’ll work on the identical themes above. Students can opt-in for either the Open or College Competitions, but not both.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by web scraping the football site FBREF. It includes more than 27,000 match reports from the Top 5 European leagues and the Top 2 South American leagues, along with domestic and international cup games.
Europe
- England: Premier League, EFL Cup and FA Cup (2015-2023).
- France: Ligue 1, Coupe de France and Trophée des Champions (2018-2023); Coupe de la Ligue (2018-2020).
- Italy: Serie A and Coppa Italia (2015-2023); Supercoppa Italiana (2016-2023).
- Germany: Fußball-Bundesliga, DFB-Pokal and DFL-Supercup (2015-2023).
- Spain: La Liga and Copa del Rey (2015-2023); Supercopa de España (2016-2023).
- International: UEFA Champions League, UEFA Europa League, UEFA Super Cup (2015-2023); UEFA Europa Conference League (2022-2023).
South America
- Argentina: Argentine Primera División (2017-2022); Copa de la Liga Profesional (2021-2022).
- Brazil: Campeonato Brasileiro Série A (2017-2022).
- International: Copa Libertadores (2017-2022); Copa Sudamericana (2016-2022).
For each game, the available data includes Match Info, Team Stats, Managers, Captains, Formations, Lineups, and Player stats (players with at least 1 minute played). Match reports for domestic cup games are available only for rounds that include first-tier teams.
By using this repository, you agree to the Sports Reference LLC Terms of Use.
Shoulder injuries are among the most common types of upper extremity injuries in both contact and noncontact sports. They are a significant source of morbidity for athletes, accounting for almost one third of all sports-related injuries (Enger). As a result, shoulder injuries and their post-healing metrics are an important area of research in orthopaedics.
Shoulder injuries most commonly result from direct trauma or a fall onto the ipsilateral shoulder, making athletes especially prone to these types of injuries (Monica). Some of the most common injuries in this population include anterior and posterior glenohumeral instability, acromioclavicular pathology (including separation, osteolysis, and osteoarthritis), and rotator cuff tears (Gibbs). Acromioclavicular joint injuries are the most common in the athletic population, with an overall incidence rate of 9.2 per 1000 person-years and an average of 18.4 days lost per athlete (Pallis), followed by glenohumeral instability at 2.79 per 1000 person-years (Lanzi).
With American football being a high-contact sport played at high speeds, the potential for shoulder injuries, from minor sprains to career-ending tears, is significant. Nearly half of players at the NFL Combine have reported a history of shoulder injury, with 34% requiring operative intervention (Kaplan). Quarterbacks are particularly affected by shoulder injuries because their position is targeted by the opposing team on every play and because of the mechanics of throwing. Of all QB injuries reported, shoulder injuries are the second most common, at 15.2% (Kelly).
The purpose of this study was to determine (1) the general impact of shoulder injury on performance metrics among NFL quarterbacks and (2) the impact that surgical repair of these injuries had on career outcomes, using measures such as passer rating, yards gained, and completed passes. We hypothesized that quarterbacks in the National Football League who injure their dominant shoulder will (1) have decreased performance metrics after surgery and (2) if they undergo surgery, have better performance metrics than those who do not.
National Football League (NFL) Injured Reserve (IR) lists for the years 1980 to 2019 were pulled from Pro Sports Transactions and entries were queried to find quarterbacks who were placed on the IR with a shoulder injury.
Fifty quarterbacks were found to have long-term shoulder injuries, and a subset of these who had first-time shoulder surgery on their dominant, throwing arm was selected. Manual searches were performed to verify the nature of each injury and determine dates of surgery. Age-matched controls were selected with the following criteria:
- same years of experience
- same number of career seasons +/- 5
- same year of NFL play +/- 10
Quarterbacks (QBs) who suffered a shoulder injury necessitating placement on the Injured Reserve (IR) list were identified. Placement on the IR indicates a long-term injury that renders a player unable to play for the remainder of the season. Pro Sports Transactions IR data from 1980 to 2019 was extracted, and entries were filtered for injuries using the keywords "shoulder", "labrum", "rotator cuff", "dislocation", and "AC joint". An additional manual search of news articles from the NFL, official team websites, and reputable news sources was performed to confirm surgery types and dates and to obtain information about players who were placed on the IR without a description of their injury. Sixty-five relevant injuries were found. Of these, 14 were repeat injuries for the same player and 17 were injuries to the non-throwing arm; all of these were excluded. Remaining entries were also excluded if the shoulder injury was characterized as a "bruise" or a "strain", as these were not considered serious enough to evaluate. Clavicle injuries were excluded as well. Players who did not return to play in more than one regular-season game were excluded from the performance analysis. In total, 19 QBs who received surgery and 11 QBs who suffered a severe shoulder injury but did not receive surgery were included.
QB performance statistics were extracted from Pro Football Reference, which provides game-by-game statistics for players' entire careers. A total of 269 QBs from 1980 to 2020 were found and used as the entire NFL population of QBs. The performance statistics included were passer rating, passing yards, pass attempts, pass completions, pass completion percentage, passing touchdowns, interceptions, sacks, yards lost to sacks, yards per pass attempt, and adjusted yards per pass attempt. Statistics were included only for games in which the player attempted more than 1 pass, and were averaged per game.
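Two of the listed statistics are formulas rather than raw counts, so a sketch of them may help when recomputing or checking values. Below are the standard NFL passer rating (four components, each clamped to the range 0 to 2.375) and Pro Football Reference's adjusted yards per attempt, (Yds + 20·TD − 45·Int) / Att, followed by a pre/post-surgery comparison that applies the "more than 1 attempt" game filter described above. The per-game column names (`date`, `cmp`, `att`, `yds`, `td`, `ints`) and the toy games are hypothetical, and this is not presented as the study's own code.

```python
import pandas as pd


def _clamp(value: float) -> float:
    return max(0.0, min(value, 2.375))


def passer_rating(cmp: float, att: float, yds: float, td: float, ints: float) -> float:
    """NFL passer rating, from 0 to 158.3."""
    a = _clamp((cmp / att - 0.3) * 5)
    b = _clamp((yds / att - 3) * 0.25)
    c = _clamp(td / att * 20)
    d = _clamp(2.375 - ints / att * 25)
    return (a + b + c + d) / 6 * 100


def adjusted_yards_per_attempt(yds: float, td: float, ints: float, att: float) -> float:
    """AY/A as defined by Pro Football Reference."""
    return (yds + 20 * td - 45 * ints) / att


def pre_post_means(games: pd.DataFrame, surgery_date: str, cols: list[str]) -> pd.DataFrame:
    """Per-game averages before and after surgery, keeping games with more than 1 attempt."""
    g = games[games["att"] > 1].copy()
    after = pd.to_datetime(g["date"]) >= pd.Timestamp(surgery_date)
    g["period"] = after.map({False: "pre", True: "post"})
    return g.groupby("period")[cols].mean()


games = pd.DataFrame(
    {
        "date": ["2015-09-13", "2015-09-20", "2016-09-11", "2016-09-18"],
        "cmp": [20, 1, 25, 18],
        "att": [30, 1, 35, 28],
        "yds": [250, 5, 300, 190],
        "td": [2, 0, 3, 1],
        "ints": [1, 0, 0, 2],
    }
)
cols = ["cmp", "att", "yds", "td", "ints"]
games["rating"] = [passer_rating(*row) for row in games[cols].itertuples(index=False)]
print(pre_post_means(games, "2016-01-01", ["rating"]))
```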
Unique age and experience matched controls were selected...