License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
My family has always been serious about fantasy football. I've managed my own team since elementary school. It's a fun reason to talk with each other on a weekly basis for almost half the year.
Ever since 8th grade, I've dreamed of building an AI that could draft players and choose lineups for me. I started off in Excel and have since worked my way up to more sophisticated machine learning. What I've been lacking is really good data, which is why I decided to scrape pro-football-reference.com for all recorded NFL player data.
From my research, this is the most complete public source of NFL player stats available online. I scraped every NFL player in its database going back to the 1940s: over 25,000 players who have played over 1,000,000 football games.
The scraper code can be found here. Feel free to use, alter, or contribute to the repository.
The data was scraped from 12/1/17 to 12/4/17.
When I uploaded this dataset back in 2017, I had two people reach out to me who shared my passion for fantasy football and data science. We quickly decided to band together to create machine-learning-generated fantasy football predictions. Our website is https://gridironai.com. Over the last several years, we've worked to add dozens of data sources to our data stream that's collected weekly. Feel free to use this scraper for basic stats, but if you'd like a more complete dataset that's updated every week, check out our site.
The data is broken into two parts. There is a players table where each player has been assigned an ID and a game stats table that has one entry per game played. These tables can be linked together using the player ID.
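Linking the two tables through the player ID is an inner join. The following is a minimal sketch using only the Python standard library; the column names (player_id, name, year, passing_yards, rushing_yards) and the sample rows are hypothetical, since the actual headers are not listed here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real column names may differ.
PLAYERS_CSV = """player_id,name,position
1,Player A,QB
2,Player B,RB
"""

GAME_STATS_CSV = """player_id,year,passing_yards,rushing_yards
1,2016,300,10
1,2016,250,5
2,2016,0,120
3,2016,0,40
"""


def link_tables(players_file, games_file, key="player_id"):
    """Inner-join each game row to its player row on the shared player ID."""
    players = {row[key]: row for row in csv.DictReader(players_file)}
    linked = []
    for game in csv.DictReader(games_file):
        player = players.get(game[key])
        if player is not None:  # drop game rows whose ID has no player entry
            linked.append({**player, **game})
    return linked


rows = link_tables(io.StringIO(PLAYERS_CSV), io.StringIO(GAME_STATS_CSV))
games_per_player = Counter(row["name"] for row in rows)
print(games_per_player)  # Counter({'Player A': 2, 'Player B': 1})
```

With pandas, the equivalent would be `games.merge(players, on="player_id")`, again assuming that column name.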
License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This folder contains data behind the story The Rams Are Dead To Me, So I Answered 3,352 Questions To Find A New NFL Team.
team-picking-categories.csv contains grades for each NFL franchise in 16 categories, to be used to pick a new favorite team.
abbrev | category |
---|---|
FRL | Fan relations - Courtesy by players, coaches and front offices toward fans, and how well a team uses technology to reach them |
OWN | Ownership - Honesty; loyalty to core players and the community |
PLA | Players - Effort on the field, likability off it |
FUT | Future wins - Projected wins over next 5 seasons |
BWG | Bandwagon Factor - Are the team's next 5 years likely to be better than their previous 5? |
TRD | Tradition - Championships/division titles/wins in team's entire history |
BNG | Bang for the buck - Wins per fan dollars spent |
BEH | Behavior - Suspensions by players on team since 2007, with extra weight to transgressions vs. women |
NYP | Proximity to New York City |
SLP | Proximity to St. Louis |
AFF | Affordability - Price of tickets, parking and concessions |
SMK | Small Market - Size of market in terms of population, where smaller is better |
STX | Stadium experience - Quality of venue; fan-friendliness of environment; frequency of game-day promotions |
CCH | Coaching - Strength of on-field leadership |
UNI | Uniform - Stylishness of uniform design, according to Uni Watch's Paul Lukas |
BMK | Big Market - Size of market in terms of population, where bigger is better |
The grades should be used in conjunction with weights derived from a survey structured like this one: http://www.allourideas.org/nflteampickingsample.
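Combining the category grades with survey weights amounts to a weighted average per team. A minimal sketch, assuming the grades are numeric; the team names, grade values and weights are hypothetical, and only three of the 16 category codes are used.

```python
# Hypothetical numeric grades for two teams on three of the 16 categories.
grades = {
    "Team X": {"FRL": 80, "OWN": 60, "UNI": 90},
    "Team Y": {"FRL": 70, "OWN": 85, "UNI": 50},
}

# Hypothetical weights, standing in for those a survey would produce.
weights = {"FRL": 0.5, "OWN": 0.3, "UNI": 0.2}


def weighted_score(team_grades, weights):
    """Weighted average of one team's category grades."""
    total_weight = sum(weights.values())
    return sum(team_grades[cat] * w for cat, w in weights.items()) / total_weight


scores = {team: weighted_score(g, weights) for team, g in grades.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['Team X', 'Team Y']
```

Dividing by the total weight keeps the score on the grade scale even if the survey weights do not sum to 1.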
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/). License information was derived automatically.
Here are a few use cases for this project:
Sports Performance Analysis: Researchers or coaches can use the "Football detection" model to analyze sports performance. It can identify the players, ball, and cone positions to study player movements, ball possession, and team formation tactics during a football game.
Sports Broadcasting and Media: The model can be used by sports broadcasters to enhance user experience. They can utilize the model to automatically track players and the ball, and provide real-time statistics during live broadcasts.
Augmented Reality Apps: Mobile app developers can use the model to create augmented reality (AR) applications for football fans. For example, an AR app that enables users to analyze football play strategies by identifying players and the ball in real-time.
Security and Surveillance: At football stadiums, the model could be used as a security tool to monitor crowd movements and detect any unusual activities. It can keep track of the location of different teams to ensure that they are in their designated areas.
Fitness and Training: The model could be used by fitness trainers or individuals to assess performance during training sessions. It can help identify if the player's positioning and movements correlate with the ideal tactics.
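The ball-possession analysis mentioned under Sports Performance Analysis could be approximated from the model's detections with a nearest-player heuristic. This is a minimal sketch, not the project's method (none is described): it assumes detections arrive as pixel bounding boxes (x_min, y_min, x_max, y_max), and the function names and the 40-pixel threshold are hypothetical.

```python
import math

# Boxes are (x_min, y_min, x_max, y_max) in pixels; this format is an assumption.


def foot_point(box):
    """Bottom-centre of a player box, roughly where a ball at the player's feet would be."""
    x0, _, x1, y1 = box
    return ((x0 + x1) / 2, y1)


def box_center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def player_in_possession(ball_box, player_boxes, max_distance=40.0):
    """Index of the player nearest the ball, or None if nobody is within max_distance pixels."""
    if not player_boxes:
        return None
    ball = box_center(ball_box)
    distances = [math.dist(ball, foot_point(p)) for p in player_boxes]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return nearest if distances[nearest] <= max_distance else None


players = [(90, 120, 120, 200), (300, 100, 330, 200)]
print(player_in_possession((100, 195, 110, 205), players))  # 0
print(player_in_possession((600, 50, 610, 60), players))  # None
```

Measuring to the bottom of each player box rather than its centre reflects that the ball is usually at a player's feet; a ball in the air is reported as unpossessed.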
License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Context
For a personal project, I was looking for a dataset of all international soccer matches ever played. I found the dataset International football results from 1872 to 2017, but noticed that it contained errors and missing data, so I retrieved data from the Internet to build the ultimate international results dataset (though it is very much inspired by the previous one).
Content
As of March 20, 2025, the dataset includes 50,243 results of international men's soccer matches since 1872. Only national teams are included (teams affiliated with at least a continental confederation).
all_matches.csv includes the following columns:
- date: date of the match (if the exact date is unknown, it defaults to 31/12, or to the last day of the month if the month is known)
- home_team: the name of the home team
- away_team: the name of the away team
- home_score: full-time home team score, including extra time but not penalty shootouts
- away_score: full-time away team score, including extra time but not penalty shootouts
- tournament: the name of the tournament
- country: the name of the country where the match was played
- neutral: TRUE/FALSE column indicating whether the match was played at a neutral venue
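Because home_score and away_score exclude penalty shootouts, an outcome derived from the scores alone treats shootout-decided matches as draws. A minimal sketch using the documented column names; the rows (dates, teams, scores) are hypothetical placeholders, and the date column is left unparsed.

```python
import csv
import io

# Hypothetical rows using the documented all_matches.csv column names.
SAMPLE = """date,home_team,away_team,home_score,away_score,tournament,country,neutral
2000-01-01,Team A,Team B,2,1,Friendly,Country A,FALSE
2000-02-01,Team B,Team C,0,0,Friendly,Country B,FALSE
2000-03-01,Team C,Team A,1,3,Cup,Country D,TRUE
"""


def outcome(row):
    home, away = int(row["home_score"]), int(row["away_score"])
    if home > away:
        return "home_win"
    if home < away:
        return "away_win"
    # Shootout winners are not reflected in the scores, so those matches count as draws.
    return "draw"


matches = list(csv.DictReader(io.StringIO(SAMPLE)))
results = [outcome(m) for m in matches]

# Home-win share among matches NOT played at a neutral venue.
home_games = [m for m in matches if m["neutral"] == "FALSE"]
home_win_share = sum(outcome(m) == "home_win" for m in home_games) / len(home_games)
print(results, home_win_share)  # ['home_win', 'draw', 'away_win'] 0.5
```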
countries_names.csv includes the following columns:
- original_name: country name used in the dataset
- current_name: country name used today (or the name of the country that inherited the history)
National team names are the ones used at the time of the match. You can use the countries_names.csv file to match them to current teams.
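Mapping historical names to current ones is a dictionary lookup that falls back to the original name when no entry exists. A minimal sketch; the names Old Name and New Name are hypothetical placeholders, not entries from countries_names.csv, and it assumes a team absent from that file keeps its name.

```python
import csv
import io
from collections import Counter

# Hypothetical mapping and match rows.
NAMES = """original_name,current_name
Old Name,New Name
"""

MATCHES = """home_team,away_team
Old Name,Team B
New Name,Team C
Team B,Team C
"""

name_map = {r["original_name"]: r["current_name"] for r in csv.DictReader(io.StringIO(NAMES))}


def current(name):
    """Present-day name, or the name unchanged if it has no mapping."""
    return name_map.get(name, name)


appearances = Counter()
for m in csv.DictReader(io.StringIO(MATCHES)):
    appearances[current(m["home_team"])] += 1
    appearances[current(m["away_team"])] += 1
print(appearances["New Name"])  # 2
```

Matches played under the old name are now counted together with those played under the current one.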
Acknowledgements
The data is mainly scraped from the eloratings.net website.
License: MIT License (https://opensource.org/licenses/MIT). License information was derived automatically.
The dataset created using the ESPN Football News Scraper aims to provide comprehensive, up-to-date information on football news. This scraper is designed to extract articles from ESPN, one of the leading sports news providers globally, ensuring that users have access to the latest football-related updates, match reports, player interviews, transfer news, and expert analysis.
The primary source for this dataset is ESPN's football section. ESPN is renowned for its extensive coverage of sports, including in-depth reporting on various football leagues, tournaments, and events. By leveraging the ESPN Football News Scraper, the dataset captures:
The inspiration behind creating this dataset is to cater to the needs of football enthusiasts, analysts, and content creators who require reliable and timely information about football. By aggregating news from a reputable source like ESPN, the dataset offers a rich repository of information that can be used for:
This dataset aims to be a valuable resource for anyone interested in football, offering a consolidated view of the sport's dynamic and ever-changing landscape.
License: MIT License (https://opensource.org/licenses/MIT). License information was derived automatically.
This dataset is designed to facilitate the detection of objects in football match images. The two main classes are:
- Football: a spherical object, typically small and distinct from the players and the field. It may be on the grass or in the air during active play.
- Players: human figures engaged in the game, typically wearing sports uniforms. This category includes referees, who are dressed differently but are part of the on-field activity.
The project has two main research questions:
RQ1: What is the financial impact of Covid-19 on English professional football clubs so far?
RQ2: What is the wider impact on the local community, focusing on four professional football clubs and football community trusts?
The data collected for the project is broken down below across the two research questions highlighted above and is split between quantitative data (research question 1) and qualitative data (research question 2).
Data collection for RQ1
Quantitative data was extracted from the financial statements of football clubs, and the relevant financial data was used to create a bespoke financial database in Microsoft Excel. The data covers all 92 professional football clubs in the EPL and EFL in any given season from 1992/1993 to 2019/2020. At present there are 20 clubs that compete in the EPL and 24 in each of the Championship, League 1, and League 2. Owing to promotion and relegation during the time period analysed, our database covers a total of 112 unique professional football clubs. The financial database contains 28 independent variables in respect of financial and sporting performance that we have defined as Key Performance Indicators (KPIs) for a football club.
Data collection for RQ2
Qualitative data was sourced from four professional football clubs competing in League 1 at the time of writing. Semi-structured interviews were conducted with key individuals at the clubs. A total of 18 interviews were undertaken across the four clubs. Owing to the Covid-19 situation and the various lockdowns and restrictions throughout the project, the majority of interviews (apart from one face-to-face visit) were conducted online using Microsoft Teams. Interviews were recorded and transcribed in Teams and then exported to Quirkos (a specialist qualitative analysis programme) for further thematic analysis. Interview schedules were designed based on the interviewee's job role. For example, interviews with CEOs covered all aspects of the business, including finance and strategy, whereas interviews with Community Managers focused more on the fans of clubs and wider social impact.
Many professional football clubs, particularly in the lower leagues of English football, face financial ruin and are on the brink of collapse. This situation is likely to be exacerbated throughout 2021 owing to the current COVID-19 global pandemic. Consequently, this project aims to analyse the financial impact of COVID-19 on professional football clubs in England and the wider impact on their communities. In the last year, one community has already lost its professional football club (Bury FC) and other communities have been affected by the demise of semi-professional clubs (e.g. Rhyl FC).
The project has three main research questions. First, what is the financial impact of COVID-19 on professional football clubs? Second, what is the wider economic impact on the local community in which a club is located, given the distinct possibility of matches being played behind closed doors for a considerable amount of time? Third, what are the wider effects on the community, in terms of football community trusts and social cohesion?
The project serves to provide a rapid analysis of the impact of COVID-19 on English football's finances. In relation to research question one, the focus will be on the financial situation of professional clubs, including issues such as the distribution of wealth and the financial disparity between clubs in the English football system that could lead to overspending and potential insolvency. It will also consider the impact of broadcast rights distribution, solidarity payments and parachute payments across the system and provide strategic direction for a collective recovery. The intention is to stimulate discussion, analysis, interest, and research on how football governing bodies can use the opportunities presented by the COVID-19 pandemic to reset the financial landscape in the English system. Such discussion could lead to a more balanced, competitive suite of competitions that collectively tackles financial inequality and puts aside self-interest. COVID-19 allows us to revisit existing issues in how football leagues in England are governed. Existing structures have created a significant financial disparity between the professional leagues; a financial disparity that has grown since the formation of the English Premier League (EPL) in 1992/93 and which COVID-19 has laid bare.
Additionally, the pandemic has presented several financial issues that may threaten the sustainability and future of clubs and which are tied to the broader financial performance of clubs. These are the impact on non-playing staff (with regards to redundancies and pay cuts), the impact of having no fans in stadiums (from both a financial and social aspect) and the impact on businesses situated near football stadia that rely on matchday attendances to boost trade. All these issues are...
This dataset, named "state_trends.csv," contains information about different U.S. states. Let's break down the attributes and understand what each column represents:
In summary, this dataset provides a variety of information about U.S. states, including demographic data, geographical region, psychological region, personality traits, and scores related to interests or proficiencies in various fields such as data science, art, and sports.
Context
Fantasy basketball is a simple game. You select a team and fill out a roster. Each player has a price, and you must stay within a budget while building your team. You succeed or fail based on how well your players perform. Fantasy sports websites use their own pricing algorithms and mostly don't disclose how they work. In this case study, you will explore fantasy basketball data and the player pricing algorithm used by a fantasy basketball website.
Acknowledgements: Invent Analytics for providing data
Club football data is a collection of match, player, club, referee and atomic-level event data from the English, French, Spanish, German and Italian leagues for the 2017/2018 season.
The project collates vital club football data that goes beyond the general statistics available on the internet, such as total shots or total cards. It captures granular match and event data, including stats surrounding each event and the personnel (players, referees, ...) involved.
This data project aims to provide valuable insight into a club, person or match at the atomic-event level. Analysis can start from the basic unit: a football match played on a given day and time in a given club league, between two teams of eleven starting players, with a referee in the middle.
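Event-level data makes it possible to aggregate any event type per player and per match. A minimal sketch; the field names (matchId, playerId, eventName) are assumptions modeled on Wyscout-style event logs, and the records are hypothetical.

```python
from collections import Counter

# Hypothetical event records; the field names are assumptions.
events = [
    {"matchId": 1, "playerId": 10, "eventName": "Pass"},
    {"matchId": 1, "playerId": 11, "eventName": "Pass"},
    {"matchId": 1, "playerId": 10, "eventName": "Shot"},
    {"matchId": 1, "playerId": 10, "eventName": "Pass"},
    {"matchId": 2, "playerId": 10, "eventName": "Pass"},
]


def count_events(events, event_name):
    """Count one event type per (match, player) pair."""
    return Counter((e["matchId"], e["playerId"]) for e in events if e["eventName"] == event_name)


passes = count_events(events, "Pass")
print(passes[(1, 10)], passes[(1, 11)], passes[(2, 10)])  # 2 1 1
```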
Thanks to Luca Pappalardo's soccer match data on Figshare for the data source.
An analytics dashboard can be created using the data on the following:
The list goes on; feel free to generate your own insights.
License: CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
The FIFA World Cup is the world's most prestigious and widely-watched football tournament, bringing together the best national teams from across the globe to compete for the title of world champion. The first FIFA World Cup was held in 1930, and since then, the tournament has been held every four years, with the exception of 1942 and 1946 due to World War II.
Over the years, the FIFA World Cup has become a massive cultural phenomenon, with billions of people tuning in to watch the matches and follow the progress of their favorite teams. As a result, a tremendous amount of data has been generated about the tournament, including information about every match that has ever been played.
The FIFA World Cup 1930-2022 All Match Dataset is a comprehensive collection of data about every match that has been played in the tournament, from the inaugural tournament in 1930 to the most recent edition in 2022.
The FIFA World Cup 1930-2022 All Match Dataset contains a CSV file with the following columns:
- Year: the year in which the World Cup tournament was held.
- Datetime: the date and time when the match was played.
- MatchID: the ID of the match.
- Stage: the stage of the tournament in which the match was played (e.g., group stage, quarter-finals, semi-finals, final).
- Stadium: the name of the stadium where the match was played.
- City: the city where the stadium is located.
- Home Team Name: the name of the home team.
- Home Team Goals: the number of goals scored by the home team.
- Away Team Goals: the number of goals scored by the away team.
- Away Team Name: the name of the away team.
- Win conditions: any special conditions under which the match was won (e.g., extra time, penalty shootout).
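As an example of analysing trends over time, goals per match can be computed for each tournament year. A minimal sketch using a subset of the documented column names; the rows are hypothetical placeholders, not real results.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using a subset of the documented column names.
SAMPLE = """Year,Stage,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions
1930,Group 1,Team A,4,1,Team B,
1930,Group 1,Team C,1,0,Team D,
1934,Round of 16,Team A,3,2,Team C,
"""

goals = defaultdict(int)
matches = defaultdict(int)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    year = int(row["Year"])
    goals[year] += int(row["Home Team Goals"]) + int(row["Away Team Goals"])
    matches[year] += 1

goals_per_match = {year: goals[year] / matches[year] for year in sorted(goals)}
print(goals_per_match)  # {1930: 3.0, 1934: 5.0}
```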
In addition to the basic match data, the dataset also includes more detailed information about each team, the number of times they have participated in the tournament, and their overall record in the tournament. This information can be used to analyze the performance of different teams over time, identify trends and patterns in the tournament, and make predictions about future matches.
Overall, the FIFA World Cup 1930-2022 All Match Dataset is an incredibly valuable resource for anyone interested in football, sports analytics, or data science. With so much information available, it provides a wealth of opportunities for research, analysis, and exploration and is sure to be a valuable asset for years to come.
Recognizing individuals from only a few images has always been a challenge, because in many situations not enough images are available. I gathered a few images of 5 of the best football players from different websites to train a Siamese model that learns a distance score for each of them. Each player has fewer than 15 training images and exactly 3 validation images.
Note: all images are in JPG format and are of low quality.
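The description does not say how the Siamese model is trained. One common objective for learning a distance score is contrastive loss. This sketch assumes the network already maps each image to an embedding vector; the margin and the verification threshold are hypothetical values.

```python
import math


def contrastive_loss(emb_a, emb_b, same_player, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    Same-player pairs are penalised by their squared distance; pairs of different
    players are penalised only when they are closer than `margin`.
    """
    d = math.dist(emb_a, emb_b)
    if same_player:
        return d ** 2
    return max(0.0, margin - d) ** 2


def is_same_player(emb_a, emb_b, threshold=0.5):
    """Predict 'same player' when the embedding distance is below the threshold."""
    return math.dist(emb_a, emb_b) < threshold


a, b, c = (0.0, 0.0), (0.12, 0.16), (3.0, 4.0)
print(round(contrastive_loss(a, b, same_player=True), 4))  # 0.04
print(round(contrastive_loss(a, b, same_player=False), 4))  # 0.64
print(is_same_player(a, b), is_same_player(a, c))  # True False
```

With so few images per player, validation pairs can be scored with `is_same_player` against each player's training images.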
This dataset contains information about all matches played in the first division of Argentinian football from 2015 to 2022. There are two files with identical information: one in English (_eng) and one in Spanish (_spa).
I had the idea of analysing the particularities of Argentinian football, but I couldn't find any dataset that included the information I wanted in an easy, summarised form. So I decided to generate this information myself.
afa_2015_2022_eng.csv contains 2,821 first-division matches of Argentinian football, combining data from promiedos.com.ar, transfermarkt.de and oddsportal.com, in the following columns:
* tournament: name of the tournament in progress when the match was played. promiedos.
* week: matchday (round) on which the match was played. promiedos.
* match: number of the match within the matchday. promiedos.
* team_home(away): name of the home(away) team. promiedos.
* goals_home(away): number of goals scored by the home(away) team. promiedos.
* possesion_home(away): possession percentage of the home(away) team. promiedos.
* on_target_home(away): shots on goal by the home(away) team. promiedos.
* attemps_home(away): total attempts by the home(away) team. promiedos.
* fouls_home(away): fouls committed by the home(away) team. promiedos.
* corner_kicks_home(away): corner kicks taken by the home(away) team. promiedos.
* yellow_cards_home(away): yellow cards of the home(away) team. promiedos.
* red_cards_home(away): red cards of the home(away) team. promiedos.
* team_value_home(away): market value of the home(away) team. transfermarkt.
* mean_height_home(away): average height of the home(away) team. transfermarkt.
* mean_age_home(away): average age of the home(away) team. transfermarkt.
* lefties_proportion_home(away): proportion of left-footed players in the home(away) team. transfermarkt.
* result: result of the match.
* game_datetime: date of the match. oddsportal.
* odds_home(away): betting return for a win by the home(away) team. oddsportal.
* odds_draw: betting return for a draw. oddsportal.
The names of the teams and tournaments are the ones that are used on promiedos.com.ar.
If you notice a mistake you can fix that by submitting a pull request. https://github.com/francescocamussoni/argentinian_football_results_2015_2022
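Assuming the odds_home, odds_draw and odds_away columns hold decimal odds (the description only calls them bet returns), they can be turned into implied probabilities by inverting them and normalising away the bookmaker's margin. A minimal sketch with hypothetical odds:

```python
def implied_probabilities(odds_home, odds_draw, odds_away):
    """Convert decimal odds into outcome probabilities with the bookmaker margin removed.

    Returns (probabilities, overround); the probabilities are normalised to sum to 1.
    """
    raw = [1.0 / odds_home, 1.0 / odds_draw, 1.0 / odds_away]
    overround = sum(raw)  # > 1 when the bookmaker builds in a margin
    return [p / overround for p in raw], overround


probs, overround = implied_probabilities(2.0, 3.2, 4.0)
print([round(p, 3) for p in probs], round(overround, 4))  # [0.471, 0.294, 0.235] 1.0625
```

Proportional normalisation is the simplest way to remove the margin; other methods spread it unevenly across outcomes.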
The Spanish file, afa_2015_2022_spa.csv, contains the same 2,821 first-division matches from 2015 to 2022, using the following Spanish column names:
* torneo: name of the tournament in progress when the match was played. promiedos.
* fecha: matchday on which the match was played. promiedos.
* partido: number of the match within the matchday. promiedos.
* equipo_local(visitante): name of the home(away) team. promiedos.
* goles_local(visitante): number of goals scored by the home(away) team. promiedos.
* goles_visitante(visitante): possession percentage of the home(away) team. promiedos.
* tiros_arco_local(visitante): shots on goal by the home(away) team. promiedos.
* intentos_local(visitante): total attempts by the home(away) team. promiedos.
* faltas_local(visitante): fouls committed by the home(away) team. promiedos.
* tiros_esquina_local(visitante): corner kicks taken by the home(away) team. promiedos.
* amarillas_local(visitante): yellow cards of the home(away) team. promiedos.
* rojas_local(visitante): red cards of the home(away) team. promiedos.
* valor_mercado_local(visitante): market value of the home(away) team. transfermarkt.
* altura_media_local(visitante): average height of the home(away) team. transfermarkt.
* edad_media_local(visitante): average age of the home(away) team. transfermarkt.
* proporcion_zurdos_local(visitante): proportion of left-footed players in the home(away) team. transfermarkt.
* resultado: result of the match...