This soccer database comes from Kaggle and is well suited for data analysis and machine learning.
It contains data for soccer matches, players, and teams from several European countries from 2008 to 2016. This dataset is quite extensive; here I investigate the Soccer dataset. The dataset has 7 tables called 'Country', 'League', 'Match', 'Player', 'Player Attributes', 'Team' and 'Team Attributes'. It covers the 2008 to 2016 seasons of the leading leagues in 11 European countries and includes player and team attributes sourced from EA Sports' FIFA video game series, as well as detailed match events (goal types, possession, corners, crosses, fouls, cards, etc.) for more than 10,000 matches. (Update, 16 Oct 2016: a new table containing team attributes from FIFA was added.) The tables are connected to one another by identification numbers. The Player table describes players' names, weight, and height, while Player Attributes describes their abilities and rates their potential.
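To illustrate how the tables link through identification numbers, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the Kaggle release is a SQLite file, that the tables are named with underscores (Player, Player_Attributes), and that both share a player_api_id column; the cut-down schema and the toy rows below are hypothetical stand-ins, not real data.

```python
import sqlite3

# In-memory stand-in; with the real file you would connect to it instead
# (file name such as "database.sqlite" is an assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, cut-down versions of two of the seven tables.
cur.execute("CREATE TABLE Player (player_api_id INTEGER PRIMARY KEY, "
            "player_name TEXT, height REAL, weight INTEGER)")
cur.execute("CREATE TABLE Player_Attributes (player_api_id INTEGER, "
            "date TEXT, overall_rating INTEGER, potential INTEGER)")

# Toy rows for illustration only.
cur.executemany("INSERT INTO Player VALUES (?, ?, ?, ?)", [
    (1, "Player A", 180.34, 170),
    (2, "Player B", 175.26, 154),
])
cur.executemany("INSERT INTO Player_Attributes VALUES (?, ?, ?, ?)", [
    (1, "2015-09-21", 67, 71),
    (1, "2016-02-18", 69, 72),
    (2, "2016-04-21", 75, 80),
])

# Join on the shared ID and keep each player's most recent attribute record
# (ISO date strings compare correctly as text).
query = """
SELECT p.player_name, p.height, p.weight, a.overall_rating, a.potential
FROM Player AS p
JOIN Player_Attributes AS a ON a.player_api_id = p.player_api_id
WHERE a.date = (SELECT MAX(a2.date) FROM Player_Attributes AS a2
                WHERE a2.player_api_id = p.player_api_id)
ORDER BY p.player_name
"""
rows = cur.execute(query).fetchall()
for row in rows:
    print(row)
```

Attribute tables hold several dated snapshots per player, so choosing which snapshot to join (latest, or the one closest to a match date) is a design decision rather than something the schema fixes.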
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The European Soccer Database provides annual team characteristics data for approximately 300 football teams in 11 European countries from 2008 to 2016. It records each team's unique ID and the date of each record, and numerically summarizes tactical attributes such as team play style, attack and defence strategy, pressure intensity, and passing style.
2) Data Utilization
(1) European Soccer Database has the following characteristics:
• Each row consists of around 25 numerical and categorical variables, including team_api_id, date, buildUpPlaySpeed, buildUpPlayPassing, chanceCreationPassing, chanceCreationShooting, and defencePressure, allowing quantitative comparison of tactical changes and tendencies by season and team.
(2) European Soccer Database can be used for:
• Team Tactical Change and Performance Analysis: By analyzing year-by-year changes in key tactical indicators such as team build-up, chance creation, and defensive strategy, you can understand the tactical evolution of a particular team or changes in trends within a league.
• Correlation between match results and team characteristics: By linking team tactical characteristics to actual match results (wins, win percentage, etc.), you can statistically explore which team attributes are associated with high performance.
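The second use case can be sketched in a few lines: compute each team's win percentage from match results, then correlate it with one tactical attribute. This is a minimal illustration with hypothetical toy results and a hypothetical attribute value per team; with the real database you would join the Match table (columns such as home_team_api_id, away_team_api_id, home_team_goal, away_team_goal are assumed here) to Team_Attributes on team_api_id.

```python
import math

# Hypothetical toy results: (home_team_api_id, away_team_api_id, home_goals, away_goals).
matches = [
    (1, 2, 2, 0), (2, 3, 1, 1), (3, 1, 0, 3),
    (1, 3, 1, 2), (2, 1, 0, 1), (3, 2, 2, 2),
]
# Hypothetical tactical score per team (e.g. a buildUpPlaySpeed-style value).
build_up_speed = {1: 70, 2: 45, 3: 55}

def win_percentage(team, matches):
    """Share of a team's matches that it won (draws count as non-wins)."""
    played = wins = 0
    for home, away, hg, ag in matches:
        if team in (home, away):
            played += 1
            if (team == home and hg > ag) or (team == away and ag > hg):
                wins += 1
    return wins / played if played else 0.0

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

teams = sorted(build_up_speed)
win_pct = [win_percentage(t, matches) for t in teams]
r = pearson([build_up_speed[t] for t in teams], win_pct)
print(win_pct, round(r, 3))
```

Because team attributes are dated snapshots, a real analysis would align each attribute record with the season it describes before correlating; three teams are far too few for the coefficient to mean anything beyond illustrating the mechanics.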
https://creativecommons.org/publicdomain/zero/1.0/
Notes for Football Data from football-data.co.uk.
All data is in csv format, ready for use within standard spreadsheet applications. Please note that some abbreviations are no longer in use (in particular odds from specific bookmakers no longer used) and refer to data collected in earlier seasons. For a current list of what bookmakers are included in the dataset please visit http://www.football-data.co.uk/matches.php
Key to results data:
Div = League Division
Date = Match Date (dd/mm/yy)
Time = Time of match kick off
HomeTeam = Home Team
AwayTeam = Away Team
FTHG and HG = Full Time Home Team Goals
FTAG and AG = Full Time Away Team Goals
FTR and Res = Full Time Result (H=Home Win, D=Draw, A=Away Win)
HTHG = Half Time Home Team Goals
HTAG = Half Time Away Team Goals
HTR = Half Time Result (H=Home Win, D=Draw, A=Away Win)
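A minimal sketch of reading results in this layout with Python's csv module, parsing the dd/mm/yy date and checking that FTR and HTR agree with the goal columns. The two rows are hypothetical; real files contain many more columns, and newer files may use HG/AG/Res instead of FTHG/FTAG/FTR, so check the header of the file you download.

```python
import csv
import io
from datetime import datetime

# Two hypothetical rows using a subset of the columns in the key above.
sample = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR,HTHG,HTAG,HTR
E0,13/08/16,Team A,Team B,2,1,H,0,1,A
E0,14/08/16,Team C,Team D,0,0,D,0,0,D
"""

def result_code(home_goals, away_goals):
    """Map a scoreline to the H/D/A code used by FTR and HTR."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    # The key documents dd/mm/yy; adjust the format if a file uses a 4-digit year.
    row["Date"] = datetime.strptime(row["Date"], "%d/%m/%y").date()
    fthg, ftag = int(row["FTHG"]), int(row["FTAG"])
    hthg, htag = int(row["HTHG"]), int(row["HTAG"])
    # Sanity checks: recorded results should match the goal columns.
    assert row["FTR"] == result_code(fthg, ftag)
    assert row["HTR"] == result_code(hthg, htag)
    rows.append(row)

print(rows[0]["Date"], rows[0]["FTR"])
```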
Match Statistics (where available):
Attendance = Crowd Attendance
Referee = Match Referee
HS = Home Team Shots
AS = Away Team Shots
HST = Home Team Shots on Target
AST = Away Team Shots on Target
HHW = Home Team Hit Woodwork
AHW = Away Team Hit Woodwork
HC = Home Team Corners
AC = Away Team Corners
HF = Home Team Fouls Committed
AF = Away Team Fouls Committed
HFKC = Home Team Free Kicks Conceded
AFKC = Away Team Free Kicks Conceded
HO = Home Team Offsides
AO = Away Team Offsides
HY = Home Team Yellow Cards
AY = Away Team Yellow Cards
HR = Home Team Red Cards
AR = Away Team Red Cards
HBP = Home Team Bookings Points (10 = yellow, 25 = red)
ABP = Away Team Bookings Points (10 = yellow, 25 = red)
Note that Free Kicks Conceded includes fouls, offsides and any other offense committed, and will always be equal to or higher than the number of fouls. Fouls make up the vast majority of Free Kicks Conceded. Free Kicks Conceded are shown when specific data on Fouls are not available (France 2nd, Belgium 1st and Greece 1st divisions).
Note also that English and Scottish yellow cards do not include the initial yellow card when a second is shown to a player converting it into a red, but this is included as a yellow (plus red) for European games.
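The bookings-points key and the note on second yellows interact. Below is a minimal sketch assuming the straightforward reading of the key (10 points per recorded yellow, 25 per recorded red, computed from HY/AY and HR/AR); the published HBP/ABP values may follow their own convention, so treat this as an illustration of the recording difference rather than a reproduction of those columns.

```python
def booking_points(yellows, reds):
    """Bookings points under the assumed reading: 10 per yellow, 25 per red."""
    return 10 * yellows + 25 * reds

# One player booked twice and sent off, with no other cards in the match.
# English/Scottish recording: the first yellow is not counted, only the red.
uk_style = booking_points(yellows=0, reds=1)
# European recording: the first yellow is counted as a yellow plus the red.
eu_style = booking_points(yellows=1, reds=1)
print(uk_style, eu_style)
```

The same incident therefore yields different card counts (and, under this reading, different points) depending on the competition, which matters when comparing discipline across leagues.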
Key to 1X2 (match) betting odds data:
B365H = Bet365 home win odds
B365D = Bet365 draw odds
B365A = Bet365 away win odds
BSH = Blue Square home win odds
BSD = Blue Square draw odds
BSA = Blue Square away win odds
BWH = Bet&Win home win odds
BWD = Bet&Win draw odds
BWA = Bet&Win away win odds
GBH = Gamebookers home win odds
GBD = Gamebookers draw odds
GBA = Gamebookers away win odds
IWH = Interwetten home win odds
IWD = Interwetten draw odds
IWA = Interwetten away win odds
LBH = Ladbrokes home win odds
LBD = Ladbrokes draw odds
LBA = Ladbrokes away win odds
PSH and PH = Pinnacle home win odds
PSD and PD = Pinnacle draw odds
PSA and PA = Pinnacle away win odds
SOH = Sporting Odds home win odds
SOD = Sporting Odds draw odds
SOA = Sporting Odds away win odds
SBH = Sportingbet home win odds
SBD = Sportingbet draw odds
SBA = Sportingbet away win odds
SJH = Stan James home win odds
SJD = Stan James draw odds
SJA = Stan James away win odds
SYH = Stanleybet home win odds
SYD = Stanleybet draw odds
SYA = Stanleybet away win odds
VCH = VC Bet home win odds
VCD = VC Bet draw odds
VCA = VC Bet away win odds
WHH = William Hill home win odds
WHD = William Hill draw odds
WHA = William Hill away win odds
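A common first step with 1X2 odds is converting them into implied probabilities. This sketch assumes the odds are in decimal format and uses the simplest margin-removal method (dividing each 1/odds by their sum); other methods exist, and the three odds values are hypothetical.

```python
def implied_probabilities(home, draw, away):
    """Convert decimal 1X2 odds into implied probabilities and the margin.

    Raw implied probabilities (1/odds) sum to more than 1; the excess is the
    bookmaker's margin (overround). Dividing by the sum removes it
    proportionally.
    """
    raw = [1 / home, 1 / draw, 1 / away]
    total = sum(raw)
    return [p / total for p in raw], total - 1

# Hypothetical B365H, B365D, B365A values for one match.
probs, margin = implied_probabilities(2.10, 3.40, 3.60)
print([round(p, 3) for p in probs], round(margin, 3))
```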
Bb1X2 = Number of BetBrain bookmakers used to calculate match odds averages and maximums
BbMxH = Betbrain maximum home win odds
BbAvH = Betbrain average home win odds
BbMxD = Betbrain maximum draw odds
BbAvD = Betbrain average draw odds
BbMxA = Betbrain maximum away win odds
BbAvA = Betbrain average away win odds
MaxH = Market maximum home win odds
MaxD = Market maximum draw odds
MaxA = Market maximum away win odds
AvgH = Market average home win odds
AvgD = Market average draw odds
AvgA = Market average away win odds
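The count, maximum and average columns summarise odds across bookmakers. The sketch below computes the analogous figures over only the bookmaker columns present in one row, skipping blanks; because BetBrain and market figures draw on a wider set of bookmakers, these values will not generally equal Bb1X2/BbMxH/BbAvH or MaxH/AvgH. The row values are hypothetical.

```python
from statistics import mean

# Hypothetical row: home-win odds from several bookmakers; a blank string
# means that bookmaker did not price this match.
row = {"B365H": "2.10", "BWH": "2.05", "IWH": "2.00", "LBH": "",
       "WHH": "2.15", "VCH": "2.20"}
HOME_COLUMNS = ["B365H", "BWH", "IWH", "LBH", "WHH", "VCH"]

def summarise_odds(row, columns):
    """Return (count, maximum, average) over the non-blank odds in `columns`.

    Raises ValueError if no bookmaker priced the match.
    """
    odds = [float(row[c]) for c in columns if row.get(c)]
    if not odds:
        raise ValueError("no odds available")
    return len(odds), max(odds), mean(odds)

count, best, avg = summarise_odds(row, HOME_COLUMNS)
print(count, best, round(avg, 3))
```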
Key to total goals betting odds:
BbOU = Number of BetBrain bookmakers used to calculate over/under 2.5 goals (total goals) averages and maximums
BbMx>2.5 = Betbrain maximum over 2.5 goals
BbAv>2.5 = Betbrain average over 2.5 goals
BbMx<2.5 = Betbrain maximum under 2.5 goals
BbAv<2.5 = Betbrain average under 2.5 goals
GB>2.5 = Gamebookers over 2.5 goals
GB<2.5 = Gamebookers under 2.5 goals
B365>2.5 = Bet365 over 2.5 goals
B365<2.5 = Bet365 under 2.5 goals
P>2.5 = Pinnacle over 2.5 goals
P<2.5 = Pinnacle under 2.5 goals
Max>2.5 = Market maximum over 2.5 goals
Max<2.5 = Market maximum under 2.5 goals
Avg>2.5 = Market average over 2.5 goals
Avg<2.5 = Market average under 2.5 goals
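Total-goals markets settle on FTHG + FTAG against the 2.5 line. A minimal sketch, assuming decimal odds and proportional margin removal, that settles the bet and derives a margin-free probability of "over" from a pair of hypothetical over/under odds (e.g. B365>2.5 and B365<2.5):

```python
def total_goals_outcome(home_goals, away_goals, line=2.5):
    """Settle a total-goals bet on a half-goal line: 'over' or 'under'."""
    return "over" if home_goals + away_goals > line else "under"

def fair_over_probability(over_odds, under_odds):
    """Implied P(over) after proportionally removing the bookmaker margin."""
    p_over, p_under = 1 / over_odds, 1 / under_odds
    return p_over / (p_over + p_under)

print(total_goals_outcome(2, 1), total_goals_outcome(1, 1))
print(round(fair_over_probability(1.90, 1.95), 3))
```

Because 2.5 is a half-goal line, no match can land exactly on it, so there is no push case to handle.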
Key to Asian handicap betting odds:
BbAH = Number of BetBrain bookmakers used to calculate Asian handicap averages and maximums
BbAHh = Betbrain size of handicap (home team)
AHh = Market size of handicap (home team) (since 2019/2020)
BbMxAHH = Betbrain maximum Asian han...
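To use the handicap-size columns (BbAHh, AHh) with results, you need a settlement rule. The sketch below follows the common Asian handicap convention, which is not described in the key itself: the line is added to the home score, whole lines can push, and quarter lines (x.25, x.75) are split into two half-stakes on the neighbouring lines. Odds are assumed to be decimal.

```python
def settle_asian_handicap(home_goals, away_goals, handicap, odds):
    """Profit per unit stake on the home team at an Asian handicap.

    `handicap` is applied to the home score (e.g. -0.5, -1, -0.25);
    `odds` are decimal. Returns odds - 1 for a win, -1 for a loss, 0 for a
    push, and a half-and-half mix for quarter lines.
    """
    # Quarter line: split the stake across the two adjacent lines.
    if (handicap * 4) % 2 == 1:
        lower, upper = handicap - 0.25, handicap + 0.25
        return (0.5 * settle_asian_handicap(home_goals, away_goals, lower, odds)
                + 0.5 * settle_asian_handicap(home_goals, away_goals, upper, odds))
    margin = home_goals + handicap - away_goals
    if margin > 0:
        return odds - 1   # win
    if margin < 0:
        return -1.0       # loss
    return 0.0            # push: stake returned (whole lines only)

print(settle_asian_handicap(1, 1, -0.25, 1.90))  # half loss
print(settle_asian_handicap(1, 0, -0.75, 2.00))  # half win
```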
Most publicly available football (soccer) statistics are limited to aggregated data such as Goals, Shots, Fouls, Cards. When assessing performance or building predictive models, this simple aggregation, without any context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from this simple count of shots will rate the two teams identically.
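The shots example can be made concrete by weighting each shot by a scoring probability for its location, the idea behind expected-goals style metrics. The per-location probabilities below are hypothetical placeholders, not estimates from this dataset.

```python
# Hypothetical scoring probability per shot, by shot location.
SCORING_PROB = {"inside_box": 0.30, "outside_box": 0.05}

team_a = ["outside_box"] * 10   # 10 long-range shots
team_b = ["inside_box"] * 10    # 10 shots from inside the box

def shot_count(shots):
    """Context-free metric: every shot counts the same."""
    return len(shots)

def expected_goals(shots):
    """Location-weighted metric: sum of per-shot scoring probabilities."""
    return sum(SCORING_PROB[s] for s in shots)

print(shot_count(team_a), shot_count(team_b))
print(round(expected_goals(team_a), 2), round(expected_goals(team_b), 2))
```

The raw counts are identical, while the weighted totals differ sharply, which is exactly the distinction the aggregated statistics lose.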
A football game generates many more events, and it is very important and interesting to take into account the context in which those events were generated. This dataset should keep sports analytics enthusiasts awake for long hours, as the number of questions that can be asked is huge.
This dataset is the result of a very tiresome effort of web scraping and integrating different data sources. The central element is the text commentary. All the events were derived by reverse engineering the text commentary using regex. Using this, I was able to derive 11 types of events, as well as the main player and secondary player involved in those events and many other statistics. In case I've missed extracting some useful information, you are welcome to do so and share your findings. The dataset provides a granular view of 9,074 games, totaling 941,009 events from the 5 biggest European football (soccer) leagues: England, Spain, Germany, Italy and France, from the 2011/2012 season to the 2016/2017 season as of 25.01.2017. There are games played during these seasons for which I could not collect detailed data. Overall, over 90% of the games played during these seasons have event data.
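The regex approach can be sketched as a list of patterns tried in order against each commentary line. The commentary formats and patterns below are hypothetical illustrations, not the author's actual regexes, and they cover only four of the 11 event types.

```python
import re

# Hypothetical commentary lines in a typical live-text style.
COMMENTARY = [
    "Goal! Team A 1, Team B 0. Player One (Team A) right footed shot from the centre of the box. Assisted by Player Two.",
    "Foul by Player Three (Team B).",
    "Corner, Team A. Conceded by Player Four.",
    "Attempt missed. Player Five (Team B) header from very close range misses to the left.",
]

# (event type, pattern) pairs, tried in order; named groups capture players/teams.
PATTERNS = [
    ("goal", re.compile(r"^Goal!.*?\.\s+(?P<player>[^(]+?)\s+\((?P<team>[^)]+)\)")),
    ("foul", re.compile(r"^Foul by (?P<player>[^(]+?)\s+\((?P<team>[^)]+)\)")),
    ("corner", re.compile(r"^Corner,\s+(?P<team>[^.]+)\.\s+Conceded by (?P<player>[^.]+)\.")),
    ("attempt", re.compile(r"^Attempt (?:missed|saved|blocked)\.\s+(?P<player>[^(]+?)\s+\((?P<team>[^)]+)\)")),
]
# Secondary player, when the line names an assist.
ASSIST = re.compile(r"Assisted by (?P<player2>[^.]+)\.")

def parse_event(text):
    """Return a dict with event type, player, team and secondary player, or None."""
    for event_type, pattern in PATTERNS:
        m = pattern.search(text)
        if m:
            event = {"type": event_type, **m.groupdict()}
            assist = ASSIST.search(text)
            event["player2"] = assist.group("player2") if assist else None
            return event
    return None

events = [parse_event(line) for line in COMMENTARY]
for e in events:
    print(e)
```

Lines that match no pattern return None, which makes it easy to collect and inspect unparsed commentary when adding new event types.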
The dataset is organized in 3 files:
I have used this data to:
There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:
And many many more...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
European soccer database
⚽ Explore an extensive dataset featuring detailed player statistics exclusively from the top 7 European football leagues:
EPL (English Premier League)
Bundesliga 🇩🇪
La Liga 🇪🇸
Serie A 🇮🇹
Ligue 1 🇫🇷
Eredivisie 🇳🇱
Primeira Liga 🇵🇹
This dataset provides comprehensive insights into player performances, including attributes like goals, assists, minutes played, and other key metrics. Uncover in-depth player analyses and comparisons across leagues to fuel your football data-driven strategies and player evaluations! 📈🥅⚽
https://eu-football.info
Complete list of all-time European national teams international football matches, euro football results
We analyze the effects of top tax rates on international migration of football players in 14 European countries since 1985. Both country case studies and multinomial regressions show evidence of strong mobility responses to tax rates, with an elasticity of the number of foreign (domestic) players to the net-of-tax rate around one (around 0.15). We also find evidence of sorting effects (low taxes attract high-ability players who displace low-ability players) and displacement effects (low taxes on foreigners displace domestic players). Those results can be rationalized in a simple model of migration and taxation with rigid labor demand.
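To make the elasticity figures concrete, here is a worked example assuming a constant elasticity, where N is the number of players of a group in a country and τ is the top tax rate. The tax cut from 50% to 40% is hypothetical; the resulting percentages are illustrative arithmetic, not results from the study.

```latex
\varepsilon = \frac{\mathrm{d}\ln N}{\mathrm{d}\ln(1-\tau)}
\quad\Rightarrow\quad
\frac{N_1}{N_0} \approx \left(\frac{1-\tau_1}{1-\tau_0}\right)^{\varepsilon}

% Hypothetical cut in the top rate from \tau_0 = 0.50 to \tau_1 = 0.40:
\frac{1-\tau_1}{1-\tau_0} = \frac{0.60}{0.50} = 1.2

% Foreign players, \varepsilon \approx 1:     1.2^{1} = 1.20         (about +20\%)
% Domestic players, \varepsilon \approx 0.15: 1.2^{0.15} \approx 1.028 (about +2.8\%)
```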
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The presented data are used to determine how changes in teams’ efficiency affect the level of competitive balance in the top European football leagues. The data on team valuations were collected from Transfermarkt, while the numbers of goals and points were collected from the websites of the national leagues.
Although it is commonly believed that, in recent World Cup editions, more footballers are representing a country other than their native one, a historical overview of migrant footballers representing national teams is lacking. To fill this lacuna, we created a database of 9,400 football players who participated in the FIFA World Cup (1930-2014). In order to count the number of migrant footballers in national teams over time, we critically reflect on the term migrant and the commonly used foreign-born proxy in mainstream migration research. We argue that such a foreign-born approach overlooks historical-geopolitical changes like the redrawing of international boundaries and colonial relationships, and tends to shy away from citizenship complexities, leading to an overestimation of the number of migrant footballers in the database. Therefore, we offer an alternative approach which, through historical contextualization with an emphasis on citizenship, results in more accurate and reliable data on migrant football players. We coin this the contextual-nationality approach.
Although the reliability of the information on Wikipedia-pages can be questioned, we used this source because the data we needed was pretty straightforward and not readily accessible at other, perhaps more trustworthy, online football databases like Transfermarkt.co.uk or Footballdatabase.eu. In case a footballer was foreign-born or (possibly) a migrant, we verified the Wikipedia-data with information from (inter)national newspapers and football magazines. Reliable data on the genealogy of players was often harder to find, as the majority of (grand-) parents are, or were, not internationally famous themselves.
The depositor provided the data file in XLSX format. DANS added the ODS format of this file.
On April 16th 2018, a small correction was made in the rows related to football player Tony Cascarino.
https://choosealicense.com/licenses/odbl/
Source: https://www.kaggle.com/datasets/hugomathien/soccer by Hugo Mathien
About Dataset
The ultimate Soccer database for data analysis and machine learning
What you get:
+25,000 matches
+10,000 players
11 European Countries with their lead championship
Seasons 2008 to 2016
Players and Teams' attributes* sourced from EA Sports' FIFA video game series, including the weekly updates
Team line-up with squad formation (X, Y coordinates)
Betting odds from up to 10 providers… See the full description on the dataset page: https://huggingface.co/datasets/julien-c/kaggle-hugomathien-soccer.
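The line-up coordinates can be turned into a formation label. The sketch below is one heuristic, stated under assumptions: each team has 11 Y values (in the Kaggle Match table these are assumed to be columns such as home_player_Y1 to home_player_Y11), the goalkeeper has the lowest Y, higher Y means further forward, and players sharing a Y value form one line. Its output will not always match a team's announced formation.

```python
from collections import Counter

def formation_from_y(y_coords):
    """Infer a formation string such as '4-4-2' from 11 line-up Y coordinates."""
    ys = sorted(y_coords)
    outfield = ys[1:]  # drop the goalkeeper (assumed lowest Y)
    lines = Counter(outfield)
    # One number per distinct Y value, from defence (low Y) to attack (high Y).
    return "-".join(str(lines[y]) for y in sorted(lines))

# Hypothetical coordinates for two line-ups.
print(formation_from_y([1, 3, 3, 3, 3, 7, 7, 7, 7, 11, 11]))
print(formation_from_y([1, 3, 3, 3, 3, 6, 6, 8, 8, 8, 11]))
```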
https://cdla.io/permissive-1-0/
This dataset contains comprehensive information on the teams participating in the UEFA Euro 2024 tournament. It includes details about each team, their group stage placement, FIFA rankings, captains, head coaches, pre-tournament forms, and average player age.
Columns Description:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
European Cup football results & rankings
https://creativecommons.org/publicdomain/zero/1.0/
Includes data for over 23000 matches and over 2 million events for those matches!
This dataset contains information on 6 of the top European football/soccer leagues. I plan on updating this dataset weekly/biweekly with data for new matches played as well as gradually going backwards for game data as well.
(For each league, the data listed below runs from the year shown through roughly the present, i.e. the current season.)
English Premier League: Game Data from 2001, Aggregate Stats from 2002, Tables from 2001
Spanish La Liga: Game Data from 2004, Aggregate Stats from 2002, Tables from 2000
German Bundesliga: Game Data from 2002, Aggregate Stats from 2002, Tables from 2000
Italian Serie A: Game Data from 2016, Aggregate Stats from 2001, Tables from 2000
Dutch Eredivisie: Game Data from 2018, Aggregate Stats from 2001, Tables from 2000
French Ligue 1: Game Data from 2018, Aggregate Stats from 2002, Tables from 2002
Some notes:
* Year as a column refers to the year a season started in. So if a match was played in January 2021, its value for year would be 2020 because that season started in 2020.
* Some older matches have no commentary, but they do have one row in events.csv to indicate this.
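The year convention can be reproduced from a match date. This sketch assumes seasons start in July or later, so matches from January to June belong to the season that started the previous calendar year; leagues that play within a calendar year, or seasons starting earlier, would need a different cutoff.

```python
from datetime import date

SEASON_START_MONTH = 7  # assumption: seasons start in July or later

def season_year(match_date: date) -> int:
    """Return the calendar year in which the match's season started."""
    if match_date.month >= SEASON_START_MONTH:
        return match_date.year
    return match_date.year - 1

print(season_year(date(2021, 1, 15)))  # the January 2021 example from the notes
```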
ESPN, as that's where this data is scraped from.
The displayed data on the interest in (other) European football leagues shows results of the Statista European Football Benchmark conducted in England in 2018. Some ** percent of respondents stated that they are interested in the Bundesliga (Germany).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Count-based metrics extracted from the 2016 UEFA Men’s European Football Championship and the 2017 UEFA Women’s Championship. Key metrics by player position and gender were extracted from match action logs and integrated into a single dataset. The resulting dataset of length $n = 4211$ contains 33 variables. The gender target variable is coded as 1 for male players and 0 for female players; there are 2700 male instances and 1511 female instances. The dataset contains two categorical variables. Match period is coded as 1H for the first half, 2H for the second half, and E1, E2, and P for the two possible periods of extra time and the penalties, respectively. Player position in the team formation takes one of the following values: Defender, Midfielder, Forward, Goalkeeper, Substitute Defender, Substitute Midfielder, Substitute Forward and Substitute Goalkeeper. Table \ref{tab:stats} shows the mean value and standard deviation per gender of each of the 30 numerical features of the dataset.
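Before modelling, the two categorical variables need an encoding and the class balance sets a baseline. This sketch uses one-hot encoding (a common choice, not necessarily what the dataset authors used) with the category values listed above, and computes the accuracy of always predicting the majority class from the stated counts.

```python
# Category values as listed in the dataset description.
PERIODS = ["1H", "2H", "E1", "E2", "P"]
POSITIONS = [
    "Defender", "Midfielder", "Forward", "Goalkeeper",
    "Substitute Defender", "Substitute Midfielder",
    "Substitute Forward", "Substitute Goalkeeper",
]

def one_hot(value, categories):
    """One-hot encode `value` against a fixed list of categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# Class balance of the gender target (1 = male, 0 = female).
n_male, n_female = 2700, 1511
n_total = n_male + n_female
majority_baseline = n_male / n_total  # accuracy of always predicting "male"

print(one_hot("2H", PERIODS), n_total, round(majority_baseline, 3))
```

A gender classifier on these features therefore needs to clearly exceed roughly 64% accuracy before it shows any signal beyond the class imbalance.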
As of 2025, there were 211 FIFA-recognized national soccer associations worldwide, with 55 being part of UEFA. Meanwhile, 54 national soccer associations were part of CAF.
https://eu-football.info
Complete list of international footballers, most international caps, number of goals for national football teams, dates of birth
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Garcia-del-Barrio, P., & Reade, J. J. (2024). "Talent allocation in European football leagues: why competitive imbalance may be optimal?". Jahrbücher für Nationalökonomie und Statistik, 244(5-6), 631-670.
The provided content consists of two files:
- The "data" file (with the dataset)
- The "do" file (with the estimations)
Based on a simple model, the paper empirically examines some major sources of the interest that fans show in sporting events, such as: (i) the degree of competitive balance, which determines the uncertainty of the outcome; (ii) the concentration of gifted players in a team, whose interaction of talents on the field enhances the quality of the ‘joint product’ that is a sporting event; (iii) the joint aggregate quality of rival teams. The argument of the paper relies on the idea that the overall quality of a sporting event encompasses more than the mere sum of individual talents. Our empirical analyses show that a certain degree of talent imbalance between rival teams seems to be better than perfect competitive balance for broadening fans' interest in sporting events and, thus, maximising economic outcomes. Disaggregated estimations reveal discrepancies across domestic football leagues.
The "do" file uses STATA to perform the following tasks:
- Obtain the summary statistics of the main variables
- Compute the estimations and regression analyses
- Produce the tables (reported in Appendix A and Appendix B of the paper).
To run the "do" file, you must first edit its first line to specify the path to the "dataset" file. The code is then organised to deliver, in order, the estimations of all the tables reported in Appendix A:
TABLE_A.6.a: Ln(Revenue) constant_returns - Filtered Elo
TABLE_A.1.b: Ln(WageLimit) - Basic Model
TABLE_A.2.b: Ln(WageLimit) - Filtered MVI
TABLE_A.3.b: Ln(WageLimit) - Filtered Elo
TABLE_A.4.b: Ln(WageLimit) constant_returns - Basic Model
TABLE_A.5.b: Ln(WageLimit) constant_returns - Filtered MVI
TABLE_A.6.b: Ln(WageLimit) constant_returns - Filtered Elo
Next, the results for the tables in the main body of the paper are calculated from the information in these tables. The final part of the "do" file produces the estimations for the table in Appendix B:
Tables 7 and 8 contain some typographical errors, which become obvious on a simple comparison with the tables in the Appendices: the figures for the filtered MVI and the filtered Elo have been mistakenly swapped.
For any remaining questions, please contact me via pgbarrio@unav.es
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This paper introduces a hitherto unused source of information to evaluate the importance of uncertainty in driving demand for particular types of entertainment events. We use web searches via Google, and consider various dimensions of uncertainty of outcome in sporting events. Most saliently, we consider whether the complete removal of uncertainty surrounding the winner of a competition, something that often happens before European football league seasons have finished, reduces interest. We find that the decrease in interest is significant, but that it is mitigated by the existence of multiple objectives (secondary prizes), such as qualifying for European competitions and avoiding relegation, which expands the fan interest in these leagues. We conclude by affirming that such a diversified structure of competition, replete with an open structure of promotion and relegation, is desirable in the context of league competitions such as those in Europe that do not have a prominent play-off system to conclude the season.
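The "complete removal of uncertainty surrounding the winner" can be checked from a league table with a simple points-only rule: the title is decided once no other team can reach the leader's total even by winning all its remaining matches. This is a hypothetical sketch; the paper's own definition may differ, and a strict inequality is used so that a possible tie on points (settled by tie-breakers) is not treated as decided.

```python
def title_decided(points, remaining, points_per_win=3):
    """Return True if the current leader can no longer be caught on points.

    `points` and `remaining` map team -> current points and matches left.
    """
    leader = max(points, key=points.get)
    return all(
        points[leader] > points[team] + points_per_win * remaining[team]
        for team in points
        if team != leader
    )

# Hypothetical table with three matches left for every team.
table = {"A": 80, "B": 70, "C": 65}
left = {"A": 3, "B": 3, "C": 3}
print(title_decided(table, left))
```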