The share of Premier League matches ending in a draw varied by season, with the lowest ever draw percentage of 18.7 percent coming in 2018/19. As of October 15, 2024, 30 percent of Premier League matches in 2024/25 ended with points being shared.
http://opendatacommons.org/licenses/dbcl/1.0/
Welcome to the Premier League Match Statistics dataset! ⚽ This guide will help you understand the structure of the dataset, key variables, and how to make the most of the data for analysis and predictions.
This dataset contains detailed match statistics from the English Premier League, including final scores, player statistics, team performance, goals, yellow cards, red cards, and more. It is ideal for analyzing team performance, predicting match outcomes, and exploring trends in football. This dataset is valuable for football enthusiasts, data analysts, and predictive model developers.
This dataset provides comprehensive match statistics from the English Premier League, including team performance, player stats, goals, assists, yellow/red cards, and more. It is ideal for football enthusiasts, analysts, and machine learning projects.
The dataset consists of multiple columns, each representing different aspects of a match:
Column Name | Description |
---|---|
Match_ID | Unique identifier for each match |
Date | Match date (YYYY-MM-DD format) |
Home_Team | Name of the home team |
Away_Team | Name of the away team |
Home_Goals | Goals scored by the home team |
Away_Goals | Goals scored by the away team |
Possession_% | Possession percentage of each team |
Shots_On_Target | Number of shots on target |
Yellow_Cards | Number of yellow cards given |
Red_Cards | Number of red cards given |
Player_of_Match | Best-performing player of the match |
Additional columns may provide more in-depth insights.
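As a minimal sketch of how the columns above could be used, the following derives a result label from `Home_Goals` and `Away_Goals` and tallies the outcomes. The sample rows, the `H`/`D`/`A` labels, and the function names are assumptions made for illustration; they are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """Match_ID,Date,Home_Team,Away_Team,Home_Goals,Away_Goals
1,2023-08-12,Team A,Team B,2,1
2,2023-08-12,Team C,Team D,0,0
3,2023-08-13,Team B,Team C,1,3
"""


def match_result(home_goals: int, away_goals: int) -> str:
    """Return 'H' for a home win, 'A' for an away win, 'D' for a draw."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def result_counts(csv_text: str) -> Counter:
    """Count home wins, draws and away wins in CSV text with the columns above."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = match_result(int(row["Home_Goals"]), int(row["Away_Goals"]))
        counts[label] += 1
    return counts


counts = result_counts(SAMPLE_CSV)
print(counts)
```

To run this against the real file, the in-memory sample would be replaced by the text of the downloaded CSV, assuming its headers match the table.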
Here are some ideas to explore using this dataset:
✅ Analyze team performance trends over different seasons.
✅ Predict match outcomes using machine learning models.
✅ Identify key players based on goals, assists, and ratings.
✅ Explore disciplinary records (yellow/red cards) for fair play analysis.
In 2022/23, the average matchday attendance in the Premier League was just over 40,000, representing a slight increase on the previous year. In the same season, Bundesliga games saw an average attendance of nearly 43,000.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Premier League Matches 2014-2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sanjeetsinghnaik/premier-league-matches-20142020 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The Premier League is by far one of the world’s most entertaining leagues. They have some of the best managers, players and fans! But, what makes it truly entertaining is the sheer unpredictability. There are 6 equally amazing teams with a different team lifting the trophy every season. Not only that, the league has also witnessed victories from teams outside of the top 6. So, let us analyze some of these instances.
So far, the use of statistics in soccer has been positive. Teams can easily gather information about their opponents and their tactics. This allows managers to create well-thought-out game plans that suit their team, exploit opponents' weaknesses, and increase their chances of winning.
A goal is scored when the whole of the ball passes over the goal line, between the goalposts and under the crossbar, provided that no offence has been committed by the team scoring the goal. If the goalkeeper throws the ball directly into the opponents' goal, a goal kick is awarded.
THE TIME OF SEASON/MOTIVATION: While a club battling for a league title is going to be hungry for a win, as is a side that is fighting to stay up, a club that has already won the title or has already been relegated is unlikely to work as hard, and will often rest players as well.
THE REFEREE: Of course, when referees send players off it makes a massive impact on a match, but even a yellow card can affect the outcome of the game, as the booked player is less likely to go in as hard for the rest of the match.
SUBSTITUTES: The whole point of substitutes is for them to be able to come on and impact a match. Subs not only bring on a fresh pair of legs that are less tired than starters and more likely to track back and push forward, but can also play crucial roles in the formation of a team.
MIND GAMES/MANAGERS: Playing mind games has almost become a regular routine for top level managers, and rightly so. Just a simple mind game can do so much to impact a match, a good example coming from Sir Alex Ferguson.
Per his autobiography, when Manchester United were losing late on in a match at a certain point he would tap his watch and make sure to let the opposition know he is signalling this to his players. United's opposition already know that United have a tendency to come back from behind, and upon seeing this gesture they will think that United are going to come back. And because scientific studies prove that living creatures are more likely to accept things that have happened before than not - horses are more likely to lose to a horse they have already lost to in a race even if they are on an even playing field - they often succumb to a loss.
FORM/INJURIES/FIXTURES: A team in good form is more likely to win a match than one on a poor run, while a team in the middle of a congested run of fixtures is less likely to win than a well-rested team. These are just some of the things that affect matches. If you have any others, just mention them in the comment section below and I'll try to add them in!
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study explored factors that influence actual playing time by comparing the Chinese Super League (CSL) and English Premier League (EPL). Eighteen factors were classified into anthropogenic and non-anthropogenic factors. Fifty CSL matches (season 2019) and 50 EPL matches (season 2019–2020) were analyzed. An independent sample t-test with effect size (Cohen’s d) at a 95% confidence interval was used to evaluate differences in the influencing factors between the CSL and EPL. Two multiple linear regression models regarding the CSL and EPL were conducted to compare the influencing factors’ impact on actual playing time. The results showed that the average actual playing time (p
From 2012/13 to 2022/23, total match time in Premier League games generally increased, while the number of in-play minutes generally decreased. This was addressed in 2023/24, with the average amount of in-play time per game being 59 minutes and 3 seconds - over four minutes more than the previous season.
https://creativecommons.org/publicdomain/zero/1.0/
A simple dataset providing the results of each game in last year's Premier League season (19/20), along with pre-match odds!
In the 2012/13 season, the average in-play time of English Premier League matches was 58.5 percent. By 2022/23, this had dropped to 55.9 percent, meaning that fans may have seen a slight increase in stoppages and off-the-ball time wasting.
From sublime touches by Kevin De Bruyne to astounding finishes by Erling Haaland and many more, this dataset has match events for every minute played and summaries of more than 300 Premier League matches from the 2023/24 season.
Dive into the drama, tactics, and player performances, and uncover new perspectives in sports analytics.
Example Summary:
Cole Palmer sensationally scored FOUR goals as Mauricio Pochettino oversaw his biggest win with Chelsea, his side making up ground in the race for Europe with a 6-0 home win over Everton. In a topsy-turvy season, Palmer has so often dragged Chelsea through, and he followed up his dramatic treble against Manchester United in their previous game at Stamford Bridge with four more goals. His first goal made him the first Chelsea player to score in seven straight Premier League home appearances, and he went on to net the earliest ever hat-trick by a Blues player in the competition, in the 29th minute….
Example Match Event:
Hello everyone and welcome to live text coverage of the Premier League match between Arsenal and Nottingham Forest at the Emirates Stadium. The opening weekend of the Premier League continues with reigning champions Manchester City setting the early tempo with a commanding 3-0 win over Burnley on Friday. Arsenal will aim to follow suit as they look to launch their title challenge against the side that ended their pursuit last season… Arsenal quickly settling into their rhythm and Saka earns a corner via Aina's deflection. But Forest deal with two deliveries into their box. The Gunners are unbeaten in each of their last nine home league meetings with Forest, winning five and drawing four.
Saka cuts inside from the right wing and seeks out Nketiah with a clever reverse pass. However, it is just lacking the required accuracy and Turner comes out to collect.
Source: Premier League (https://www.premierleague.com/), FotMob (https://www.fotmob.com/en-GB)
Use: Summarization task and Text Generation
CC-BY-NC
Original Data Source: English Premier League - Match Commentary
As of August 2024, Gareth Barry held the record for the most appearances in the English Premier League, with a career total of 653 games. The former midfielder played for a number of Premier League clubs, including Aston Villa, Manchester City, and Everton.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘English Premier League stats 2019-2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/idoyo92/epl-stats-20192020 on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset is a merge of two EPL datasets I found online.
First, make sure to look up https://github.com/vaastav/Fantasy-Premier-League, whose author has done an amazing job of collecting stats from the FPL app. There are further player stats that I might share in the future. The second source is https://datahub.io/sports-data/english-premier-league, where some additional stats are available, such as referee name and betting odds (I kept 365 in the data; you might want to compare odds, etc.).
Each row is a summary of an EPL game from one team's perspective. Among the stats you can find shots on target, xG Index, PPDA (which measures pressing play) and more.
Notice: I added individual player stats. See the attached csv.
Acknowledgements:
As mentioned above, the collecting was done by others. Make sure you take a look at the GitHub repo, which is truly great, and upvote it.
The EPL is currently shut down, and we don't know when it'll be back. In the meantime, could you predict results? Find trends?
--- Original source retains full ownership of the source dataset ---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Data set for people who love Football and Data Science. Scraping code at GitHub repo: https://github.com/themlphdstudent/kaggle/blob/master/datasets/Premier%20League%20Player%20Stats/Premier%20League%20Player%20Stats.ipynb
The data is scraped from the website https://www.msn.com/en-us/sports/soccer/premier-league/player-stats by extracting the player stats in premier league.
The data has been crawled from the https://www.msn.com/en-us/sports/soccer/premier-league/player-stats website. Cover photo credit : Photo by Fachry Zella Devandra on Unsplash.
This dataset provides detailed statistics for 380 matches from the 2005-2006 English Premier League season. It includes:
Team performance: Full-time/half-time goals, shots, fouls, corners, and cards.
Match outcomes: Results (Home Win, Draw, Away Win) for both full-time and half-time.
Referee data: Names of referees for each match.
Ideal for analyzing team strategies, referee influence, or building predictive models. Data is structured in an Excel file (football-raw-data.xlsx) with clear column headers for easy analysis.
Use cases:
- Sports analytics
- Performance trend visualization
- Machine learning (e.g., match outcome prediction)
- Historical football research
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Barclays Premiere League for last 12 seasons’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lumierebatalong/english-premiere-league-team-datasets on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Barclay premier league is the best league in the world 💯 . It has 20 teams that qualified for the title. Among these 20 teams there are 5 teams which have already won the title in the last 12 seasons namely Man City, Liverpool, Man United, Chelsea, Leicester with two outsiders Arsenal and Tottenham. Who is your favorite team and how can you predict their title victory for the current or next season? The ball is in your camp 👀 .
Notes for Football Data
All data is in csv format, ready for use within standard spreadsheet applications. Please note that some abbreviations are no longer in use and refer to data collected in earlier seasons. Each file contains the last 12 seasons of the English Premier League.
Key to results data:
Div = League Division
Date = Match Date (dd/mm/yy)
Time = Time of match kick off
HomeTeam = Home Team
AwayTeam = Away Team
FTHG and HG = Full Time Home Team Goals
FTAG and AG = Full Time Away Team Goals
FTR and Res = Full Time Result (H=Home Win, D=Draw, A=Away Win)
HTHG = Half Time Home Team Goals
HTAG = Half Time Away Team Goals
HTR = Half Time Result (H=Home Win, D=Draw, A=Away Win)
Match Statistics (where available)
Attendance = Crowd Attendance
Referee = Match Referee
HS = Home Team Shots
AS = Away Team Shots
HST = Home Team Shots on Target
AST = Away Team Shots on Target
HHW = Home Team Hit Woodwork
AHW = Away Team Hit Woodwork
HC = Home Team Corners
AC = Away Team Corners
HF = Home Team Fouls Committed
AF = Away Team Fouls Committed
HFKC = Home Team Free Kicks Conceded
AFKC = Away Team Free Kicks Conceded
HO = Home Team Offsides
AO = Away Team Offsides
HY = Home Team Yellow Cards
AY = Away Team Yellow Cards
HR = Home Team Red Cards
AR = Away Team Red Cards
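The key above is enough to sketch a small loader. The example below parses the `Date` field in its dd/mm/yy form and computes the share of matches with `FTR` equal to `D`. The sample rows and the function names `load_matches` and `draw_share` are assumptions for illustration, not part of the source files.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only; real files carry many more columns.
SAMPLE_CSV = """Div,Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
E0,10/08/18,Team A,Team B,2,1,H
E0,11/08/18,Team C,Team D,1,1,D
E0,11/08/18,Team E,Team F,0,2,A
E0,12/08/18,Team G,Team H,0,0,D
"""


def load_matches(csv_text):
    """Parse rows, converting the dd/mm/yy date and the goal counts."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        matches.append({
            "date": datetime.strptime(row["Date"], "%d/%m/%y").date(),
            "home_goals": int(row["FTHG"]),
            "away_goals": int(row["FTAG"]),
            "result": row["FTR"],
        })
    return matches


def draw_share(matches):
    """Fraction of matches whose full-time result is a draw."""
    if not matches:
        return 0.0
    draws = sum(1 for m in matches if m["result"] == "D")
    return draws / len(matches)


matches = load_matches(SAMPLE_CSV)
print(f"Draw share: {draw_share(matches):.1%}")
```

Grouping the parsed matches by season before calling `draw_share` would give a per-season draw percentage, assuming each season's rows can be separated by date or by file.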
I removed some features.
This dataset contains data for the last 12 seasons of the English Premier League. The dataset is sourced from the http://www.football-data.co.uk/ website and contains various statistical data such as final and half-time results, corners, yellow and red cards, etc.
Can you explain why Man United has not won the title for the last 12 seasons? Can you predict the victory of your favorite team in every championship game?
--- Original source retains full ownership of the source dataset ---
In the 2022/23 season, players in the English Premier League (EPL) tended to be sent off at a higher frequency than Women's Super League (WSL) players, with a red card being shown nearly every 13 games on average. Meanwhile, in the same season, the WSL saw a red card once in nearly every 16 matches.
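The frequencies quoted above can be turned into per-match rates with simple arithmetic. The sketch below assumes the rounded figures of one red card per 13 matches (EPL) and per 16 matches (WSL), and a 20-team double round-robin season for the EPL; because the source says "nearly", the outputs are approximations only, and the function names are hypothetical.

```python
def red_cards_per_match(matches_per_red_card: float) -> float:
    """Convert 'one red card every N matches' into red cards per match."""
    return 1.0 / matches_per_red_card


def expected_red_cards(matches_per_red_card: float, total_matches: int) -> float:
    """Approximate number of red cards over a given number of matches."""
    return total_matches / matches_per_red_card


def double_round_robin_matches(teams: int) -> int:
    """Each team hosts every other team once."""
    return teams * (teams - 1)


epl_rate = red_cards_per_match(13)  # rounded figure from the text above
wsl_rate = red_cards_per_match(16)  # rounded figure from the text above
epl_matches = double_round_robin_matches(20)

print(f"EPL red cards per match: {epl_rate:.4f}")
print(f"WSL red cards per match: {wsl_rate:.4f}")
print(f"EPL rate relative to WSL: {epl_rate / wsl_rate:.2f}x")
print(f"Approximate EPL red cards in a season: {expected_red_cards(13, epl_matches):.0f}")
```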
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Fantasy Premier League, popularly known as FPL, is the most popular fantasy game played worldwide. It is the official fantasy game of the English Premier League and runs throughout the duration of the league itself. The number of registered users keeps growing season by season, and the FPL community has grown tremendously in recent years. In short, FPL is an opportunity: to learn tactics, to make friends and enjoy the accompanying banter in 'mini-leagues', to add extra spice to watching matches, and the list goes on.
You can find all the details at - FPL Official Site
There are three datasets:
- Gameweeks.csv : Contains all the 38 Gameweeks' data
- Players.csv : Contains all players' data
- Teams.csv : Contains all clubs' data
The idea behind these datasets is to support an extensive exploratory data analysis (EDA).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
What is FPL? Fantasy Premier League (FPL) is an online fantasy football game based on the English Premier League. In the game, participants select a squad of real-life Premier League players and earn points based on their performances in actual matches.
Here are some facts about FPL:
FPL has over 9 million registered users worldwide, making it one of the most popular fantasy sports games in the world.
The budget for each FPL team is £100.0m, with the most expensive player being Mohamed Salah at £13 million for the current season.
The highest-scoring FPL player of all time is again Mohamed Salah, who scored 303 points in the 2017/18 season.
Content
This dataset contains a collection of tweets with keywords Fantasy Premier League and FPL. The tweets were scraped using the snscrape library. Check out the Tutorial Notebook
The dataset includes the following information for each tweet:
ID: The unique identifier for the tweet.
Timestamp: The date and time when the tweet was posted.
User: The Twitter handle of the user who posted the tweet.
Text: The content of the tweet.
Hashtag: The hashtags included in the tweet, if any.
Retweets: The number of times the tweet has been retweeted as of the time it was scraped.
Likes: The number of likes the tweet has received as of the time it was scraped.
Replies: The number of replies to the tweet as of the time it was scraped.
Source: The source application or device used to post the tweet.
Location: The location listed on the user's Twitter profile, if any.
Verified_Account: A Boolean value indicating whether the user's Twitter account has been verified.
Followers: The number of followers the user has as of the time the tweet was scraped.
Following: The number of accounts the user is following as of the time the tweet was scraped.
The dataset provides a glimpse into the online chatter related to Fantasy Premier League and can be used for various natural language processing and machine learning tasks, such as sentiment analysis, topic modeling, and more. It allows an understanding of the community, the level of interest, and the experience of playing FPL.
Original Data Source: FPL Tweets Dataset
This statistic shows the results of a representative survey of the British public on the English Premier League, presenting the share of British adults who watch Premier League football matches, by frequency and age group. Within each age group, the largest share of respondents indicated that they never watch English Premier League football matches. The age group with the smallest share of respondents indicating that they never watch Premier League football matches was the 45-54 year olds, at 46 percent.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Premier League Players Performance Dataset
This dataset provides a comprehensive overview of player performance in the Premier League, capturing a wide array of metrics related to gameplay, scoring, passing, and defensive actions. With records detailing individual player statistics across different teams, it is a valuable resource for analysts, data scientists, and fans interested in player performance data from one of the world’s top soccer leagues.
Each entry represents a single player's profile, featuring data on expected goals (xG), expected assists (xAG), touches, dribbles, tackles, and more. This dataset is ideal for analyzing various aspects of player contribution, both offensively and defensively, and understanding their impact on team performance.
Dataset Columns
Player: Name of the player
Team: Team the player belongs to
'#' : Player's jersey number
Nation: Nationality of the player
Position: Primary playing position on the field
Age: Age of the player
Minutes: Total minutes played
Goals: Number of goals scored
Assists: Number of assists
Penalty Shoot on Goal: Penalty shots taken on goal
Penalty Shoot: Total penalty shots attempted
Total Shoot: Total shots attempted
Shoot on Target: Shots successfully on target
Yellow Cards: Number of yellow cards received
Red Cards: Number of red cards received
Touches: Total ball touches
Dribbles: Total dribbles attempted
Tackles: Total tackles made
Blocks: Total blocks
Expected Goals (xG): Expected goals, calculated based on shooting positions and likelihood of scoring
Non-Penalty xG (npxG): Expected goals excluding penalties
Expected Assists (xAG): Expected assists, based on actions leading to an expected goal (xG)
Shot-Creating Actions: Actions leading to a shot attempt
Goal-Creating Actions: Actions leading to a goal
Passes Completed: Successful passes completed
Passes Attempted: Total passes attempted
Pass Completion %: Pass completion rate, expressed as a percentage (some entries have missing values here)
Progressive Passes: Passes advancing the ball significantly toward the opponent’s goal
Carries: Total ball carries
Progressive Carries: Carries advancing the ball significantly toward the opponent’s goal
Dribble Attempts: Total dribbles attempted
Successful Dribbles: Total successful dribbles
Date: Date of record collection or game date
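Since the column list notes that Pass Completion % has missing values, one option is to recompute it from Passes Completed and Passes Attempted. The sketch below does that and also scales counting stats to a per-90-minutes basis. The player record is made up, the helper names are hypothetical, and treating a missing value as `None` is an assumption about how the file would be loaded.

```python
def pass_completion_pct(completed, attempted):
    """Return pass completion as a percentage, or None when no passes were attempted."""
    if not attempted:
        return None
    return 100.0 * completed / attempted


def per_90(value, minutes):
    """Scale a counting stat to a per-90-minutes rate, or None without minutes."""
    if minutes <= 0:
        return None
    return value * 90.0 / minutes


# Made-up record for illustration; keys follow the column names listed above.
player = {
    "Player": "Example Player",
    "Minutes": 180,
    "Goals": 1,
    "Expected Goals (xG)": 0.8,
    "Passes Completed": 45,
    "Passes Attempted": 50,
    "Pass Completion %": None,  # assumed representation of a missing value
}

# Fill the gap only when the stored value is missing.
if player["Pass Completion %"] is None:
    player["Pass Completion %"] = pass_completion_pct(
        player["Passes Completed"], player["Passes Attempted"]
    )

goals_per_90 = per_90(player["Goals"], player["Minutes"])
xg_per_90 = per_90(player["Expected Goals (xG)"], player["Minutes"])
print(player["Pass Completion %"], goals_per_90, xg_per_90)
```

A player with zero attempted passes keeps a missing value rather than a misleading 0 percent, which may be why some entries are empty in the first place, though the source does not say.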
Potential Use Cases
Data Visualization: Explore relationships between various performance metrics to identify patterns.
Player Comparisons: Compare individual players based on goals, assists, xG, xAG, and other metrics.
Team Analysis: Evaluate contributions of players within the same team to gain insights into team dynamics.
Predictive Modeling: Use the dataset to build models for predicting game outcomes, goals, or assists based on player performance metrics.
The dataset contains data pertaining to key result areas, match statistics and betting odds for Barclays' premier league 2018/19 season. Column description provided in Discussion section.