18 datasets found

PMData
kaggle.com
huggingface.co
zip
Updated Apr 19, 2021
Cite
VT (2021). PMData [Dataset]. https://www.kaggle.com/datasets/vlbthambawita/pmdata-a-sports-logging-dataset/discussion
Explore at:
Available download formats: zip (1401630710 bytes)
Dataset updated
Apr 19, 2021
Authors
VT
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Paper: https://dl.acm.org/doi/10.1145/3339825.3394926

We present the PMData dataset, which aims to combine traditional lifelogging with sports activity logging. Such a dataset enables the development of several interesting analysis applications, e.g., where additional sports data can be used to predict and analyze everyday developments like a person's weight and sleep patterns, and where traditional lifelog data can be used in a sports context to predict an athlete's performance. For the data collection, we used the Fitbit Versa 2 smartwatch wristband, the PMSys sports logging app, and Google Forms. PMData contains logging data for 5 months from 16 persons. Our initial experiments show that such analyses are possible, but there is still considerable room for improvement.

Dataset details

The structure of the main folder:

[Main folder]

p01

p02

...

p16

participant-overview.xlsx

The structure of each sub folder (pXX):

pXX [folder]: is a folder containing data of participant XX (notation XX represents the identifier of the participant).

fitbit [folder]

calories.json: shows how many calories the person has burned in the last minute.

distance.json: gives the distance moved per minute. Distance seems to be in centimeters.

exercise.json: describes each activity in more detail. It contains the date with start and stop time, the time spent at different activity levels, the type of activity, and various performance metrics that depend on the type of exercise; e.g., for running, it contains distance, time, steps, calories, speed, and pace.

heart_rate.json: shows the number of heart beats per minute (bpm) at a given time.

lightly_active_minutes.json: sums up the number of lightly active minutes per day.

moderately_active_minutes.json: sums up the number of moderately active minutes per day.

resting_heart_rate.json: gives the resting heart rate per day.

sedentary_minutes.json: sums up the number of sedentary minutes per day.

sleep_score.csv: helps understand the sleep each night so you can see trends in the sleep patterns. It contains an overall 0-100 score made up from composition, revitalization and duration scores, the number of deep sleep minutes, the resting heart rate and a restlessness score.

sleep.json: is a per-sleep breakdown into periods of light, deep, and REM sleep and time awake.

steps.json: displays the number of steps per minute.

time_in_heart_rate_zones.json: gives the number of minutes in different heart rate zones. Using the common formula of 220 minus your age, Fitbit calculates your maximum heart rate and then creates three target heart rate zones based on that number: fat burn (50 to 69 percent of your max heart rate), cardio (70 to 84 percent of your max heart rate), and peak (85 to 100 percent of your max heart rate).

very_active_minutes.json: sums up the number of very active minutes per day.
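The zone rule quoted for time_in_heart_rate_zones.json can be sketched in a few lines. This is only an illustration of the stated percentages, not Fitbit's actual implementation; the function names are hypothetical, and treating values below 50 percent as "out of zone" and each lower bound as inclusive are assumptions.

```python
def max_heart_rate(age: int) -> int:
    """Common estimate quoted in the description: 220 minus age."""
    return 220 - age


def heart_rate_zone(bpm: float, age: int) -> str:
    """Map a heart rate to a zone using the percentages from the description.

    Assumption: each zone's lower bound is inclusive, so the gap between
    69 and 70 percent is assigned to the lower zone.
    """
    pct = 100.0 * bpm / max_heart_rate(age)
    if pct >= 85:
        return "peak"
    if pct >= 70:
        return "cardio"
    if pct >= 50:
        return "fat burn"
    return "out of zone"  # label is an assumption, not a Fitbit term


# Example for a 30-year-old (estimated maximum 190 bpm).
zones = [heart_rate_zone(bpm, 30) for bpm in (80, 100, 140, 170)]
```

Applied per minute to heart_rate.json, a function like this could be used to cross-check the daily totals in time_in_heart_rate_zones.json.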

googledocs [folder]

reporting.csv: contains one line per report, including the date reported for, a timestamp of the report submission time, the meals eaten (breakfast, lunch, dinner, and evening meal), the participant's weight that day, the number of glasses drunk, and whether the participant consumed alcohol.

pmsys [folder]

injury.csv: shows injuries with a time and date, the corresponding injury locations, and a minor or major severity.

srpe.csv: contains a training session's end time, type of activity, the perceived exertion (RPE), and the duration in minutes. This is used, for example, to calculate the session's training load, or sRPE (RPE × duration).

wellness.csv: includes parameters like time and date, fatigue, mood, readiness, sleep duration (number of hours), sleep quality, soreness (and soreness area), and stress. Fatigue, sleep quality, soreness, stress, and mood all have a 1-5 scale. A score of 3 is normal, 1-2 are below normal, and 4-5 are above normal. Sleep duration is simply how long the sleep was, in hours, and readiness (scale 0-10) is an overall subjective measure of how ready you are to exercise, i.e., 0 means not ready at all and 10 indicates that you cannot feel any better and are ready for anything!

food-images.zip: Participants 1, 3 and 5 have taken pictures of everything they have eaten except water during 2 months (February and March). There are food images included in this .zip file, and information about day and time is given in the...
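
The session training load described for srpe.csv (RPE × duration) is a single multiplication per row. A minimal sketch follows; the column names `perceived_exertion` and `duration_min` and the two sample rows are assumptions made for illustration and should be checked against the real file header.

```python
import csv
import io

# Illustrative rows only; column names are hypothetical, not confirmed
# against the actual srpe.csv header.
SAMPLE = """end_date_time,activity_names,perceived_exertion,duration_min
2020-01-05T18:30:00,running,6,45
2020-01-06T19:00:00,strength,4,60
"""


def session_load(rpe: int, duration_min: int) -> int:
    """Session RPE training load: perceived exertion times duration in minutes."""
    return rpe * duration_min


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
loads = [
    session_load(int(r["perceived_exertion"]), int(r["duration_min"]))
    for r in rows
]
```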
WNBA Play-by-Play and Box Scores
kaggle.com
zip
Updated Oct 27, 2025
Cite
ZachHT (2025). WNBA Play-by-Play and Box Scores [Dataset]. https://www.kaggle.com/datasets/zachht/wnba-play-by-play-and-box-scores
Explore at:
Available download formats: zip (27409879 bytes)
Dataset updated
Oct 27, 2025
Authors
ZachHT
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset

This dataset was created by ZachHT

Released under Database: Open Database, Contents: Database Contents

Contents
Daily Fantasy Basketball - DraftKings NBA
kaggle.com
zip
Updated Dec 29, 2017
Cite
Alan Du (2017). Daily Fantasy Basketball - DraftKings NBA [Dataset]. https://www.kaggle.com/alandu20/daily-fantasy-basketball-draftkings
Explore at:
Available download formats: zip (269586908 bytes)
Dataset updated
Dec 29, 2017
Authors
Alan Du
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

In Daily Fantasy Sports (DFS) contests, contestants construct a virtual lineup of players that score points based on their real-world performances. Unlike in season-long Fantasy Sports contests, in DFS contestants submit a new lineup for each set of games. DFS contests are held for several professional sports leagues, including the National Football League (NFL), National Basketball Association (NBA), and National Hockey League (NHL). The leading DFS sites today are DraftKings and FanDuel, which control approximately 90% of the $3B DFS market.

There are three primary types of DFS games: Head-to-Heads (H2Hs), Double-Ups, and Guaranteed Prize Pools (GPPs). In H2H games, two contestants play for a single cash prize. In Double-Up games, a pool of contestants competes to place in the top 50% of lineups, which are awarded twice the entry fee. In GPPs, a pool of contestants competes for a fixed prize structure that tends to be very top-heavy; some contests pay out hundreds of thousands of dollars to the top finisher.

Over the last year, I have developed a winning system for daily fantasy football and baseball contests. Building this system from scratch was a fantastic complement to the things I learned as a student, from machine learning and optimization to optimal learning and game theory. I hope others can join me in researching daily fantasy basketball and perhaps get involved with the burgeoning world of daily fantasy sports.

Content

This dataset contains 20 days of DraftKings NBA contest data scraped between 2017-11-27 and 2017-12-28. For DraftKings NBA daily fantasy basketball contest rules, see https://www.draftkings.com/help/rules/nba.

Format:

One folder per day

One folder per contest for a given day

Salary file (“DKSalaries.csv”), payout structure file (“payout_structure.csv”), and contest results file (“contest-standings.csv”) for a given contest. Column headers in each file are largely self-explanatory.

Some additional files (e.g. “players.csv”, “covariance_mat_unfiltered.csv”, “hist_fpts_mat.csv”) for a given contest. These files were for my personal research; feel free to use or ignore them.

“projections” folder contains projections data for each player from rotogrinders and daily fantasy nerd, labeled by date.

“contests.csv” contains information about each contest, e.g. entry fee, slate, and contest size.

Acknowledgements

Thank you to my friend from college, Michael Chiang, for contributions to this project.

Inspiration

A few ideas to get started:

What kind of position "stacks" tend to maximize correlation within a lineup?

How can you minimize correlation between lineups, such that you maximize your chances of winning a GPP?

What are the tendencies of some of the top DFS pros?

Can you improve rotogrinders and daily fantasy nerd player projections?

Can you predict which players are undervalued (i.e. high fantasy points / salary ratio)?

Can you predict the ownership percentage for each player in a given contest?
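
As a starting point for the "undervalued players" question above, the fantasy points / salary ratio can be computed and ranked directly. This is a rough sketch: the column names `Name`, `Salary`, and `FantasyPoints` and the three sample rows are placeholders for illustration, not values or headers taken from the dataset's files.

```python
import csv
import io

# Illustrative rows only; real salaries and points come from the contest files.
SAMPLE = """Name,Salary,FantasyPoints
Player A,9000,45.0
Player B,4500,30.0
Player C,6000,24.0
"""


def value_per_1k(points: float, salary: float) -> float:
    """Fantasy points earned per $1,000 of salary."""
    return points / (salary / 1000.0)


players = list(csv.DictReader(io.StringIO(SAMPLE)))
ranked = sorted(
    players,
    key=lambda p: value_per_1k(float(p["FantasyPoints"]), float(p["Salary"])),
    reverse=True,
)
ranked_names = [p["Name"] for p in ranked]
```

Ranking by this ratio on realized points only describes past value; predicting it in advance would require projections such as those in the “projections” folder.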
mega_nba2023-2024
kaggle.com
zip
Updated Nov 17, 2025
Cite
monogurui_ii (2025). mega_nba2023-2024 [Dataset]. https://www.kaggle.com/datasets/rickmcintire/mighunba2024
Explore at:
Available download formats: zip (31068 bytes)
Dataset updated
Nov 17, 2025
Authors
monogurui_ii
Description
game simulator (basketball): NBA 2023-2024

The aim of this project is to generate simulations of basketball games between NBA finals teams for the 2023-2024 season for the purpose of modeling predicted outcomes from a player efficiency metric (the "r metric").

A simulated 82 game season will be run daily.

2022-2023 box score statistics for players (on a per 100 possessions basis) were gathered from https://www.basketball-reference.com/.

The players' stats were filtered and transformed to reflect a focus on box score stats measuring playing efficiency, as opposed to measures of volume. For example, Real Shooting Percentage (True Shooting Percentage adjusted for volume, based on points generated above average) was incorporated into the metric as opposed to Points Per Game; Adjusted Assist to Turnover Ratio (Assist to Turnover adjusted for volume, based on assists to turnovers generated above average) was incorporated as opposed to Assists Per Game. The complete list of stats used for the r metric is as follows:

Real Shooting Percentage

Offensive Rebounds

Adjusted Assist to Turnover Ratio

Steals

Blocks

Personal Fouls

The r metric efficiency rating was derived from performing a boosted regression on the overall team stats for a selection of teams for NBA seasons from 1980 to the present against their Point Differential and then applying the resulting predicted values to individual players.

An R function was created to generate simulated game outcomes from a Kaggle notebook. The output is produced as a ggplot (visualizing the r metric in pink against the traditional box score stats, coded by team in blue/red) and a csv file as a box score. The notebook is scheduled to run daily, randomly selecting teams to play against one another and generating an outcome based on the player stats and metric for each team with an element of random variation.
Sports - Cricket: Year- and Match-wise Scores, Winners, Victory Margins and...
dataful.in
Updated Nov 6, 2025
Cite
Dataful (Factly) (2025). Sports - Cricket: Year- and Match-wise Scores, Winners, Victory Margins and Season Winners in ODI World Cups, since 1975 [Dataset]. https://dataful.in/datasets/5809
Explore at:
Available download formats: application/x-parquet, xlsx, csv
Dataset updated
Nov 6, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
Countries of the World
Variables measured
Matches
Description
The dataset contains year- and match-wise historical data on each match played in all the ODI World Cups since 1975. For each match, the data include the year in which the World Cup was held, the venue, the first and second batting teams, their scores, results, winners, winning margins by number of runs or wickets, and the type of match (league match, quarter-final, semi-final, final, etc.), along with the names of the host country and the season winner.
France and Germany Football Leagues Dataset
kaggle.com
zip
Updated Oct 6, 2024
Cite
Gökhan Ergül (2024). France and Germany Football Leagues Dataset [Dataset]. https://www.kaggle.com/datasets/gokhanergul/france-and-germany-football-leagues-dataset
Explore at:
Available download formats: zip (363605 bytes)
Dataset updated
Oct 6, 2024
Authors
Gökhan Ergül
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Germany, France
Description
France and Germany Football Leagues Dataset

This dataset contains match data from the top football leagues and cup competitions in France and Germany. The dataset provides comprehensive information about home and away teams, their scores, match dates, and seasons. It is a valuable resource for football enthusiasts, data scientists, and analysts interested in exploring football statistics and trends across two of Europe's biggest football nations.

Dataset Summary:

Total Matches: 38,596

Countries Covered: France, Germany

Leagues Included:

Ligue 1 (France)

Ligue 2 (France)

Coupe de France

Bundesliga (Germany)

2. Bundesliga (Germany)

DFB-Pokal (Germany)

Column Descriptions:

Country: The country where the match took place (France or Germany).
Example values: 'France', 'Germany'

Lig: The specific league or cup in which the match was played. This column captures whether the match was part of Ligue 1, Ligue 2, Coupe de France, Bundesliga, 2. Bundesliga, or DFB-Pokal.
Example values: 'Ligue 1', 'Bundesliga', 'DFB-Pokal'

home_team: The name of the home team in the match.
Example values: 'Paris Saint-Germain', 'Bayern Munich'

away_team: The name of the away team in the match.
Example values: 'Olympique Lyonnais', 'Borussia Dortmund'

home_score: The number of goals scored by the home team in the match.
Example values: '3', '0'

away_score: The number of goals scored by the away team in the match.
Example values: '1', '2'

season_year: The season in which the match took place. Typically, football seasons run from one year to the next (e.g., 2022-2023 season).
Example values: '2022/2023', '2021/2022'

Date_day: The specific day on which the match was played, formatted as day and month (dd.mm).
Example values: '05.01', '29.09'

Date_hour: The hour and minute the match kicked off, formatted as hh:mm.
Example values: '20:45', '18:30'
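
Because Date_day carries only day and month, a full timestamp has to be rebuilt from season_year. One possible sketch is shown below; the function name and the `season_start_month` cutoff (July) are assumptions, so matches played around the season boundary should be checked by hand.

```python
from datetime import datetime


def match_datetime(season_year: str, date_day: str, date_hour: str,
                   season_start_month: int = 7) -> datetime:
    """Combine season_year ('2022/2023'), Date_day ('dd.mm') and Date_hour ('hh:mm').

    Assumption: months from season_start_month onward belong to the first
    year of the season, earlier months to the second year.
    """
    first_year, second_year = (int(y) for y in season_year.split("/"))
    day, month = (int(p) for p in date_day.split("."))
    hour, minute = (int(p) for p in date_hour.split(":"))
    year = first_year if month >= season_start_month else second_year
    return datetime(year, month, day, hour, minute)


# A January match in the 2022/2023 season falls in calendar year 2023.
kickoff = match_datetime("2022/2023", "05.01", "20:45")
```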

Use Cases:

This dataset can be used for various purposes, including:

- Analyzing team performance trends over different seasons.
- Comparing goal-scoring patterns in home vs. away matches.
- Building predictive models to forecast match outcomes based on historical data.
- Understanding football dynamics in France and Germany through data visualizations.

Feel free to explore and use this dataset to draw your own insights and conclusions!

NLL Statistics

kaggle.com

zip

Updated Dec 15, 2020

Cite

apalensky (2020). NLL Statistics [Dataset]. https://www.kaggle.com/apalensky/nll-statistics

Explore at:

Available download formats: zip (816536 bytes)

Dataset updated

Dec 15, 2020

Authors

apalensky

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

Lacrosse lags behind the big four sports in data-driven insights. As the third largest indoor league, the National Lacrosse League could be next up in sports analytics breakthroughs. I will work to draw insights from this data and hope others can enjoy the same process. Insights and more files can be accessed in my GitHub repository. Some basic Tableau workbooks are available through my Tableau Public account.

Content

Floor player stats by game for every publicly available box score.

Legend for NLLFloorGameStats.csv:

Day - day of the week game was played
Date - date of game
Location - hosting team
# - jersey number
Name - player name
Captain - denotes Captain and Alternate Captains
Team - player's team
G - goals
A - assists
+/- - score differential while player is on the floor
PIM - penalty minutes
S - shots on goal
SOFF - shots off goal
LB - loose balls
T - turnovers
CT - caused turnovers
FO_W - faceoff wins
FO - total faceoffs taken
TOF - time on floor

Changes to the legend for all yearly files:

Points - sum of goals and assists
PM - score differential while player is on the floor (replacement for +/-)
....PG - statistic average per game
....p60 - statistic per 60 minutes of floor time (only applicable to 2020 with recording of TOF)
ATO_ratio - assist to turnover ratio
FOpercent - faceoff percentage
ShootingPct - goals scored out of total shots taken
AdjShootingPct - goals scored out of shots on goal
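
The derived columns in the yearly files can be reproduced from the per-game legend. The sketch below follows the definitions given above, with three assumptions: "total shots taken" is read as S + SOFF, percentages are returned as fractions, and TOF is expressed in minutes. The `Points_p60` key is a hypothetical name standing in for the "....p60" pattern.

```python
def safe_ratio(num: float, den: float):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def derived_stats(g, a, s, soff, t, fo_w, fo, tof_min=None):
    """Compute yearly-file style stats from per-game totals.

    Assumption: total shots taken = shots on goal (S) + shots off goal (SOFF).
    """
    stats = {
        "Points": g + a,
        "ATO_ratio": safe_ratio(a, t),
        "FOpercent": safe_ratio(fo_w, fo),
        "ShootingPct": safe_ratio(g, s + soff),
        "AdjShootingPct": safe_ratio(g, s),
    }
    if tof_min:
        # Per-60 rate; only meaningful where TOF was recorded (2020).
        stats["Points_p60"] = 60.0 * (g + a) / tof_min
    return stats


example = derived_stats(g=2, a=3, s=5, soff=3, t=2, fo_w=0, fo=0, tof_min=20)
```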

Legend for NLLGoaliesGameStats.csv:

Day - day of the week game was played
Date - date of game
Location - hosting team
# - jersey number
Name - player name
Credit - denotes credited win, loss, or designated backup
Team - player's team
MIN - minutes in net
SV Q1 - saves in quarter 1
SV Q2 - saves in quarter 2
SV Q3 - saves in quarter 3
SV Q4 - saves in quarter 4
SV OT - saves in overtime
SV - total saves
SOG - shots on goal seen
GA - goals allowed

Inspiration

Lacrosse lags behind the big four sports in data-driven insights. As the third largest indoor league, the National Lacrosse League could be next up in sports analytics breakthroughs.

Esports Performance Rankings and Results
kaggle.com
zip
Updated Dec 12, 2022
Cite
The Devastator (2022). Esports Performance Rankings and Results [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-collegiate-esports-performance-with-bu
Explore at:
Available download formats: zip (110148 bytes)
Dataset updated
Dec 12, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Esports Performance Rankings and Results

Performance Rankings and Results from Multiple Esports Platforms

By [source]

About this dataset

This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!

How to use the dataset

Download Files: First, make sure you have downloaded the CS_week1, CS_week2, CS_week3 and seeds datasets on Kaggle. You will also need to download the currentRankings file for each week of competition. All files should keep their originally assigned names so that your analysis tools can read them properly (e.g., CS_week1.csv).

Understand File Structure: Once all the data has been collected and organized into separate files on your device, become familiar with the type of information in each file. The main folder contains the main data files: week1-3 and seedings. The week1-3 files contain teams matched against one another, with each team's university, the points scored from match results, the team name, and the website URL associated with the university entry. The seedings file contains a ranking of university entries, accompanied by team names, website URLs, etc. An additional file contains currentRankings scores for each individual player or team for a given period of competition (e.g., the first week).

Analyzing Data: Now that everything is set up, it's time to explore. You can dive into trends among universities or individual players with regard to specific match performances or overall standings across the weeks of competition. You can also build graphs from the data compiled in the BUECTracker dataset. For example, to compare two universities, say Harvard University and Cornell University, since the beginning of the event, you would extract their respective points (column), dates (column, found under the result tab), regions (e.g., North America vs. Europe), and general stats such as maps played. The same approach applies to other custom ideas for similar datasets.

Research Ideas

Analyze the performance of teams and identify areas for improvement for better performance in future competitions.

Assess which esports platforms are the most popular among gamers.

Gain a better understanding of player rankings across different regions, based on the rankings system, to create targeted strategies that could boost individual players' scoring potential or overall team success in competitive gaming events.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: CS_week1.csv

| Column name | Description |
|:---------------|:----------------------------------------------|
| Match ID | Unique identifier for each match. (Integer) |
| Team 1 | Name of the first team in the match. (String) |
| University | University associated with the team. (String) |

File: CS_week1_currentRankings.csv

| Column name | Description |
|:--------------|:-----------------------------------------------------------|
...
match data for soccer leagues
kaggle.com
Updated Jul 18, 2025
Cite
Scottie Meadows (2025). match data for soccer leagues [Dataset]. https://www.kaggle.com/datasets/scottiemeadows/match-data-for-soccer-leagues
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 18, 2025
Dataset provided by
Kaggle
Authors
Scottie Meadows
Description
This CSV file contains a comprehensive dataset of simulated soccer match statistics spanning 25 years (2000-2001 to 2024-2025) for major leagues including MLS, Premier League, La Liga, and Bundesliga. Each row represents a single match and includes details such as team names, scores, match results, yellow/red cards, ball possession, offsides, fouls, team formations, starting lineups, and betting odds from Bet365. The data also breaks down goals by type (penalty, freekick, corner) and half, and includes a "build-up speed" metric. The entire dataset is sorted chronologically by match date. This file is designed to support various analytical questions related to soccer performance, strategy, and betting trends over time.
Tokyo Olympic Dataset
kaggle.com
zip
Updated Apr 4, 2025
Cite
Sahil Kadbhane (2025). Tokyo Olympic Dataset [Dataset]. https://www.kaggle.com/datasets/sahil1kadbhane1234/tokyo-olympic-dataset
Explore at:
Available download formats: zip (329690417 bytes)
Dataset updated
Apr 4, 2025
Authors
Sahil Kadbhane
Area covered
Tokyo
Description
Tokyo Olympics Dataset: A Comprehensive Analysis of the 2020 Summer Games

📌 Subtitle: Explore the athletes, events, medal winners, and historical trends from the Tokyo 2020 Summer Olympics.

📝 Dataset Overview

The Tokyo Olympics 2020 dataset provides a detailed breakdown of the athletes, events, and medals awarded during the Summer Games. This dataset serves as an essential resource for data analysis, visualization, and machine learning applications related to sports analytics.

📂 File Descriptions

1. athletes.csv

This file contains detailed information about all participating athletes, including demographics and country representation.

Columns:

Athlete_ID – Unique identifier for each athlete

Name – Full name of the athlete

Gender – Male (M) / Female (F)

Age – Age of the athlete during the event

Country – Country the athlete represents

Sport – The sport in which the athlete competed

Event – Specific event the athlete participated in

Insights:

Analyze the gender distribution across various sports.

Identify the youngest and oldest participants in the Olympics.

2. medals.csv

This file lists all medal winners, including details on the type of medal awarded and the event in which it was won.

Columns:

Athlete_ID – Unique athlete reference

Name – Name of the medal-winning athlete

Country – Country represented

Sport – Sport category

Event – Specific event won

Medal – Type of medal won (Gold, Silver, Bronze)

Insights:

Track the top-performing countries based on total medals won.

Identify athletes who won multiple medals across different events.

3. events.csv

A dataset containing all sporting events held during the Tokyo 2020 Olympics.

Columns:

Event_ID – Unique event identifier

Sport – Name of the sport

Event – Name of the event

Venue – Location where the event took place

Date – Date of the event

Insights:

Understand the distribution of events across different venues.

Identify the busiest days in the Olympic schedule.

4. results.csv

This file records the performance outcomes of athletes in various events.

Columns:

Athlete_ID – Unique reference for the athlete

Event_ID – Reference to the event in which they participated

Position – Final ranking or placement in the event

Time/Score – Performance metric (e.g., time, points, or score)

Insights:

Analyze event results to find record-breaking performances.

Compare performances of athletes from different nations.

5. countries.csv

A reference file that provides details on each participating country.

Columns:

Country_Code – Standard Olympic country abbreviation

Country_Name – Full name of the country

Continent – Continent to which the country belongs

Insights:

Explore regional performance trends by grouping countries by continent.

Compare medal counts across different continents.
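
The continent comparison suggested above amounts to a lookup from medals.csv into countries.csv followed by a count. The sketch below uses the column names listed in the file descriptions; it assumes that the Country column in medals.csv holds the full country name matching Country_Name, which should be verified against the actual files. The sample rows are placeholders, not real results.

```python
from collections import Counter


def medals_by_continent(medals, countries):
    """Count medals per continent by joining on the country name.

    Assumption: medals['Country'] matches countries['Country_Name'].
    Unmatched countries are counted under 'Unknown'.
    """
    continent_of = {c["Country_Name"]: c["Continent"] for c in countries}
    counts = Counter()
    for m in medals:
        counts[continent_of.get(m["Country"], "Unknown")] += 1
    return counts


# Placeholder rows for illustration only.
countries = [
    {"Country_Code": "XXA", "Country_Name": "Country A", "Continent": "Asia"},
    {"Country_Code": "XXB", "Country_Name": "Country B", "Continent": "Europe"},
]
medals = [
    {"Country": "Country A", "Medal": "Gold"},
    {"Country": "Country A", "Medal": "Bronze"},
    {"Country": "Country B", "Medal": "Silver"},
    {"Country": "Country C", "Medal": "Gold"},
]
continent_counts = medals_by_continent(medals, countries)
```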

📊 Potential Use Cases

🔹 Sports Analytics – Identify patterns in athlete performance and event results
🔹 Machine Learning – Predict medal winners based on past data
🔹 Data Visualization – Create dashboards showing country-wise and event-wise medal counts
🔹 Time Series Analysis – Analyze trends across multiple Olympic events

This dataset is a valuable resource for data enthusiasts, sports analysts, and researchers aiming to uncover insights into the Tokyo 2020 Olympics. 🚀

mega_mb0000data
kaggle.com
zip
Updated Jun 1, 2023
Cite
monogurui_ii (2023). mega_mb0000data [Dataset]. https://www.kaggle.com/datasets/rickmcintire/mega-mb0000data
Explore at:
Available download formats: zip (31948 bytes)
Dataset updated
Jun 1, 2023
Authors
monogurui_ii
Description
game simulator (basketball): NBA Finals Teams 1980-2022

The aim of this project is to generate simulations of basketball games between NBA finals teams from 1980 to the present for the purpose of modeling predicted outcomes from a player efficiency metric (the "r metric").

A champion will be determined for the simulated season using a quadruple-elimination format, with teams eliminated from contention upon recording 4 losses until only one team remains.
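
The quadruple-elimination rule can be illustrated with a short simulation. This Python sketch is not the author's R function: the fair coin flip is a stand-in for the r-metric game model, and the function name, random pairing, and `seed` parameter are assumptions made for the example.

```python
import random


def simulate_elimination(teams, max_losses=4, seed=0):
    """Play random pairings until one team has fewer than max_losses losses.

    A fair coin flip replaces the actual game model (assumption).
    Returns the champion and the final loss count per team.
    """
    rng = random.Random(seed)
    losses = {team: 0 for team in teams}
    alive = list(teams)
    while len(alive) > 1:
        a, b = rng.sample(alive, 2)          # pick two distinct remaining teams
        loser = b if rng.random() < 0.5 else a
        losses[loser] += 1
        if losses[loser] >= max_losses:      # eliminated on the 4th loss
            alive.remove(loser)
    return alive[0], losses


champion, final_losses = simulate_elimination(["A", "B", "C", "D"], seed=1)
```

Every game adds exactly one loss, so the loop always terminates once all but one team have reached the loss limit.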

Playoff box score statistics for players (on a per 100 possessions basis) were gathered from https://www.basketball-reference.com/.

The players' stats were filtered and transformed to reflect a focus on box score stats measuring playing efficiency, as opposed to measures of volume. For example, Real Shooting Percentage (True Shooting Percentage adjusted for volume, based on points generated above average) was incorporated into the metric as opposed to Points Per Game; Assist to Turnover Ratio was incorporated as opposed to Assists Per Game. The complete list of stats used for the r metric is as follows:

Real Shooting Percentage

Offensive Rebounds

Assist to Turnover Ratio

Steals

Blocks

Personal Fouls

The r metric efficiency rating was derived from performing a regression on the overall team stats for a selection of teams for NBA seasons from 1980 to the present against their Point Differential and then applying the resulting predicted values to individual players.

An R function was created to generate simulated game outcomes from a Kaggle notebook. The output is produced as a ggplot (visualizing the r metric in pink against the traditional box score stats, coded by team in blue/red) and a csv file as a box score. The notebook is scheduled to run daily, randomly selecting two teams to play against one another and generating an outcome based on the player stats and metric for each team with an element of random variation.
EgyptianLeague
kaggle.com
Updated Sep 29, 2024
Cite
Mahmoud Elshabrawy (2024). EgyptianLeague [Dataset]. https://www.kaggle.com/datasets/mahmoudelshabrawy/egyptianleague
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 29, 2024
Dataset provided by
Kaggle
Authors
Mahmoud Elshabrawy
License
https://cdla.io/sharing-1-0/
Description
Egyptian Premier League Match Data (2010-2024)

This dataset contains detailed information about matches played in the Egyptian Premier League from 2010 to 2024. The dataset includes match statistics, team performance, referee decisions, and the outcome of each match.

Features Overview:

1. ID: Unique identifier for each match.
2. Season: The season in which the match took place.
3. Fixture: Details about the specific fixture in the league.
4. MatchDay: The match day number within the season.
5. Date: The date on which the match was played.
6. Time: The time of the match.
7. Home Team: The team playing at home.
8. Away Team: The visiting team.
9. Referee: The referee officiating the match.
10. Yellow Home: Number of yellow cards issued to the home team.
11. Yellow Away: Number of yellow cards issued to the away team.
12. 2nd Yellow Home: Number of second yellow cards (leading to a red card) for the home team.
13. 2nd Yellow Away: Number of second yellow cards for the away team.
14. Red Home: Number of red cards issued to the home team.
15. Red Away: Number of red cards issued to the away team.
16. Half Time Result: The score at halftime.
17. Full Time Result: The final score at the end of the match.
18. Home Goals: Goals scored by the home team.
19. Away Goals: Goals scored by the away team.
20. Winner: Indicates the winner of the match (Home, Away, or Draw).
21. Label: Various performance labels or categorization criteria.
22. Count: Frequency or count associated with certain labels.
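
The Winner field should follow from Home Goals and Away Goals, which gives a simple consistency check before any modeling. A minimal sketch follows, using the column names from the feature list; the helper names and the assumption that rows are dictionaries keyed by those column names are illustrative.

```python
def winner_label(home_goals: int, away_goals: int) -> str:
    """Derive the Winner value (Home, Away, or Draw) from the goal counts."""
    if home_goals > away_goals:
        return "Home"
    if away_goals > home_goals:
        return "Away"
    return "Draw"


def is_consistent(row: dict) -> bool:
    """Check that a row's Winner matches its Home Goals and Away Goals."""
    expected = winner_label(int(row["Home Goals"]), int(row["Away Goals"]))
    return row["Winner"] == expected


# Illustrative row, not taken from the dataset.
sample_row = {"Home Goals": "2", "Away Goals": "1", "Winner": "Home"}
row_ok = is_consistent(sample_row)
```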

Potential Use Cases: * Match Analysis: Track performance trends for different teams, referees, and players over multiple seasons. * Predictive Modeling: Create machine learning models to predict match outcomes based on past performance. * Referee Performance: Analyze the impact of referees on match outcomes and team discipline. * Team Strategy Insights: Examine the correlation between yellow/red cards and match results. * Time Series Analysis: Perform time-based analysis of matches and outcomes across different seasons. This dataset is ideal for soccer analysts, sports statisticians, and machine learning enthusiasts who are interested in exploring match data from the Egyptian Premier League.
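
Since the feature list includes both goal counts and a Winner column, the winner can in principle be re-derived from Home Goals and Away Goals as a cheap consistency check before any modeling. The following is a minimal sketch, assuming the CSV headers match the feature names listed above exactly and that Winner holds the literal strings Home, Away, or Draw. The sample rows and team names are made up for illustration, and the function names are hypothetical.

```python
import csv
import io

# Made-up sample rows; column names are assumed to match the feature list above.
SAMPLE_CSV = """ID,Home Team,Away Team,Home Goals,Away Goals,Winner
1,Team A,Team B,2,1,Home
2,Team C,Team D,0,0,Draw
3,Team E,Team F,1,3,Home
"""


def derive_winner(home_goals: int, away_goals: int) -> str:
    """Return 'Home', 'Away', or 'Draw' from the final goal counts."""
    if home_goals > away_goals:
        return "Home"
    if away_goals > home_goals:
        return "Away"
    return "Draw"


def find_inconsistent_rows(csv_text: str) -> list[str]:
    """Return the IDs of rows whose Winner disagrees with the goal counts."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = derive_winner(int(row["Home Goals"]), int(row["Away Goals"]))
        if row["Winner"].strip() != expected:
            mismatches.append(row["ID"])
    return mismatches


# Row 3 in the sample is deliberately mislabeled, so it is reported here.
print(find_inconsistent_rows(SAMPLE_CSV))
```

If the real file uses different header spellings or winner labels, only the dictionary keys and the three return strings would need to change.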
ODI Cricket Data
kaggle.com
zip
Updated Feb 23, 2025
willian oliveira (2025). ODI Cricket Data [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/odi-cricket-data
Explore at:
zip (56818 bytes)
Dataset updated
Feb 23, 2025
Authors
willian oliveira
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This graph was created in R:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2Ffd90736223cc5572985e7a2153c51327%2Ffoto3.png?generation=1740349164551931&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F87be218db233c41e5a4260c8f24a9c80%2Fgif2.gif?generation=1740349170058731&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F03d6822c7ea63bcb99b339fd51d2168d%2Fgif1.gif?generation=1740349175430653&alt=media

This dataset provides comprehensive information on more than 2400 One Day International (ODI) cricket matches obtained from Cricsheet. It includes detailed batting and bowling statistics, match summaries, and individual player performances. Matches involving Afghanistan’s men’s team or played in the Afghanistan Premier League are excluded due to Cricsheet’s data policy. The dataset is an excellent resource for sports analytics, machine learning, and cricket strategy modeling, allowing users to analyze player consistency, evaluate team performance, predict fantasy cricket outcomes, and assess match results.

The dataset is divided into several files:
- batter_player_stats.csv: batting data such as total runs, strike rate, matches played, and player of the match awards.
- bowler_player_stats.csv: bowling data including total wickets, economy rate, overs bowled, and matches played as a bowler.
- detailed_player_data.csv: per-match player performance data such as runs scored, balls faced, wickets taken, catches, and fantasy points.
- match_summary.csv: match-level information such as toss results, match outcomes (by runs or by wickets), player of the match, and venue details.

Potential use cases include:
- Player performance analysis: identify the most consistent batters and bowlers across various seasons.
- Match outcome prediction: develop models that leverage historical performance data.
- Fantasy cricket strategy optimization: select teams based on previous player performance.
- Cricket analytics and visualization: explore trends in runs, wickets, and match-winning performances.

Together these enable deeper insights into the game and support advanced sports research and data-driven decision-making.
2025 - 2026 NBA Fantasy Projections
kaggle.com
zip
Updated Oct 20, 2025
Nishaan Amin (2025). 2025 - 2026 NBA Fantasy Projections [Dataset]. https://www.kaggle.com/datasets/nishaanamin/2025-2026-nba-fantasy-projections
Explore at:
zip (152070 bytes)
Dataset updated
Oct 20, 2025
Authors
Nishaan Amin
Description
This dataset has projections for season-long NBA fantasy for both points and category leagues. Category league projections are based on the default 9 categories (points, assists, rebounds, 3 pointers made, field goal %, free throw %, steals, blocks, and turnovers). Points league projections are based on the default scoring systems for Yahoo, ESPN, Fantrax, and Sleeper leagues.
Cricket data
kaggle.com
zip
Updated Jan 20, 2020
mahendran narayanan (2020). Cricket data [Dataset]. https://www.kaggle.com/datasets/mahendran1/icc-cricket
Explore at:
zip (383854 bytes)
Dataset updated
Jan 20, 2020
Authors
mahendran narayanan
Description
Context

Any aspiring data scientist looks at everything through the lens of data, even when chilling with friends, watching cricket live, and cheering for a favorite team.

Content

It includes ODI, Test, and T20 statistics for all players in all three categories (batting, bowling, and fielding).

Acknowledgements

We wouldn't be here without cricket. Thank you to all the great cricketers for their wonderful contributions.
ODI World Cup 2023 Complete Dataset
kaggle.com
zip
Updated Dec 12, 2023
Engg Bilal Ali Khan (2023). ODI World Cup 2023 Complete Dataset [Dataset]. https://www.kaggle.com/datasets/enggbilalalikhan/odi-world-cup-2023-complete-dataset
Explore at:
zip (85414 bytes)
Dataset updated
Dec 12, 2023
Authors
Engg Bilal Ali Khan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Comprehensive dataset containing detailed information on batting and bowling performances, as well as the schedule and results of matches from the ICC Cricket World Cup 2023. The dataset covers player statistics, match details, and more, providing a rich resource for cricket enthusiasts, analysts, and data scientists interested in exploring the dynamics of the tournament.

Content
- batting_summary.csv: Player-wise batting statistics.
- bowling_summary.csv: Player-wise bowling statistics.
- matches_schedule_results.csv: Schedule and results of World Cup 2023 matches.
Data from: Pro Kabaddi League Dataset
kaggle.com
zip
Updated Dec 11, 2023
Sujay Kapadnis (2023). Pro Kabaddi League Dataset [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/pro-kabaddi-league-dataset
Explore at:
zip (5823861 bytes)
Dataset updated
Dec 11, 2023
Authors
Sujay Kapadnis
Description
Data Set

DS_match.csv contains the match details, with the fields listed below:
- match_id: Unique ID for each match.
- match_number: Same as the above, in text format.
- date: Date of the match.
- start_time: Match start time on the day.
- result: Text field explaining the result.
- player_id_of_the_match: ID of the player of the match (can be looked up in DS_players using the combination of match ID and player ID).
- player_name_of_the_match: Player of the match.
- series_id: Season identifier.
- series_name: Season name.
- status: Status of the match.
- toss_winner: Toss winner team ID.
- toss_selection: Toss selection.
- venue_id: Location.
- venue_name: Location name.
- home_team_id: Team ID.
- home_team_name: description.
English Premier League EPL xG Results (2023-24)
kaggle.com
zip
Updated Jul 22, 2024
orkunaktas4 (2024). English Premier League EPL xG Results (2023-24) [Dataset]. https://www.kaggle.com/datasets/orkunaktas/english-premier-league-epl-results-2023-24
Explore at:
zip (24100 bytes)
Dataset updated
Jul 22, 2024
Authors
orkunaktas4
Description
Context

This dataset contains the results and xG values of matches played in the English Premier League in 2023-24.

Variables

Day: Match day

Date: Match date

Time: Match Time

Home: Home Team

xG: Home Team expected goal value

Score: Match result

xG: Away Team expected goal value

Away: Away Team
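
The variable list above contains two columns that are both named xG, one for the home team and one for the away team. Readers that key rows by header name, such as Python's `csv.DictReader`, keep only one value when header names repeat, so the two columns need distinct names before analysis. The following is a minimal sketch, assuming the file's header order follows the variable list exactly. The sample row is made up, the Score value is a placeholder because its format is not documented here, and the names `home_xg` and `away_xg` are hypothetical.

```python
import csv
import io

# Made-up sample; the header order is assumed to follow the variable list above.
SAMPLE_CSV = """Day,Date,Time,Home,xG,Score,xG,Away
Sat,2023-08-12,15:00,Team A,1.4,score-as-text,0.9,Team B
"""


def disambiguate_headers(header: list[str]) -> list[str]:
    """Rename the first 'xG' column to 'home_xg' and the second to 'away_xg'."""
    replacements = iter(["home_xg", "away_xg"])
    # Any further 'xG' columns (not expected) keep their original name.
    return [next(replacements, name) if name == "xG" else name for name in header]


def read_matches(csv_text: str) -> list[dict[str, str]]:
    """Parse the CSV into dicts keyed by the disambiguated header names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = disambiguate_headers(next(reader))
    return [dict(zip(header, row)) for row in reader if row]


matches = read_matches(SAMPLE_CSV)
first = matches[0]
# Difference in expected goals between the home and away side.
xg_gap = float(first["home_xg"]) - float(first["away_xg"])
print(first["Home"], first["Away"], round(xg_gap, 2))
```

The Score column is left as an unparsed string, since the separator used in the real file is not stated in the description.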
VT (2021). PMData [Dataset]. https://www.kaggle.com/datasets/vlbthambawita/pmdata-a-sports-logging-dataset/discussion

PMData

A Sports Logging Dataset

Explore at:

zip (1401630710 bytes)

Dataset updated

Apr 19, 2021

Authors

VT

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Paper: https://dl.acm.org/doi/10.1145/3339825.3394926

We present PMData, a dataset that aims to combine traditional lifelogging with sports activity logging. Such a dataset enables the development of several interesting analysis applications, e.g., where additional sports data can be used to predict and analyze everyday developments like a person's weight and sleep patterns, and where traditional lifelog data can be used in a sports context to predict an athlete's performance. To this end, we used the Fitbit Versa 2 smartwatch wristband, the PMSys sports logging app, and Google Forms for the data collection, and PMData contains logging data for 5 months from 16 persons. Our initial experiments show that such analyses are possible, but there is still considerable room for improvement.

Dataset details

The structure of the main folder:

[Main folder]
- p01
- p02
- ...
- p16
- participant-overview.xlsx

The structure of each sub folder (pXX):

pXX [folder]: is a folder containing data of participant XX (notation XX represents the identifier of the participant).
- fitbit [folder]
  - calories.json: shows how many calories the person has burned in the last minute.
  - distance.json: gives the distance moved per minute. Distance seems to be in centimeters.
  - exercise.json: describes each activity in more detail. It contains the date with start and stop time, time in different activity levels, type of activity, and various performance metrics that depend on the type of exercise, e.g., for running, it contains distance, time, steps, calories, speed, and pace.
  - heart_rate.json: shows the number of heart beats per minute (bpm) at a given time.
  - lightly_active_minutes.json: sums up the number of lightly active minutes per day.
  - moderately_active_minutes.json: sums up the number of moderately active minutes per day.
  - resting_heart_rate.json: gives the resting heart rate per day.
  - sedentary_minutes.json: sums up the number of sedentary minutes per day.
  - sleep_score.csv: helps understand the sleep each night so you can see trends in the sleep patterns. It contains an overall 0-100 score made up from composition, revitalization and duration scores, the number of deep sleep minutes, the resting heart rate and a restlessness score.
  - sleep.json: is a per-sleep breakdown into periods of light, deep, and REM sleep, and time awake.
  - steps.json: displays the number of steps per minute.
  - time_in_heart_rate_zones.json: gives the number of minutes in different heart rate zones. Using the common formula of 220 minus your age, Fitbit will calculate your maximum heart rate and then create three target heart rate zones based on that number: fat burn (50 to 69 percent of your max heart rate), cardio (70 to 84 percent of your max heart rate), and peak (85 to 100 percent of your max heart rate).
  - very_active_minutes.json: sums up the number of very active minutes per day.
- googledocs [folder]
  - reporting.csv: contains one line per report, including the date reported for, a timestamp of the report submission time, the meals eaten (breakfast, lunch, dinner, and evening meal), the participant's weight that day, the number of glasses drunk, and whether the participant consumed alcohol.
- pmsys [folder]
  - injury.csv: shows injuries with a time and date, the corresponding injury locations, and a minor or major severity.
  - srpe.csv: contains a training session’s end-time, type of activity, the perceived exertion (RPE), and the duration in minutes. This is, for example, used to calculate the session's training load or sRPE (RPE×duration).
  - wellness.csv: includes parameters like time and date, fatigue, mood, readiness, sleep duration (number of hours), sleep quality, soreness (and soreness area), and stress. Fatigue, sleep quality, soreness, stress, and mood all have a 1-5 scale. The score 3 is normal, 1-2 are scores below normal, and 4-5 are scores above normal. Sleep length is just a measure of how long the sleep was in hours, and readiness (scale 0-10) is an overall subjective measure of how ready you are to exercise, i.e., 0 means not ready at all and 10 indicates that you cannot feel any better and are ready for anything!
  - food-images.zip: Participants 1, 3 and 5 have taken pictures of everything they have eaten except water during 2 months (February and March). There are food images included in this .zip file, and information about day and time is given in the...
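
Two of the file descriptions above state formulas directly: the heart rate zones are percentage bands of a maximum heart rate estimated as 220 minus age, and the session training load (sRPE) is RPE multiplied by duration in minutes. The following is a minimal sketch of both calculations. The function names are hypothetical, the zone bounds are left as unrounded floats because the description does not say how they are rounded, and this is not presented as Fitbit's or PMSys's actual implementation.

```python
# Zone bounds as percentages of maximum heart rate, taken from the
# time_in_heart_rate_zones.json description above.
ZONE_PERCENT_BOUNDS = {
    "fat burn": (50, 69),
    "cardio": (70, 84),
    "peak": (85, 100),
}


def max_heart_rate(age_years: int) -> int:
    """Estimate maximum heart rate with the common '220 minus age' formula."""
    return 220 - age_years


def heart_rate_zones(age_years: int) -> dict[str, tuple[float, float]]:
    """Return (lower, upper) bpm bounds for each zone at the given age."""
    hr_max = max_heart_rate(age_years)
    return {
        zone: (hr_max * low / 100, hr_max * high / 100)
        for zone, (low, high) in ZONE_PERCENT_BOUNDS.items()
    }


def session_training_load(rpe: float, duration_minutes: float) -> float:
    """Compute sRPE as perceived exertion (RPE) times duration in minutes."""
    return rpe * duration_minutes


# Example with made-up inputs: a 30-year-old and a 45-minute session at RPE 6.
zones = heart_rate_zones(30)
print(zones["cardio"])
print(session_training_load(rpe=6, duration_minutes=45))
```

Applied to srpe.csv, `session_training_load` would be called once per row, assuming the RPE and duration values are first parsed as numbers.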
