License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Paper: https://dl.acm.org/doi/10.1145/3339825.3394926
We present PMData, a dataset that aims to combine traditional lifelogging with sports activity logging. Such a dataset enables the development of several interesting analysis applications, e.g., where additional sports data can be used to predict and analyze everyday developments like a person's weight and sleep patterns, and where traditional lifelog data can be used in a sports context to predict an athlete's performance. For the data collection, we have used the Fitbit Versa 2 smartwatch wristband, the PMSys sports logging app, and Google Forms. PMData contains logging data for 5 months from 16 persons. Our initial experiments show that such analyses are possible, but there is still substantial room for improvement.
Dataset details
The structure of the main folder:
The structure of each sub folder (pXX):
pXX [folder]: is a folder containing data of participant XX (notation XX represents the identifier of the participant).
fitbit [folder]
calories.json: shows how many calories the person has burned in the last minute.
distance.json: gives the distance moved per minute. Distance seems to be in centimeters.
exercise.json: describes each activity in more detail. It contains the date with start and stop time, time in different activity levels, type of activity and various performance metrics that depend on the type of exercise, e.g., for running, it contains distance, time, steps, calories, speed and pace.
heart_rate.json: shows the number of heart beats per minute (bpm) at a given time.
lightly_active_minutes.json: sums up the number of lightly active minutes per day.
moderately_active_minutes.json: sums up the number of moderately active minutes per day.
resting_heart_rate.json: gives the resting heart rate per day.
sedentary_minutes.json: sums up the number of sedentary minutes per day.
sleep_score.csv: helps understand the sleep each night so you can see trends in the sleep patterns. It contains an overall 0-100 score made up from composition, revitalization and duration scores, the number of deep sleep minutes, the resting heart rate and a restlessness score.
sleep.json: is a per-sleep breakdown into periods of light, deep and REM sleep and time awake.
steps.json: displays the number of steps per minute.
time_in_heart_rate_zones.json: gives the number of minutes in different heart rate zones. Using the common formula of 220 minus your age, Fitbit calculates your maximum heart rate and then creates three target heart rate zones based on that number: fat burn (50 to 69 percent of your max heart rate), cardio (70 to 84 percent of your max heart rate), and peak (85 to 100 percent of your max heart rate).
very_active_minutes.json: sums up the number of very active minutes per day.
googledocs [folder]
pmsys [folder]
injury.csv: shows injuries with a time and date, the corresponding injury locations, and a minor or major severity.
srpe.csv: contains a training session's end time, type of activity, the perceived exertion (RPE), and the duration in minutes. This is, for example, used to calculate the session's training load or sRPE (RPE×duration).
wellness.csv: includes parameters like time and date, fatigue, mood, readiness, sleep duration (number of hours), sleep quality, soreness (and soreness area), and stress. Fatigue, sleep quality, soreness, stress, and mood all have a 1-5 scale. The score 3 is normal, 1-2 are scores below normal and 4-5 are scores above normal. Sleep duration is simply how long the sleep was in hours, and readiness (scale 0-10) is an overall subjective measure of how ready you are to exercise, i.e., 0 means not ready at all and 10 indicates that you cannot feel any better and are ready for anything.
food-images.zip: Participants 1, 3 and 5 have taken pictures of everything they have eaten except water during 2 months (February and March). There are food images included in this .zip file, and information about day and time is given in the...
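The two formulas quoted in the file descriptions above (maximum heart rate as 220 minus age with three percentage zones, and sRPE as RPE×duration) can be sketched in Python. This is a minimal illustration: the function names and the rounding to whole beats are assumptions and are not taken from PMData, Fitbit or PMSys.

```python
def heart_rate_zones(age):
    """Return (low, high) bpm bounds per zone, using max HR = 220 - age."""
    max_hr = 220 - age
    # Zone limits as percentages of the maximum heart rate, as quoted above.
    bands = {"fat_burn": (50, 69), "cardio": (70, 84), "peak": (85, 100)}
    return {
        name: (round(max_hr * low / 100), round(max_hr * high / 100))
        for name, (low, high) in bands.items()
    }


def session_load(rpe, duration_min):
    """Session training load (sRPE) = perceived exertion x duration in minutes."""
    return rpe * duration_min


zones = heart_rate_zones(20)   # a 20-year-old has an estimated max HR of 200 bpm
load = session_load(7, 60)     # RPE 7 for a 60-minute session
```

For a 20-year-old this gives a fat burn zone of 100 to 138 bpm, and the example session has a load of 420.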
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by ZachHT
Released under Database: Open Database, Contents: Database Contents
License: https://creativecommons.org/publicdomain/zero/1.0/
In Daily Fantasy Sports (DFS) contests, contestants construct a virtual lineup of players that score points based on their real-world performances. Unlike in season-long Fantasy Sports contests, in DFS contestants submit a new lineup for each set of games. DFS contests are held for several professional sports leagues, including the National Football League (NFL), National Basketball Association (NBA), and National Hockey League (NHL). The leading DFS sites today are DraftKings and FanDuel, which control approximately 90% of the $3B DFS market.
There are three primary types of DFS games: Head-to-Heads (H2Hs), Double-Ups, and Guaranteed Prize Pools (GPPs). In H2H games, two contestants play for a single cash prize. In Double-Up games, a pool of contestants competes to place in the top 50% of lineups, which are awarded twice the entry fee. In GPPs, a pool of contestants competes for a fixed prize structure that tends to be very top-heavy; some contests pay out hundreds of thousands of dollars to the top finisher.
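The Double-Up rule described above (the top 50% of lineups receive twice the entry fee) can be sketched as follows. This is a simplified illustration and not DraftKings' actual payout logic: the function name is hypothetical, ties are broken arbitrarily by sort order, and any site fee is ignored.

```python
def double_up_payouts(scores, entry_fee):
    """Pay 2x the entry fee to the lineups in the top half of the standings.

    scores maps a contestant name to that lineup's fantasy points.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    winners = set(ranked[: len(ranked) // 2])  # top 50%, rounded down
    return {name: (2 * entry_fee if name in winners else 0) for name in scores}


# Hypothetical four-entry contest with a 10-dollar entry fee.
payouts = double_up_payouts({"a": 250.5, "b": 198.0, "c": 301.25, "d": 220.0}, 10)
```

Here contestants "c" and "a" finish in the top half and receive 20 each, while "b" and "d" receive nothing.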
Over the last year, I have developed a winning system for daily fantasy football and baseball contests. Building this system from scratch was a fantastic complement to the things I learned as a student, from machine learning and optimization to optimal learning and game theory. I hope others can join me in researching daily fantasy basketball and perhaps get involved with the burgeoning world of daily fantasy sports.
This dataset contains 20 days of DraftKings NBA contest data scraped between 2017-11-27 and 2017-12-28. For DraftKings NBA daily fantasy basketball contest rules, see https://www.draftkings.com/help/rules/nba.
Format:
Thank you to my friend from college, Michael Chiang, for contributions to this project.
A few ideas to get started:
game simulator (basketball): NBA 2023-2024
The aim of this project is to generate simulations of basketball games between NBA finals teams for the 2023-2024 season for the purpose of modeling predicted outcomes from a player efficiency metric (the "r metric").
A simulated 82 game season will be run daily.
2022-2023 box score statistics for players (on a per 100 possessions basis) were gathered from https://www.basketball-reference.com/.
The players' stats were filtered and transformed to focus on box score stats that measure playing efficiency rather than volume. For example, Real Shooting Percentage (True Shooting Percentage adjusted for volume, based on points generated above average) was incorporated into the metric instead of Points Per Game, and Adjusted Assist to Turnover Ratio (Assist to Turnover adjusted for volume, based on assists to turnovers generated above average) was incorporated instead of Assists Per Game. The complete list of stats used for the r metric is as follows:
Real Shooting Percentage
Offensive Rebounds
Adjusted Assist to Turnover Ratio
Steals
Blocks
Personal Fouls
The r metric efficiency rating was derived from performing a boosted regression on the overall team stats for a selection of teams for NBA seasons from 1980 to the present against their Point Differential and then applying the resulting predicted values to individual players.
An R function was created to generate simulated game outcomes from a Kaggle notebook. The output is produced as a ggplot, visualizing the r metric (in pink) against the traditional box score stats (coded by team in blue/red), and as a csv file containing the box score. The notebook is scheduled to run daily, randomly selecting teams to play against one another and generating an outcome based on the player stats and metric for each team, with an element of random variation.
License: https://dataful.in/terms-and-conditions
The dataset contains year- and match-wise historical data on each match played in all the world cups since 1975. For each match, the data includes the year in which the world cup was held, the venue, the first and second batting teams, their scores, the result, the winner, the winning margin in runs or wickets, the type of match (league match, quarter-final, semi-final, final, etc.), and the names of the host country and the season winner.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains match data from the top football leagues and cup competitions in France and Germany. The dataset provides comprehensive information about home and away teams, their scores, match dates, and seasons. It is a valuable resource for football enthusiasts, data scientists, and analysts interested in exploring football statistics and trends across two of Europe's biggest football nations.
Country: The country where the match took place (France or Germany).
Example values: 'France', 'Germany'
Lig: The specific league or cup in which the match was played. This column captures whether the match was part of Ligue 1, Ligue 2, Coupe de France, Bundesliga, 2. Bundesliga, or DFB-Pokal.
Example values: 'Ligue 1', 'Bundesliga', 'DFB-Pokal'
home_team: The name of the home team in the match.
Example values: 'Paris Saint-Germain', 'Bayern Munich'
away_team: The name of the away team in the match.
Example values: 'Olympique Lyonnais', 'Borussia Dortmund'
home_score: The number of goals scored by the home team in the match.
Example values: '3', '0'
away_score: The number of goals scored by the away team in the match.
Example values: '1', '2'
season_year: The season in which the match took place. Typically, football seasons run from one year to the next (e.g., 2022-2023 season).
Example values: '2022/2023', '2021/2022'
Date_day: The specific day on which the match was played, formatted as day and month (dd.mm).
Example values: '05.01', '29.09'
Date_hour: The hour and minute the match kicked off, formatted as hh:mm.
Example values: '20:45', '18:30'
This dataset can be used for various purposes, including:
- Analyzing team performance trends over different seasons.
- Comparing goal-scoring patterns in home vs. away matches.
- Building predictive models to forecast match outcomes based on historical data.
- Understanding football dynamics in France and Germany through data visualizations.
Feel free to explore and use this dataset to draw your own insights and conclusions!
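Because Date_day carries only the day and month, a full kickoff timestamp has to be rebuilt from season_year, Date_day and Date_hour together. The sketch below shows one way to do this. It assumes that months from July onward belong to the first year of the season and earlier months to the second year; that cutoff is a guess and not something the dataset documents, and the function name is hypothetical.

```python
from datetime import datetime


def kickoff_datetime(season_year, date_day, date_hour, season_start_month=7):
    """Combine '2022/2023', '05.01' and '20:45' into one datetime.

    Assumption: months >= season_start_month fall in the first season year,
    all other months in the second.
    """
    first_year, second_year = (int(part) for part in season_year.split("/"))
    day, month = (int(part) for part in date_day.split("."))
    hour, minute = (int(part) for part in date_hour.split(":"))
    year = first_year if month >= season_start_month else second_year
    return datetime(year, month, day, hour, minute)


winter_match = kickoff_datetime("2022/2023", "05.01", "20:45")  # 5 January 2023
autumn_match = kickoff_datetime("2022/2023", "29.09", "18:30")  # 29 September 2022
```

Matches played outside the usual calendar (for example a postponed final in July) would be assigned to the wrong year under this assumption, so the result is worth spot-checking.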
License: https://creativecommons.org/publicdomain/zero/1.0/
Lacrosse lags behind the big four sports in data-driven insights. As the third largest indoor league, the National Lacrosse League could be next up in sports analytics breakthroughs. I will work to draw insights from this data and hope others can enjoy the same process. Insights and more files can be accessed in my GitHub repository. Some basic Tableau workbooks are available through my Tableau Public account.
Floor player stats by game for every publicly available box score.
Legend for NLLFloorGameStats.csv:
Day - day of the week game was played
Date - date of game
Location - hosting team
# - jersey number
Name - player name
Captain - denotes Captain and Alternate Captains
Team - player's team
G - goals
A - assists
+/- - score differential while player is on the floor
PIM - penalty minutes
S - shots on goal
SOFF - shots off goal
LB - loose balls
T - turnovers
CT - caused turnovers
FO_W - faceoff wins
FO - total faceoffs taken
TOF - time on floor
Changes to the legend for all yearly files:
Points - sum of goals and assists
PM - score differential while player is on the floor (replacement for +/-)
....PG - statistic average per game
....p60 - statistic per 60 minutes of floor time (only applicable to 2020 with recording of TOF)
ATO_ratio - assist to turnover ratio
FOpercent - faceoff percentage
ShootingPct - goals scored out of total shots taken
AdjShootingPct - goals scored out of shots on goal
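The derived columns in the yearly files can be reproduced from the per-game columns. The sketch below is one reading of the legend, with several assumptions: "total shots taken" is taken to mean S + SOFF, ratios are returned as fractions rather than percentages, and a zero denominator yields None. The function names are hypothetical.

```python
def safe_div(numerator, denominator):
    """Divide, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_stats(row):
    """Compute the yearly-file columns from one player's summed game stats."""
    goals, assists = row["G"], row["A"]
    return {
        "Points": goals + assists,
        "ATO_ratio": safe_div(assists, row["T"]),
        "FOpercent": safe_div(row["FO_W"], row["FO"]),
        # Assumption: total shots = shots on goal + shots off goal.
        "ShootingPct": safe_div(goals, row["S"] + row["SOFF"]),
        "AdjShootingPct": safe_div(goals, row["S"]),
    }


# Made-up stat line for illustration only.
stats = derived_stats({"G": 2, "A": 3, "T": 2, "FO_W": 0, "FO": 0, "S": 5, "SOFF": 3})
```

For this made-up line the player has 5 points, an assist to turnover ratio of 1.5, and no faceoff percentage because no faceoffs were taken.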
Legend for NLLGoaliesGameStats.csv:
Day - day of the week game was played
Date - date of game
Location - hosting team
# - jersey number
Name - player name
Credit - denotes credited win, loss, or designated backup
Team - player's team
MIN - minutes in net
SV Q1 - saves in quarter 1
SV Q2 - saves in quarter 2
SV Q3 - saves in quarter 3
SV Q4 - saves in quarter 4
SV OT - saves in overtime
SV - total saves
SOG - shots on goal seen
GA - goals allowed
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a detailed look into the world of competitive video gaming in universities. It covers a wide range of topics, from performance rankings and results across multiple esports platforms to the individual team and university rankings within each tournament. With an incredible wealth of data, fans can discover statistics on their favorite teams or explore the challenges placed upon university gamers as they battle it out to be the best. Dive into the information provided and get an inside view into the world of collegiate esports tournaments as you assess all things from Match ID, Team 1, University affiliations, Points earned or lost in each match and special Seeds or UniSeeds for exceptional teams. Of course don't forget about exploring all the great Team Names along with their corresponding websites for further details on stats across tournaments!
Download files. First, make sure you have downloaded the CS_week1, CS_week2, CS_week3 and seeds datasets from Kaggle. You will also need the currentRankings file for each week of competition. Save every file under its original name (e.g., CS_week1.csv) so that your analysis tools can read it.
Understand the file structure. Once the files are downloaded, get familiar with what each one contains. The main data files are the weekly files (week 1 to 3) and the seedings file. The weekly files list the teams matched against one another, along with each team's university, the points from the match result, the team name and the website URL associated with the university entry. The seedings file ranks the university entries and also gives team names and website URLs. An additional currentRankings file contains the ranking scores of each player or team for a given period of competition (e.g., the first week).
Analyze the data. With everything in place, you can explore trends among universities or individual players, in specific matches or in the overall standings across the weeks of competition. You can also build graphs from the data compiled in the BUECTracker dataset. For example, to compare two universities, say Harvard University and Cornell University, since the beginning of the event, extract their points and dates (columns found under the result tab), their regions (e.g., North America vs. Europe) and general stats such as maps played. The same approach carries over to other ideas and to similar datasets.
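As a first step in that kind of analysis, the weekly files can be read with Python's standard csv module. The sketch below counts how many match rows each university appears in. It relies only on the "University" column named in the CS_week1.csv description; the sample rows are invented placeholders and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def matches_per_university(csv_file):
    """Count the rows of a weekly file per value of the 'University' column."""
    return Counter(row["University"] for row in csv.DictReader(csv_file))


# Invented rows standing in for the contents of CS_week1.csv.
sample = io.StringIO(
    "Match ID,Team 1,University\n"
    "1,Team Alpha,University X\n"
    "2,Team Beta,University Y\n"
    "3,Team Alpha,University X\n"
)
counts = matches_per_university(sample)
```

With a real file, replace the StringIO object with `open("CS_week1.csv", newline="")`.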
- Analyze the performance of teams and identify areas for improvement for better performance in future competitions.
- Assess which esports platforms are the most popular among gamers.
- Gain a better understanding of player rankings across different regions, based on rankings system, to create targeted strategies that could boost individual players' scoring potential or team overall success in competitive gaming events
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: CS_week1.csv

| Column name | Description |
|:---------------|:----------------------------------------------|
| Match ID | Unique identifier for each match. (Integer) |
| Team 1 | Name of the first team in the match. (String) |
| University | University associated with the team. (String) |

File: CS_week1_currentRankings.csv

| Column name | Description |
|:--------------|:-----------------------------------------------------------|
...
This CSV file contains a comprehensive dataset of simulated soccer match statistics spanning 25 years (2000-2001 to 2024-2025) for major leagues including MLS, Premier League, La Liga, and Bundesliga. Each row represents a single match and includes details such as team names, scores, match results, yellow/red cards, ball possession, offsides, fouls, team formations, starting lineups, and betting odds from Bet365. The data also breaks down goals by type (penalty, freekick, corner) and half, and includes a "build-up speed" metric. The entire dataset is sorted chronologically by match date. This file is designed to support various analytical questions related to soccer performance, strategy, and betting trends over time.
A detailed description of the Tokyo Olympics Dataset, including file descriptions and insights into its contents:
📌 Subtitle: Explore the athletes, events, medal winners, and historical trends from the Tokyo 2020 Summer Olympics.
The Tokyo Olympics 2020 dataset provides a detailed breakdown of the athletes, events, and medals awarded during the Summer Games. This dataset serves as an essential resource for data analysis, visualization, and machine learning applications related to sports analytics.
athletes.csv
This file contains detailed information about all participating athletes, including demographics and country representation.
Columns:
Athlete_ID – Unique identifier for each athlete
Name – Full name of the athlete
Gender – Male (M) / Female (F)
Age – Age of the athlete during the event
Country – Country the athlete represents
Sport – The sport in which the athlete competed
Event – Specific event the athlete participated in
Insights:
medals.csv
This file lists all medal winners, including details on the type of medal awarded and the event in which it was won.
Columns:
Athlete_ID – Unique athlete reference
Name – Name of the medal-winning athlete
Country – Country represented
Sport – Sport category
Event – Specific event won
Medal – Type of medal won (Gold, Silver, Bronze)
Insights:
events.csv
A dataset containing all sporting events held during the Tokyo 2020 Olympics.
Columns:
Event_ID – Unique event identifier
Sport – Name of the sport
Event – Name of the event
Venue – Location where the event took place
Date – Date of the event
Insights:
results.csv
This file records the performance outcomes of athletes in various events.
Columns:
Athlete_ID – Unique reference for the athlete
Event_ID – Reference to the event in which they participated
Position – Final ranking or placement in the event
Time/Score – Performance metric (e.g., time, points, or score)
Insights:
countries.csv
A reference file that provides details on each participating country.
Columns:
Country_Code – Standard Olympic country abbreviation
Country_Name – Full name of the country
Continent – Continent to which the country belongs
Insights:
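A country-wise medal count can be built from medals.csv alone, using its Country and Medal columns. The sketch below is a minimal example with invented placeholder rows and a hypothetical function name. Note that it counts one medal per row, so if team events list each athlete separately, a team medal is counted once per athlete.

```python
import csv
import io
from collections import defaultdict


def medal_table(medals_file):
    """Count Gold/Silver/Bronze rows per country, sorted by golds, then silvers, then bronzes."""
    table = defaultdict(lambda: {"Gold": 0, "Silver": 0, "Bronze": 0})
    for row in csv.DictReader(medals_file):
        table[row["Country"]][row["Medal"]] += 1
    return sorted(
        table.items(),
        key=lambda item: (-item[1]["Gold"], -item[1]["Silver"], -item[1]["Bronze"], item[0]),
    )


# Invented rows standing in for the contents of medals.csv.
sample = io.StringIO(
    "Athlete_ID,Name,Country,Sport,Event,Medal\n"
    "1,Athlete A,Country X,Sport 1,Event 1,Gold\n"
    "2,Athlete B,Country Y,Sport 1,Event 1,Silver\n"
    "3,Athlete C,Country X,Sport 2,Event 2,Gold\n"
    "4,Athlete D,Country Y,Sport 2,Event 3,Gold\n"
)
ranking = medal_table(sample)
```

In this invented sample, Country X ranks first with two golds and Country Y second with one gold and one silver.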
📊 Potential Use Cases
🔹 Sports Analytics – Identify patterns in athlete performance and event results
🔹 Machine Learning – Predict medal winners based on past data
🔹 Data Visualization – Create dashboards showing country-wise and event-wise medal counts
🔹 Time Series Analysis – Analyze trends across multiple Olympic events
This dataset is a valuable resource for data enthusiasts, sports analysts, and researchers aiming to uncover insights into the Tokyo 2020 Olympics. 🚀
game simulator (basketball): NBA Finals Teams 1980-2022
The aim of this project is to generate simulations of basketball games between NBA finals teams from 1980 to the present for the purpose of modeling predicted outcomes from a player efficiency metric (the "r metric").
A champion will be determined for the simulated season using a quadruple-elimination format, with teams eliminated from contention upon recording 4 losses until only one team remains.
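The elimination rule can be sketched in Python as follows. This only illustrates the bookkeeping: each game here is decided by a coin flip, which is a placeholder and not the r-metric-based simulation the project uses, and the function and team names are hypothetical.

```python
import random


def simulate_elimination(teams, max_losses=4, seed=None):
    """Play random pairings until one team is left with fewer than max_losses losses."""
    rng = random.Random(seed)
    losses = {team: 0 for team in teams}
    alive = list(teams)
    while len(alive) > 1:
        team_a, team_b = rng.sample(alive, 2)  # randomly select two teams
        loser = rng.choice([team_a, team_b])   # placeholder: coin-flip outcome
        losses[loser] += 1
        if losses[loser] >= max_losses:
            alive.remove(loser)                # eliminated on the 4th loss
    return alive[0], losses


champion, final_losses = simulate_elimination(["Team A", "Team B", "Team C", "Team D"], seed=1)
```

Whatever the seed, every team except the champion ends with exactly four losses, and the champion ends with at most three.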
Playoff box score statistics for players (on a per 100 possessions basis) were gathered from https://www.basketball-reference.com/.
The players' stats were filtered and transformed to focus on box score stats that measure playing efficiency rather than volume. For example, Real Shooting Percentage (True Shooting Percentage adjusted for volume, based on points generated above average) was incorporated into the metric instead of Points Per Game, and Assist to Turnover Ratio was incorporated instead of Assists Per Game. The complete list of stats used for the r metric is as follows:
The r metric efficiency rating was derived from performing a regression on the overall team stats for a selection of teams for NBA seasons from 1980 to the present against their Point Differential and then applying the resulting predicted values to individual players.
An R function was created to generate simulated game outcomes from a Kaggle notebook. The output is produced as a ggplot, visualizing the r metric (in pink) against the traditional box score stats (coded by team in blue/red), and as a csv file containing the box score. The notebook is scheduled to run daily, randomly selecting two teams to play against one another and generating an outcome based on the player stats and metric for each team, with an element of random variation.
License: https://cdla.io/sharing-1-0/
Egyptian Premier League Match Data (2010-2024)
This dataset contains detailed information about matches played in the Egyptian Premier League from 2010 to 2024. The dataset includes match statistics, team performance, referee decisions, and the outcome of each match.
Features Overview:
1. ID: Unique identifier for each match.
2. Season: The season in which the match took place.
3. Fixture: Details about the specific fixture in the league.
4. MatchDay: The match day number within the season.
5. Date: The date on which the match was played.
6. Time: The time of the match.
7. Home Team: The team playing at home.
8. Away Team: The visiting team.
9. Referee: The referee officiating the match.
10. Yellow Home: Number of yellow cards issued to the home team.
11. Yellow Away: Number of yellow cards issued to the away team.
12. 2nd Yellow Home: Number of second yellow cards (leading to a red card) for the home team.
13. 2nd Yellow Away: Number of second yellow cards for the away team.
14. Red Home: Number of red cards issued to the home team.
15. Red Away: Number of red cards issued to the away team.
16. Half Time Result: The score at halftime.
17. Full Time Result: The final score at the end of the match.
18. Home Goals: Goals scored by the home team.
19. Away Goals: Goals scored by the away team.
20. Winner: Indicates the winner of the match (Home, Away, or Draw).
21. Label: Various performance labels or categorization criteria.
22. Count: Frequency or count associated with certain labels.
Potential Use Cases:
- Match Analysis: Track performance trends for different teams, referees, and players over multiple seasons.
- Predictive Modeling: Create machine learning models to predict match outcomes based on past performance.
- Referee Performance: Analyze the impact of referees on match outcomes and team discipline.
- Team Strategy Insights: Examine the correlation between yellow/red cards and match results.
- Time Series Analysis: Perform time-based analysis of matches and outcomes across different seasons.
This dataset is ideal for soccer analysts, sports statisticians, and machine learning enthusiasts who are interested in exploring match data from the Egyptian Premier League.
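Before modeling, it can be worth checking that the Winner column agrees with Home Goals and Away Goals. The sketch below derives the expected label and lists rows that disagree. It assumes Winner holds exactly the strings "Home", "Away" or "Draw", as the feature overview suggests; the function names and sample rows are invented for illustration.

```python
def match_winner(home_goals, away_goals):
    """Return 'Home', 'Away' or 'Draw' from the two goal counts."""
    if home_goals > away_goals:
        return "Home"
    if away_goals > home_goals:
        return "Away"
    return "Draw"


def inconsistent_rows(rows):
    """Return the rows whose recorded Winner disagrees with the goal counts."""
    return [
        row for row in rows
        if row["Winner"] != match_winner(int(row["Home Goals"]), int(row["Away Goals"]))
    ]


# Invented rows; the second one is deliberately mislabeled.
rows = [
    {"ID": "1", "Home Goals": "2", "Away Goals": "1", "Winner": "Home"},
    {"ID": "2", "Home Goals": "0", "Away Goals": "0", "Winner": "Away"},
]
bad = inconsistent_rows(rows)
```

Rows read with csv.DictReader arrive as strings, which is why the goal counts are converted with int() before comparison.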
License: https://creativecommons.org/publicdomain/zero/1.0/
[Three figures created in R (foto3.png, gif2.gif, gif1.gif); images not reproduced here.]
This dataset provides comprehensive information on more than 2400 One Day International (ODI) cricket matches obtained from Cricsheet. It includes detailed batting and bowling statistics, match summaries and individual player performances. Matches involving Afghanistan's men's team or played in the Afghanistan Premier League are excluded because of Cricsheet's data policy. The dataset is a resource for sports analytics, machine learning and cricket strategy modeling: users can analyze player consistency, evaluate team performance, predict fantasy cricket outcomes and assess match results.
The dataset is divided into several files:
- batter_player_stats.csv: batting data such as total runs, strike rate, matches played and player of the match awards.
- bowler_player_stats.csv: bowling data including total wickets, economy rate, overs bowled and matches played as a bowler.
- detailed_player_data.csv: per-match player performance data such as runs scored, balls faced, wickets taken, catches and fantasy points.
- match_summary.csv: match-level information such as toss results, match outcomes (by runs or by wickets), player of the match and venue details.
Potential use cases include:
- Player performance analysis, to identify the most consistent batters and bowlers across seasons.
- Match outcome prediction, by developing models that use historical performance data.
- Fantasy cricket strategy optimization, by selecting teams based on previous player performance.
- Cricket analytics and visualization, to explore trends in runs, wickets and match-winning performances.
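Strike rate and economy rate follow standard cricket definitions: strike rate is runs per 100 balls faced, and economy rate is runs conceded per over. The sketch below computes both. It assumes overs are written in the usual scorecard notation, where "4.3" means 4 overs and 3 balls; whether the dataset stores overs this way is not confirmed, and the function names are hypothetical.

```python
def strike_rate(runs, balls_faced):
    """Batting strike rate: runs scored per 100 balls faced."""
    return 100.0 * runs / balls_faced if balls_faced else None


def overs_to_balls(overs_text):
    """Convert scorecard notation such as '4.3' (4 overs, 3 balls) to a ball count."""
    whole, _, part = str(overs_text).partition(".")
    return int(whole) * 6 + (int(part) if part else 0)


def economy_rate(runs_conceded, overs_text):
    """Bowling economy rate: runs conceded per six-ball over."""
    balls = overs_to_balls(overs_text)
    return 6.0 * runs_conceded / balls if balls else None


batter_sr = strike_rate(50, 40)        # 50 runs off 40 balls
bowler_er = economy_rate(30, "10")     # 30 runs conceded in 10 overs
```

In this example the batter's strike rate is 125.0 and the bowler's economy rate is 3.0. Recomputing these values from detailed_player_data.csv is one way to cross-check the aggregated files.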
This dataset has projections for season-long NBA fantasy for both points and category leagues. Category league projections are based on the default 9 categories (points, assists, rebounds, 3 pointers made, field goal %, free throw %, steals, blocks, and turnovers). Points league projections are based on the default scoring systems for Yahoo, ESPN, Fantrax, and Sleeper leagues.
Any aspiring data scientist looks at everything in terms of data, even when chilling with friends, watching cricket live and cheering for the favorite team.
It includes ODI, Test and T20 statistics of all the players in all three categories (batting, bowling and fielding).
We wouldn't be here without the help of cricket. Thank you to all the great cricketers for their wonderful contribution.
License: https://creativecommons.org/publicdomain/zero/1.0/
Comprehensive dataset containing detailed information on batting and bowling performances, as well as the schedule and results of matches from the ICC Cricket World Cup 2023. The dataset covers player statistics, match details, and more, providing a rich resource for cricket enthusiasts, analysts, and data scientists interested in exploring the dynamics of the tournament.
Content
- batting_summary.csv: Player-wise batting statistics.
- bowling_summary.csv: Player-wise bowling statistics.
- matches_schedule_results.csv: Schedule and results of World Cup 2023 matches.
Data Set DS_match.csv - Contains the match details as per the table below:
match_id: Unique ID for each match
match_number: Same as the above, in text format
date: Date of the match
start_time: Match start time on the day
result: Text field explaining the result
player_id_of_the_match: ID of the player of the match (can be matched to DS_players with the combination of Match ID and Player ID)
player_name_of_the_match: Player of the match
series_id: Season identifier
series_name: Season name
status: Status of the match
toss_winner: Team ID of the toss winner
toss_selection: Toss selection
venue_id: Location
venue_name: Location name
home_team_id: Team ID
home_team_name: description.
This dataset contains the results and xG values of matches played in the English Premier League in 2023-24.