Update 2023-10-13: The data now includes the 2022 season.
Update 2022-08-06: The data now includes the 2021 season.
Update 2021-08-02: The data now includes the 2020 season, and metrics for 2019 have been updated.
Update 2020-08-03: The data now includes the 2017, 2018 and 2019 seasons. Keep in mind that metrics like gp, pts, reb, etc. are not complete for the 2019 season, as it was ongoing at the time of upload.
As a life-long fan of basketball, I always wanted to combine my enthusiasm for the sport with passion for analytics 🏀📊. So, I utilized the NBA Stats API to pull together this data set. I hope it will prove to be as interesting to work with for you as it has been for me!
The data set contains over two decades of data on each player who has been part of an NBA team's roster. It captures demographic variables such as age, height, weight and place of birth, as well as biographical details like the team played for, draft year and round. In addition, it has basic box score statistics such as games played and average number of points, rebounds, assists, etc.
The pull initially contained 52 rows of missing data. The gaps have been manually filled using data from Basketball Reference. I am not aware of any other data quality issues.
The data set can be used to explore how age, height and weight tendencies have changed over time due to changes in game philosophy and player development strategies. It could also be interesting to see how geographically diverse the NBA is and how overseas talent has influenced it. A longitudinal study of players' career arcs can also be performed.
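As a rough illustration of the trend analysis suggested above, the sketch below averages a measurement per season using only the Python standard library. The column names `season` and `player_height` and the three sample rows are assumptions made for illustration; they are not taken from the data set itself.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows; the column names are assumed, not confirmed by the source.
SAMPLE_CSV = """season,player_height
1996-97,200.0
1996-97,210.0
2022-23,198.0
"""


def mean_by_season(csv_text, value_column):
    """Return {season: mean of value_column} for the rows in csv_text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["season"]].append(float(row[value_column]))
    return {season: mean(values) for season, values in groups.items()}


trend = mean_by_season(SAMPLE_CSV, "player_height")
for season in sorted(trend):
    print(season, round(trend[season], 2))
```

The same helper could be pointed at a weight or age column to compare several tendencies side by side.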
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Basketball Association (NBA) is a professional basketball league in North America composed of 30 teams. It is one of the major professional sports leagues in the United States and Canada and is considered the premier professional basketball league in the world.
The NBA draft combine is a multi-day showcase that takes place every May before the annual NBA draft. At the combine, college basketball players are measured and take medical tests, are interviewed, perform various athletic tests and shooting drills, and play in five-on-five drills for an audience of National Basketball Association (NBA) coaches, general managers, and scouts. Athletes attend by invitation only. An athlete's performance during the combine can affect perception, draft status, salary, and ultimately the player's career.
This dataset includes the anthropometric measurements collected during Draft Combine events in the years 2000-2023. It was acquired using the NBA Stats API. The units have been converted from the imperial to the metric system (inches to centimeters and pounds to kilograms), and the numbers have been rounded to two decimal places.
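The unit conversion described above can be sketched as follows. The function names are hypothetical, and the author's actual conversion code is not shown in the source; the conversion factors themselves are the standard exact definitions.

```python
CM_PER_INCH = 2.54          # exact by international definition
KG_PER_POUND = 0.45359237   # exact by international definition


def inches_to_cm(inches):
    """Convert inches to centimeters, rounded to two decimal places."""
    return round(inches * CM_PER_INCH, 2)


def pounds_to_kg(pounds):
    """Convert pounds to kilograms, rounded to two decimal places."""
    return round(pounds * KG_PER_POUND, 2)


print(inches_to_cm(80))    # height of a player measuring 80 inches
print(pounds_to_kg(220))   # weight of a player weighing 220 pounds
```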
By data.world's Admin [source]
This dataset contains daily visitor-submitted birthdays and associated data from an ongoing experiment known as the Birthday Paradox. It shows how many people have chosen the same birthday as yours and how the phenomenon varies from day to day, including submissions made within the last 24 hours. The experiment is published under the MIT License, giving you access to the detailed information behind this perplexing cognitive illusion. Find out why the probability of two people in the same room sharing a birthday is much higher than one might expect!
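To see why the probability is higher than intuition suggests, it helps to compute it directly. The sketch below uses the standard textbook model, which assumes 365 equally likely birthdays and ignores leap years; it is independent of the dataset and the function name is made up for illustration.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    if people > days:
        return 1.0  # pigeonhole principle: a match is guaranteed
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for group_size in (10, 23, 50):
    print(group_size, round(shared_birthday_probability(group_size), 3))
```

Under these assumptions, a group of 23 people is the smallest for which a shared birthday is more likely than not.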
This dataset provides data on the Birthday Paradox Visitor Experiments. It contains information such as daily visitor-submitted birthdays, the total number of visitors who have submitted birthdays, the total number of visitors who guessed the same day as their birthday, and more. This dataset can be used to analyze patterns in visitor behavior related to the Birthday Paradox Experiment.
In order to use this dataset effectively and efficiently, it is important to understand its fields and variables:
- Updated: The date when this data was last updated
- Count: The total number of visitors who have submitted birthdays
- Recent: The number of visitors who have submitted birthdays in the last 24 hours
- binnedDay: The day of the week for a given visitor's birthday submission
- binnedGuess: The day of the week that a given visitor guessed their birthday would fall on
- Tally: The total number of visitors who guessed the same day as their birthday
- binnedTally: The total number of visitors grouped by guess day

To begin using this dataset, first filter the data on the desired criteria, such as a date range or binnedDay. For instance, to analyze Birthday Paradox Experiment results for Monday submissions only, filter by binnedDay = 'Monday'. Then examine other fields of the filtered data, such as binnedGuess, and compare them with the tally or binnedTally results. For example, for the Monday entries above, compare the 'Monday' tallies with the 'Tuesday' guesses (or those for any other weekday).

Tracking the recent field can also provide insight into user behavior related to the Birthday Paradox Experiment, since recent entries may reveal trends over time.

By exploring various combinations of the fields available in this dataset, users can gain a better understanding of how user behavior differs across days of the week, both within a single day and over periods of time, according to the different criteria provided by this dataset.
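The filtering step described above might look like the following sketch, which uses only the standard library. The column names come from the field list, but the one-record-per-row layout and the sample rows are assumptions made for illustration, not real data from the experiment.

```python
import csv
import io

# Made-up sample rows; the real data.csv may be laid out differently.
SAMPLE_CSV = """binnedDay,binnedGuess,tally
Monday,Monday,3
Monday,Tuesday,5
Friday,Monday,2
"""


def filter_by_day(csv_text, day):
    """Return the rows whose binnedDay equals `day`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["binnedDay"] == day]


monday_rows = filter_by_day(SAMPLE_CSV, "Monday")
for row in monday_rows:
    # csv yields strings, so numeric fields need an explicit conversion.
    print(row["binnedGuess"], int(row["tally"]))
```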
- Analyzing the likelihood of whether a person will guess their own birthday correctly.
- Estimating which day of the week sees the most visitors submitting their birthdays and analyzing how this varies over time.
- Investigating how likely it is for two people from different regions to have the same birthday by comparing their respective submission rates on each day of the week
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: data.csv

| Column name | Description |
|:------------|:------------|
| updated | The date and time the data was last updated. (DateTime) |
| count | The total number of visitor submissions. (Integer) |
| recent | The number of visitor submissions in the last 24 hours. (Integer) |
| binnedDay | The day of the week the visitor submitted their birthday. (String) |
| binnedGuess | The day of the week the visitor guessed their birthday. (String) |
| tally | The total number of visitor guesses that matched their actual birthdays. (Integer) |
| binnedTally | The day of the week the visitor guessed their birthday correctly. (String) |
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.
Further information can be found in a series of articles for IBM developerWorks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".
Slides from a talk about this dataset, given at Strata in March 2018, are also available.
Further reading on this dataset is in Chapter 6 of the book Pragmatic AI: An Introduction to Cloud-Based Machine Learning; see also lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.
You can watch a breakdown of using cluster analysis on the Pragmatic AI YouTube channel.
Learn to deploy a Kaggle project into a production machine learning environment (sklearn + Flask + container) by reading Python for DevOps: Learn Ruthlessly Effective Automation, Chapter 14: MLOps and Machine Learning Engineering.
Use social media to predict a winning season with this notebook: https://github.com/noahgift/core-stats-datascience/blob/master/Lesson2_7_Trends_Supervized_Learning.ipynb
Learn to use the cloud for data analysis.
Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.
An average of **** million viewers tuned in to watch NBA regular season games across ABC, ESPN and TNT in the 2024/25 season. This marked a slight decline in the number of viewers from the previous season.
https://creativecommons.org/publicdomain/zero/1.0/
I wanted to learn web scraping in order to build a basketball website, so I created this dataset as part of my learning. I will try to keep it as up to date as possible.
The NBA and WNBA are the two top leagues for basketball in the United States for men and women, respectively. In the NBA, players took home an average annual salary of over ** million U.S. dollars for the 2024/25 season, with the league's minimum salary set at **** million U.S. dollars that year. In comparison, players in the WNBA received an average annual pay of ******* U.S. dollars in the 2025 season, with the highest-earning players in the WNBA receiving around ******* U.S. dollars annually.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Introduction:
Embark on an enthralling exploration into the illustrious careers of basketball's most iconic figures in the NBA Legends Dataset. This meticulously curated collection chronicles the remarkable odysseys of legendary players, offering intimate glimpses into their unparalleled skills, unwavering determination, and relentless pursuit of excellence. As a tribute to the enduring legacies and profound impacts these legends have had on the game and countless lives, this dataset encapsulates their transcendent influences, both on and off the court.
Column Descriptions:
Influence of NBA Legends:
The enduring legacies of NBA legends transcend basketball, serving as timeless sources of inspiration for athletes and enthusiasts alike. Their remarkable achievements, unwavering work ethics, and unyielding self-belief epitomize the essence of greatness and resilience. As we delve into the intricacies of their journeys through this dataset, may their indelible spirits continue to inspire and motivate us to pursue excellence in every aspect of life
Photo by JC Gellidon on Unsplash
By Homeland Infrastructure Foundation [source]
This dataset provides detailed information on major sport venues, along with their usage and affiliations. It includes data related to the National Association for Stock Car Auto Racing, Indy Racing League, Major League Soccer, Major League Baseball, National Basketball Association, Women's National Basketball Association, National Hockey League, National Football League, PGA Tour, NCAA Division 1 FBS Football, NCAA Division 1 Basketball and thoroughbred horse racing. The dataset contains columns such as USE (the type of use for the venue), TEAM (the team associated with the venue), LEAGUE (the league associated with the venue), CONFERENCE (the conference associated with the venue), DIVISION (the division associated with the venue), INST_AFFIL (the institution affiliation associated with the venue), TRACK_TYPE (the type of track at a specific point in time or over its complete life cycle), LENGTH_MILEGE (the length of the track in miles), ROOF_TYPE (the type of roof covering used at a specific point in time or over its complete life cycle), and many other variables. With this range and quantity of data points, spanning countries across different continents and leagues, explore patterns in sports that you never thought possible!
The Major US Sports Venues Usage and Affiliations dataset includes data on major sports venues from leagues including National Association for Stock Car Auto Racing (NASCAR), Indy Racing League (IRL), Major League Soccer (MLS), Major League Baseball (MLB), National Basketball Association (NBA), Women's National Basketball Association (WNBA), National Hockey League (NHL), National Football League (NFL), PGA Tour, NCAA Division 1 FBS Football, NCAA Division 1 Basketball, and thoroughbred horse racing. The columns provided include
USE_, USE_POP, TEAM, LEAGUE, CONFERENCE, DIVISION, INST_AFFIL, TRACK_TYPE, LENGTH_MI, ROOF_TYPE, STADIUM_SH, ADDDATAE, USEWEBSITE, and COMMENTS. The USE_ column specifies the type of usage of each venue, which can be college athletics or professional athletics. The corresponding USE_POP column indicates how many people use each venue for a particular sport at a given time. For example, if six NHL games were being played that day, USE_ would say "professional Athletics" while USE_POP would state "NNN", reflecting that NNN people were spectating those events collectively. The next column is TEAM, which represents the team that sponsors or manages each venue or the teams that play in it.
Following on from TEAM is LEAGUE, which shows the league each team belongs to, such as MLB or NBA. The next three columns, CONFERENCE, DIVISION, and INST_AFFIL, provide more specific details and extend to the collegiate level: CONFERENCE indicates the conference a team belongs to within its division, while INST_AFFIL states the affiliated school body, e.g. Southeastern Conference > University of Arkansas Razorbacks. Rounding out our overview, these last three columns TRACK_TYPE/LENGTH
- Analyzing the affiliations and usage of different sports venues to determine which teams or leagues have the most presence across a certain geographic area.
- Comparing different stadiums within a given conference in terms of their roof type, track length, and stadium shape for optimal design features for new construction projects.
- Placing sponsorships or advertisements within each sporting arena based on audience size, league popularity, and team affiliation within a given conference or division
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
- Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
- Give appropriate credit - Provide a link to the license, and indicate if changes were made.
- ShareAlike - You must distribute your contribut...
https://cdla.io/sharing-1-0/
This dataset contains data and statistics for some of the greatest players who have played in the National Basketball Association (NBA). You can use these stats to assess various aspects of these players, and maybe even find out who is the all-time GOAT of basketball.
Some statistics explained for people unfamiliar with basketball (assuming points, assists, etc. are obvious):
PER - Player Efficiency Rating - The player efficiency rating (PER) is John Hollinger's all-in-one basketball rating, which attempts to collect or boil down all of a player's contributions into one number. Using a detailed formula, Hollinger developed a system that rates every player's statistical performance.
EWA - Estimated Wins Added - Like PER, EWA boils down all of a player's contributions into one statistic, but it is expressed as the number of wins added to a team when that player is on the court.
WS & WS/48 - Win Shares & Win Shares per 48 - Win Shares is a measure assigned to players based on their offense, defense, and playing time. WS/48 is win shares per 48 minutes. The measure was invented by Justin Kubatko, who explains: "A win share is worth one-third of a team win. If a team wins 60 games, there are 180 'Win Shares' to distribute among the players."
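The arithmetic behind these two figures can be sketched as below. The factor of three win shares per team win comes from the quote above, and WS/48 is read literally as win shares scaled to 48 minutes of play; the function names and the example player are hypothetical.

```python
def total_win_shares(team_wins, shares_per_win=3):
    """Win shares available to a team under the quoted three-per-win convention."""
    return team_wins * shares_per_win


def win_shares_per_48(win_shares, minutes_played):
    """Scale a player's win shares to a rate per 48 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return win_shares / minutes_played * 48


print(total_win_shares(60))                      # the 60-win team from the quote
print(round(win_shares_per_48(10.0, 2400), 3))   # hypothetical player: 10 WS in 2400 minutes
```

Expressing win shares as a rate makes players with very different playing time comparable, which the raw total does not.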
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information on players who are all-time career leaders in blocks. It includes information such as player name, position, teams played for, number of seasons played, number of games played, birthplace, birth date, etc.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Individual game stats for every NBA player in the 2018-19 and 2019-20 seasons.
The data was used to develop a machine learning algorithm that determines the best fantasy basketball lineups.
Follow along in the tutorial and learn how to scrape any NBA season you choose with one Python function.
If you want to stay up to date with the tutorial and learn how to scrape data and implement ML in daily fantasy basketball, check out my blog.
Code is available on my GitHub.
The National Basketball Association is a professional basketball league in North America. The league is composed of 30 teams and is one of the major professional sports leagues in the United States and Canada. The Golden State Warriors and the Boston Celtics will be competing in the NBA Finals in June.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is revised based on https://www.kaggle.com/datasets/schmadam97/nba-playbyplay-data-20182019
This dataset offers a comprehensive play-by-play log of NBA games, detailing not only scoring plays but also player movements, fouls, rebounds, and other significant actions within each game.
Game log data of the 2017-2018 NBA Regular Season.
https://creativecommons.org/publicdomain/zero/1.0/
This is my take on the tiresome topic of who is the NBA GOAT. I've always had a strong opinion, but this takes a deep dive into the stats to back it up with numbers. I took a dataset of career stats for all NBA players and narrowed it down to the 10 players with the most points of all time, then compared their individual stats. To keep it simple, I looked only at the number of games played, minutes played, total points, rebounds, assists, steals, blocks and turnovers. I then took a deeper look at the two players most people consider the GOAT, LeBron James and Michael Jordan: their total career stats, then per-game stats, then per-minute stats. This could still be an unfair comparison, because LeBron James has played roughly 500 more games than Michael Jordan did. So I ran a what-if scenario showing what the stats would look like if their total games and minutes played were swapped, with each keeping his own averages for points, rebounds, assists, steals, blocks and turnovers. The results are impressive! Take a look, and enjoy.
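The what-if scenario amounts to multiplying each player's per-game averages by the other player's game count. The sketch below shows that calculation with two placeholder players; the numbers are invented round figures for illustration, not the real career stats of LeBron James or Michael Jordan, and the function name is hypothetical.

```python
def project_totals(per_game_averages, games):
    """Project career totals from per-game averages over `games` games."""
    return {stat: round(avg * games, 1) for stat, avg in per_game_averages.items()}


# Placeholder players with invented numbers (not real career stats).
player_a = {"games": 1000, "per_game": {"points": 25.0, "rebounds": 7.0}}
player_b = {"games": 1500, "per_game": {"points": 24.0, "rebounds": 6.0}}

# Swap the game counts while each player keeps their own averages.
a_with_b_games = project_totals(player_a["per_game"], player_b["games"])
b_with_a_games = project_totals(player_b["per_game"], player_a["games"])

print("A over B's games:", a_with_b_games)
print("B over A's games:", b_with_a_games)
```

The same approach extends to per-minute averages by replacing the game count with minutes played.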
I was unable to create a notebook with a sample of my work in R because I passed the time limit for the free trial after I exported the data to Excel.
https://creativecommons.org/publicdomain/zero/1.0/
NBA Top Shot is just one of the many NFT platforms that is exploding right now. This website serializes highlights of NBA players and puts them on the blockchain to immortalize them forever. The community is similar to the trading card community with collectors and a marketplace to buy and sell your moments.
This dataset has all of the moments that are being offered by NBA Top Shot to collect. Every card has the play type, date of event, team of the player, the lowest asking price on the moment (as of the date it was published), number of listings on the marketplace (as of the date it was published), the rarity of the moment, the number of moments minted, and whether more of these moments will be made.
There is also some interesting data about each series of cards that was released.
I scraped all of this data from two different websites: nbatopshot.com and evaluate.market.
I used Selenium to gather all of this data, since both sites have JavaScript interfaces.
After buying into NBA Topshot, I saw the prices of cards fluctuate drastically. I wanted to understand what drove the insane prices that people were paying for a moment. Even better, I wanted to know if I could tell if a card was undervalued so that I could buy it low and sell it after it had recovered.
I still don't understand why cards are worth what they are worth, but that is what led me to collect this data and try to see if I could make sense of it all. It could all just be hype surrounding NFTs and FOMO about not being a part of the next big thing that could be worth millions.
Hope you find this dataset interesting and I am excited to see what people do/uncover with it. :) Please let me know if there is anything else that would be helpful to include in this dataset. I am hoping to complete some exploratory analysis on it sometime shortly myself.
This dataset, named "state_trends.csv," contains information about different U.S. states. Let's break down the attributes and understand what each column represents:
In summary, this dataset provides a variety of information about U.S. states, including demographic data, geographical region, psychological region, personality traits, and scores related to interests or proficiencies in various fields such as data science, art, and sports.
https://creativecommons.org/publicdomain/zero/1.0/
NBARank is an annual preseason tradition at basketball sites like ESPN and Slam. Aggrieved at some of the rankings, CJ McCollum suggested that writers get ranked instead. I set up an allourideas.com survey to do exactly this and let Twitter decide who the best writers are.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Forbes is not just one of the most popular business magazines. It contains countless articles on numerous subjects (e.g., business, investing, technology, entrepreneurship, etc.), reporting valuable data and insights.
For instance, Forbes publishes annual lists of wealthy people and their net worth, such as the "Forbes 400" and the "Forbes World's Billionaires list".
Athletes are no exception: every year, lists of the highest-paid athletes are published.
The data were collected manually from Forbes articles listing the top 10 highest-paid athletes in tennis, the NBA, and soccer.
Athletes can have multiple sources of income.
• Team sports athletes earn a salary paid by their team, whereas individual sports athletes (such as tennis players) compete in tournaments for prize money.
• Brands often pay athletes to promote their products (on and off the court) as a marketing strategy to reach a wider target audience and boost their sales and profit.
The dataset contains 11 years of data starting from 2011.
Forbes official website: https://www.forbes.com/ (dataset last updated on 3rd of January 2022)
• Which sport rewarded its athletes the most in each year?
• Is there a trend across years for the total earnings of the top 10 highest-paid athletes of each sport?
• Does this trend change when looking into salaries (or prize money) and endorsements separately?
• Which country 'earns' the most out of those three sports each year?
These are examples of interesting questions that could be answered by analysing this dataset.
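The first question, which sport rewarded its athletes the most in each year, could be approached along the lines of the sketch below. The row layout (year, sport, earnings) and every number in it are invented for illustration; the real column names and figures are not given in this description.

```python
from collections import defaultdict

# Invented rows: (year, sport, athlete earnings in millions of US dollars).
ROWS = [
    (2011, "tennis", 10.0),
    (2011, "NBA", 12.0),
    (2011, "soccer", 11.0),
    (2012, "tennis", 15.0),
    (2012, "NBA", 9.0),
    (2012, "NBA", 5.0),
]


def top_sport_per_year(rows):
    """Return {year: (sport, total)} for the sport with the highest summed earnings."""
    totals = defaultdict(float)
    for year, sport, earnings in rows:
        totals[(year, sport)] += earnings
    best = {}
    for (year, sport), total in totals.items():
        if year not in best or total > best[year][1]:
            best[year] = (sport, total)
    return best


result = top_sport_per_year(ROWS)
for year in sorted(result):
    print(year, result[year])
```

Running the same function separately on salary and endorsement figures would address the third question as well.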
If you are interested, please have a look at the Tableau dashboard that I have created to help answer the above questions, and report some of my insights. Tableau dashboard: https://public.tableau.com/views/AthelesSaleries/SportsEarningsAnalysis?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link