https://creativecommons.org/publicdomain/zero/1.0/
The goal of this project was to extract data from an NBA stats website using web scraping techniques and then perform data analysis to create visualizations using Python. The website used was "https://www.basketball-reference.com/", which contains data on players and teams in the NBA. The code for this project can be found on my GitHub repository at "https://github.com/Duggsdaddy/Srihith_I310D.git".
The data was extracted using the BeautifulSoup library in Python, and the data was stored in a Pandas DataFrame. The data was cleaned and processed to remove any unnecessary columns or rows, and the data types of the columns were checked and corrected where necessary.
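The cleaning step described above could be sketched roughly as follows. This is a minimal illustration that uses only the Python standard library rather than Pandas, so it runs without extra packages. The sample rows, the `clean_rows` and `convert` helpers, and the assumption that the scraped table repeats its header row mid-table are all hypothetical and are not taken from the project repository.

```python
# Hypothetical rows as they might come out of a scraped HTML table:
# every value is a string, and the header row can reappear mid-table.
RAW_ROWS = [
    ["Player", "Age", "G", "FG%"],
    ["Player A", "24", "70", ".512"],
    ["Player", "Age", "G", "FG%"],  # repeated header row (assumed artifact)
    ["Player B", "31", "", ""],     # blank cells stand for missing values
]

# Column types follow the data dictionary: counts are integers,
# percentages are floats, everything else stays a string.
INT_COLUMNS = {"Age", "G"}
FLOAT_COLUMNS = {"FG%"}


def convert(column, value):
    """Convert one cell to the type its column is expected to hold."""
    if value == "":
        return None
    if column in INT_COLUMNS:
        return int(value)
    if column in FLOAT_COLUMNS:
        return float(value)
    return value


def clean_rows(raw_rows):
    """Drop repeated header rows and return one typed dict per player."""
    header = raw_rows[0]
    cleaned = []
    for row in raw_rows[1:]:
        if row == header:
            continue  # skip header rows repeated inside the table body
        cleaned.append({col: convert(col, val) for col, val in zip(header, row)})
    return cleaned


players = clean_rows(RAW_ROWS)
```

Missing values are kept as `None` instead of being dropped, so the decision about how to handle them stays with the analysis step.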
The data was analyzed using various Python libraries such as Matplotlib, Seaborn, and Plotly to create visualizations like bar graphs, line graphs, and box plots. The visualizations were used to identify trends and patterns in the data.
The project follows ethical web scraping practices by not overwhelming the website with too many requests and by giving proper attribution to the website as the source of the data.
Overall, this project demonstrates how web scraping and data analysis techniques can be used to extract meaningful insights from data available on the internet.
Here's a data dictionary for the table:
Player: string - name of the player
Pos (Position): string - position played by the player
Age: integer - age of the player as of February 1, 2023
Tm (Team): string - team the player belongs to
G (Games Played): integer - number of games played by the player
GS (Games Started): integer - number of games started by the player
MP (Minutes Played): integer - total minutes played by the player
FG (Field Goals): integer - number of field goals made by the player
FGA (Field Goal Attempts): integer - number of field goal attempts by the player
FG% (Field Goal Percentage): float - percentage of field goals made by the player
3P (3-Point Field Goals): integer - number of 3-point field goals made by the player
3PA (3-Point Field Goal Attempts): integer - number of 3-point field goal attempts by the player
3P% (3-Point Field Goal Percentage): float - percentage of 3-point field goals made by the player
2P (2-Point Field Goals): integer - number of 2-point field goals made by the player
2PA (2-Point Field Goal Attempts): integer - number of 2-point field goal attempts by the player
2P% (2-Point Field Goal Percentage): float - percentage of 2-point field goals made by the player
eFG% (Effective Field Goal Percentage): float - effective field goal percentage of the player
FT (Free Throws): integer - number of free throws made by the player
FTA (Free Throw Attempts): integer - number of free throw attempts by the player
FT% (Free Throw Percentage): float - percentage of free throws made by the player
ORB (Offensive Rebounds): integer - number of offensive rebounds by the player
DRB (Defensive Rebounds): integer - number of defensive rebounds by the player
TRB (Total Rebounds): integer - total rebounds by the player
AST (Assists): integer - number of assists made by the player
STL (Steals): integer - number of steals made by the player
BLK (Blocks): integer - number of blocks made by the player
TOV (Turnovers): integer - number of turnovers made by the player
PF (Personal Fouls): integer - number of personal fouls made by the player
PTS (Points): integer - total points scored by the player
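Several of the percentage columns in this dictionary can be recomputed from the count columns, which is a useful sanity check after scraping. The sketch below assumes the commonly published definitions (FG% = FG / FGA and eFG% = (FG + 0.5 × 3P) / FGA). The function names and the sample numbers are illustrative only.

```python
def field_goal_pct(fg, fga):
    """FG% as a fraction: field goals made divided by attempts."""
    if fga == 0:
        return None  # undefined when the player took no shots
    return fg / fga


def effective_fg_pct(fg, three_p, fga):
    """eFG%: like FG%, but each made 3-pointer counts as 1.5 field goals."""
    if fga == 0:
        return None
    return (fg + 0.5 * three_p) / fga


def two_point_made(fg, three_p):
    """2P: field goals that were not 3-pointers."""
    return fg - three_p


# Hypothetical season line: 400 FG (100 of them 3-pointers) on 900 attempts.
fg_pct = field_goal_pct(400, 900)
efg_pct = effective_fg_pct(400, 100, 900)
two_p = two_point_made(400, 100)
```

Comparing such recomputed values against the scraped FG% and eFG% columns is one way to catch rows where the type conversion went wrong.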
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains end-of-season box-score aggregates for NBA players over the 2012–13 through 2023–24 seasons, split into training and test sets for both regular season and playoffs. Each CSV has one row per player per season with columns for points, rebounds, steals, turnovers, 3-pt attempts, FG attempts, plus identifiers.
End-of-season box-score aggregates (2012–13 to 2023–24), split into train/test sets
The Jupyter notebook (Analysis.ipynb), in which all the code can be executed
The trained model binary (nba_model.pkl), a serialized Random Forest model artifact
Evaluation plots (LAL vs. whole league) for regular-season and playoff predictions, uploaded as PNG outputs
FAIR4ML metadata (fair4ml_metadata.jsonld)
See README.md and abbreviations.txt for file details.
Notebook
Analysis.ipynb: Contains the graphical output for the training and test data.
Train/Test CSV Data
Name | Description | PID |
regular_train.csv | Regular-season training data: seasons 2012-2013 through 2021-2022 | 4421e56c-4cd3-4ec1-a566-a89d7ec0bced |
regular_test.csv | Regular-season test data: the 2022-2023 season | f9d84d5e-db01-4475-b7d1-80cfe9fe0e61 |
playoff_train.csv | Playoff training data: seasons 2012-2013 through 2022-2023 | bcb3cf2b-27df-48cc-8b76-9e49254783d0 |
playoff_test.csv | Playoff test data: the 2023-2024 season | de37d568-e97f-4cb9-bc05-2e600cc97102 |
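The season-based split in the table above can be expressed as a small helper. This is only a sketch: the `season` field name, the `season_labels` and `split_by_season` helpers, and the sample rows are assumptions for illustration and may not match the column names in the published CSV files.

```python
def season_labels(first_start_year, last_start_year):
    """Build labels such as '2012-2013' for a range of season start years."""
    return [f"{year}-{year + 1}" for year in range(first_start_year, last_start_year + 1)]


def split_by_season(rows, train_seasons, test_seasons):
    """Split rows into train and test lists by their season label."""
    train = [row for row in rows if row["season"] in train_seasons]
    test = [row for row in rows if row["season"] in test_seasons]
    return train, test


# Regular season: train on 2012-2013 through 2021-2022, test on 2022-2023.
REGULAR_TRAIN = season_labels(2012, 2021)
REGULAR_TEST = season_labels(2022, 2022)

# Hypothetical rows, one per player per season.
rows = [
    {"player": "Player A", "season": "2012-2013", "pts": 1200},
    {"player": "Player A", "season": "2021-2022", "pts": 1500},
    {"player": "Player A", "season": "2022-2023", "pts": 1400},
]
train_rows, test_rows = split_by_season(rows, REGULAR_TRAIN, REGULAR_TEST)
```

Splitting by whole seasons rather than at random keeps every test row later in time than every training row, which matches how the four files are described.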
Others
abbrevations.txt: Explains the basic abbreviations used for the columns in the CSV data
Additional Notes
Raw csv files are taken from Kaggle (Source: https://www.kaggle.com/datasets/shivamkumar121215/nba-stats-dataset-for-last-10-years/data)
Some preprocessing has to be done before uploading into dbrepo
Plots have also been uploaded as an output for visual purposes.
A more detailed version can be found on github (Link: https://github.com/bubaltali/nba-prediction-analysis/)
https://creativecommons.org/publicdomain/zero/1.0/
I scraped this data from the official NBA website. I wanted to practice data visualization, EDA and correlation in data. I have attached my notebook along with the dataset for further reference.
This excel file contains individual player statistics from 2012-2022 regular seasons and playoffs.
Web-scraped from the official NBA Stats
All information retrieved from basketball-reference.com
Rk -- Rank
Pos -- Position
Age -- Player's age on February 1 of the season
Tm -- Team
G -- Games
MP -- Minutes Played
PER -- Player Efficiency Rating: A measure of per-minute production standardized such that the league average is 15.
TS% -- True Shooting Percentage: A measure of shooting efficiency that takes into account 2-point field goals, 3-point field goals, and free throws.
3PAr -- 3-Point Attempt Rate: Percentage of FG attempts from 3-point range.
FTr -- Free Throw Attempt Rate: Number of FT attempts per FG attempt.
ORB% -- Offensive Rebound Percentage: An estimate of the percentage of available offensive rebounds a player grabbed while they were on the floor.
DRB% -- Defensive Rebound Percentage: An estimate of the percentage of available defensive rebounds a player grabbed while they were on the floor.
TRB% -- Total Rebound Percentage: An estimate of the percentage of available rebounds a player grabbed while they were on the floor.
AST% -- Assist Percentage: An estimate of the percentage of teammate field goals a player assisted while they were on the floor.
STL% -- Steal Percentage: An estimate of the percentage of opponent possessions that end with a steal by the player while they were on the floor.
BLK% -- Block Percentage: An estimate of the percentage of opponent two-point field goal attempts blocked by the player while they were on the floor.
TOV% -- Turnover Percentage: An estimate of turnovers committed per 100 plays.
USG% -- Usage Percentage: An estimate of the percentage of team plays used by a player while they were on the floor.
OWS -- Offensive Win Shares: An estimate of the number of wins contributed by a player due to offense.
DWS -- Defensive Win Shares: An estimate of the number of wins contributed by a player due to defense.
WS -- Win Shares: An estimate of the number of wins contributed by a player.
WS/48 -- Win Shares Per 48 Minutes: An estimate of the number of wins contributed by a player per 48 minutes (league average is approximately .100).
OBPM -- Offensive Box Plus/Minus: A box score estimate of the offensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
DBPM -- Defensive Box Plus/Minus: A box score estimate of the defensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
BPM -- Box Plus/Minus: A box score estimate of the points per 100 possessions a player contributed above a league-average player, translated to an average team.
VORP -- Value over Replacement Player: A box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season.
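The glossary gives the three simplest rate statistics only in words. The sketch below shows how they are usually computed, assuming the widely used formulas TS% = PTS / (2 × (FGA + 0.44 × FTA)), 3PAr = 3PA / FGA and FTr = FTA / FGA. The 0.44 free-throw weight is the conventional coefficient and is an assumption here, since the glossary does not state it. The function names and sample totals are illustrative.

```python
def true_shooting_pct(pts, fga, fta):
    """TS%: points per scoring attempt, counting free throws at weight 0.44."""
    attempts = 2 * (fga + 0.44 * fta)
    if attempts == 0:
        return None  # undefined for a player with no shooting attempts
    return pts / attempts


def three_point_attempt_rate(three_pa, fga):
    """3PAr: share of field goal attempts taken from 3-point range."""
    if fga == 0:
        return None
    return three_pa / fga


def free_throw_rate(fta, fga):
    """FTr: free throw attempts per field goal attempt."""
    if fga == 0:
        return None
    return fta / fga


# Hypothetical season totals: 2000 PTS, 1500 FGA (600 from three), 500 FTA.
ts = true_shooting_pct(2000, 1500, 500)
three_par = three_point_attempt_rate(600, 1500)
ftr = free_throw_rate(500, 1500)
```

The other entries (PER, win shares, BPM, VORP) depend on league-wide and team-level context and cannot be reproduced from a single player's row in this way.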
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
This dataset contains two CSV files with information about the 2018 NBA regular season:
game_results_2018.csv - Contains results for each game played in the 2018 NBA regular season.
player_stats_2018.csv - Contains average per-game stats for every starter in the 2018 NBA season.
Data Source
The data was scraped from the official NBA website and sports reference sites that track NBA stats and results.
Data Fields
game_results_2018.csv:… See the full description on the dataset page: https://huggingface.co/datasets/Hatman/NBA-Players-Results-2018.
As of 2024, the largest luxury tax bill footed by a team in the NBA came in the 2023/24 season, when the Golden State Warriors were taxed 176.9 million U.S. dollars by the league. The Warriors also held the other top-three spots, bringing their overall luxury tax payments from 2021/22 to 2023/24 to 510.9 million U.S. dollars.
I was having an everyday conversation with two of my friends here about how much programming knowledge we need for our college classes. One of my friends is extremely knowledgeable about basketball statistics; he can recall seemingly random stats for almost any college player. When we learned about his process for writing articles and drawing conclusions from data, we realized that machine learning could speed up that process considerably. So the first step would be to compile all the data we need.
Some of the statistics are obvious such as points, blocks, etc. However, some advanced statistics employ complicated equations, such as offensive rating and PORPAG. These statistics all need to be taken with a grain of salt, since some can be misleading. Specifically, plus/minus may seem to be an effective statistic for ranking how much of an impact players have on their team, but this can be heavily impacted by rotations.
For instance, on my favorite NBA team, the Golden State Warriors, plus/minus is almost irrelevant, since any player that is on the court with Stephen Curry almost always has a much better plus/minus than players who are forced to play without his presence on the court.
However, with the sheer bulk of stats present, I'm hoping there will be clear patterns that emerge with further digging into the data.
Avinash Chauhan and Logan Norman, who helped inspire this idea.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🏀 NBA Shooting Stats: Synthetic Data
This repository contains a synthetic dataset generated from the responsible and ethical scraping of advanced NBA team statistics obtained from NBA.com/stats. The project aims to analyze how playing style in the league has evolved, focusing on shot selection by zone and position, as well as defensive pressure, measured through contested shots and rebounding control.
📊 Dataset description
The dataset covers NBA teams from the 1996-1997 season through 2024-2025, aggregating statistics by:
Team
Season
Conference (East/West)
Player position (Guard, Forward, Center)
It includes offensive metrics such as:
Shots attempted, shots made, and shooting percentage by court zone (for example, <5 ft, 5–9 ft, 10–14 ft, etc.)
And defensive metrics such as:
Contested 2pt shots
Contested 3pt shots
Offensive boxouts (off_boxouts)
Defensive boxouts (def_boxouts)
⚙️ Dataset generation
The scraping was performed with Selenium and BeautifulSoup, automating filters by season, conference, and position. To ensure good practice:
Permitted access was verified beforehand using the robotparser library, respecting the robots.txt file.
Random wait times and simulated navigation were implemented to mimic human behavior and avoid overloading the servers.
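The two safeguards listed above (a robots.txt check and randomized waits) can be illustrated with Python's standard library. This is a minimal sketch, not the authors' scraper: the robots.txt content, the `example-bot` user agent, the example.com URLs, the `build_parser` and `polite_delay` helpers, and the 2 to 6 second delay window are all hypothetical.

```python
import random
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real run would load the site's own file.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
]


def build_parser(robots_lines):
    """Create a robots.txt parser from lines that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def polite_delay(min_seconds=2.0, max_seconds=6.0, rng=random):
    """Pick a random wait time, to be passed to time.sleep between requests."""
    return rng.uniform(min_seconds, max_seconds)


parser = build_parser(SAMPLE_ROBOTS)
allowed = parser.can_fetch("example-bot", "https://example.com/stats/teams")
blocked = parser.can_fetch("example-bot", "https://example.com/private/data")
delay = polite_delay()
```

In a real scraper each request would be made only when `can_fetch` returns `True`, followed by `time.sleep(polite_delay())` before the next one.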
🔐 Important:
The original extracted data is not published in this repository because of the restrictions described in the NBA.com Terms of Use and Privacy Policy. Instead, a synthetic dataset has been generated that is statistically representative but free of proprietary content.
📁 Included files
nba_synthetic_ds.csv: Main dataset in CSV format (comma-delimited)
nba_synthetic_ds_excel.cs: Version of the dataset with a ; delimiter, compatible with Excel
README.md: This document
📌 Data source
The original data was obtained from:
https://www.nba.com/stats. Official NBA statistics site, property of © NBA Media Ventures, LLC.
The synthetic dataset presented here is a derivative work for exclusively academic purposes, which does not infringe the rights of the original owner and respects the permitted use specified in the Terms and the robots.txt file.
📜 License
This dataset is published under the license: 👉 CC BY-NC-SA 4.0 – Attribution-NonCommercial-ShareAlike
This means that:
You may use, share, and adapt the data for non-commercial purposes
You must credit the original source (NBA.com) and this project
Any derivative work must be distributed under the same license
👥 Authors
Project developed by:
Etel Silva García – esilgar@uoc.edu
José Morote García – josemorote21@uoc.edu
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blockchain data query: Overtime - NBA Playoff 24 Stats (Optimism)
The data contains stats of players in NBA in 2017-2018 season.
Thanks to basketball-reference for this data.
The National Basketball Association has one of the highest percentages of African American players from the big four professional sports leagues in North America. In 2023, approximately **** percent of NBA players were African American. Meanwhile, ethnically white players constituted a **** percent share of all NBA players that year. After the WNBA and NBA, the National Football League had the largest share of African Americans in a professional sports league in North America.
How do other roles in the NBA compare?
When it comes to African American representation in the NBA, no other role in the NBA is as well represented by African Americans as players. Meanwhile, on the opposite end of the scale, less than **** percent of team governors in the NBA were African American in 2023. During the 2022/23 season, the role with the second-highest share of African Americans was head coach, with a share of ** percent. That season, the number of African American head coaches in the NBA exceeded the number of white head coaches for the first time.
African Americans in the NFL
In 2022, the greatest share of players by ethnicity in the NFL were African American, with more than half of all NFL players falling within this group. The representation of African Americans in American Football extended beyond the playing field, with **** percent of NFL assistant coaches being African American in 2022 as well. However, positions such as vice presidents and head coaches were less representative of the African American population, as less than ** percent of the individuals fulfilling these roles in 2022 were African American.
https://creativecommons.org/publicdomain/zero/1.0/
The NBA play style is always changing, and collecting the "Team Per Game" stats can be an insight into increasing team efficiency, pace of games, etc.
The dataset contains the "Team Per Game" stats of NBA teams from 1980-2019 season.
The data recorded is that of a box score, containing: Points (PTS), Assists (AST), Steals (STL), etc.
There are 2 data sets:
1. main_df -> data is formatted as found and is not "clean": it still has asterisks in TEAM
2. playoff_labelled -> adds a "Playoff" column indicating whether the team made the playoffs that year, and removes the asterisks in TEAM
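The relationship between the two data sets can be sketched as a small transformation. The code assumes that a trailing asterisk in TEAM marks a playoff team, which is how Basketball Reference annotates its team tables. The description here does not say how the "Playoff" column was actually derived, so treat that as an assumption. The `label_playoffs` helper and the sample rows are hypothetical.

```python
def label_playoffs(rows):
    """Add a Playoff flag from the trailing asterisk and clean the TEAM name."""
    labelled = []
    for row in rows:
        team = row["TEAM"]
        made_playoffs = team.endswith("*")  # assumed meaning of the asterisk
        labelled.append({
            **row,
            "TEAM": team.rstrip("*").strip(),
            "Playoff": made_playoffs,
        })
    return labelled


# Hypothetical rows in the "as found" main_df format.
main_rows = [
    {"TEAM": "Team A*", "PTS": 112.3},
    {"TEAM": "Team B", "PTS": 104.8},
]
playoff_labelled = label_playoffs(main_rows)
```

The input rows are left unchanged, so both the raw and the labelled versions remain available, as in the two published data sets.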
Data source: Basketball Reference
Questions to explore:
-> Has the pace of the game changed? Has it increased or decreased over time?
-> Cluster the teams based on efficiency
-> How do the teams from different eras compare?
https://www.statsndata.org/how-to-order
The Basketball NBA market represents a dynamic segment of the global sports industry, characterized by a passionate fanbase, lucrative sponsorship deals, and an ever-expanding digital presence. As one of the premier professional basketball leagues worldwide, the NBA has cultivated a significant market size, boasting
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
NBA Regular Season 2024/2025 Dataset
This small data set includes every player from the regular season, his stats and a small explanation of the stats. Data includes: PLAYER_ID PLAYER_NAME NICKNAME TEAM_ID TEAM_ABBREVIATION AGE GP W L W_PCT MIN FGM FGA FG_PCT FG3M FG3A FG3_PCT FTM FTA FT_PCT OREB DREB REB AST TOV STL BLK BLKA PF PFD PTS PLUS_MINUS NBA_FANTASY_PTS DD2 TD3 WNBA_FANTASY_PTS GP_RANK W_RANK L_RANK W_PCT_RANK MIN_RANK FGM_RANK FGA_RANK FG_PCT_RANK FG3M_RANK FG3A_RANK… See the full description on the dataset page: https://huggingface.co/datasets/lieferando/nba.
https://www.statsndata.org/how-to-order
The National Basketball Association (NBA) market stands as one of the premier sectors in global sports entertainment, with a current estimated market size of approximately $8 billion. This figure reflects a significant growth trajectory, bolstered by the league's strategic global expansion and engaged fan base. Hist
An average of **** million viewers tuned in to watch NBA regular season games across ABC, ESPN and TNT in the 2024/25 season. This marked a slight decline in the number of viewers from the previous season.
http://www.gnu.org/licenses/lgpl-3.0.html
This data is obtained from basketball-reference.com using a self-written webcrawler. It contains detailed game data and player specific stats for each game of the respective season.
Data for each season is arranged in two csv-files. The first file, season_XXXX_basic.csv, contains basic data for each game of the season, such as the date, time, scores and attendance. The second file, season_XXXX_detailed.csv, contains additional statistics for each player participating in a specific game, such as the minutes played, field goals made and field goals attempted. A lot of data is missing for older seasons, since it wasn't recorded and is not listed on basketball-reference.com.
It would be interesting to see what statistics changed over the course of time when the game evolved and teams focused more on 3PT shots for example.
This data was scraped from basketball-reference.com with the intended purpose of analyzing how NBA prospect performance in the NCAA and international league play translates to the NBA. The data is not complete as it is limited to the information that was available on basketball-reference.com. For unique IDs use player name and date of birth since there have been multiple players with the same name.
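The suggestion to identify players by name plus date of birth can be sketched as a composite key. The `player_key` helper, the field names, and the sample players below are hypothetical and do not reflect the actual column names in the files.

```python
from datetime import date


def player_key(name, birth_date):
    """Build a composite key from a normalized name and an ISO birth date."""
    return (name.strip().lower(), birth_date.isoformat())


# Two hypothetical players who share a name but not a birth date.
records = [
    {"name": "John Smith", "born": date(1975, 3, 14), "draft_year": 1997},
    {"name": "John Smith", "born": date(1994, 11, 2), "draft_year": 2016},
]

# Index records by the composite key; a name-only key would collide here.
by_key = {player_key(r["name"], r["born"]): r for r in records}
name_only = {r["name"] for r in records}
```

The same key can then be used to join a player's NCAA or international rows to their NBA rows across the separate datasets.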
You can find 3 datasets:
Thank you to basketball-reference.com for having so much great data in one interconnected site.
To bring greater understanding about the statistical relationships of draft prospect performance and future NBA performance
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset is based on box score and standing statistics from the NBA 2016-2017 season.
Calculations such as number of possessions, floor impact counter, strength of schedule, and simple rating system are performed.
Finally, extracts are created based on a perspective:
teamBoxScore.csv communicates game data from each team's perspective
officialBoxScore.csv communicates game data from each official's perspective
playerBoxScore.csv communicates game data from each player's perspective
standing.csv communicates standings data for each team every day during the season
Data Sources
Box score and standing statistics were obtained by a Java application using RESTful APIs provided by xmlstats.
Calculation Sources
Another Java application performs advanced calculations on the box score and standing data.
Formulas for these calculations were primarily obtained from these sources:
Favoritism
Does a referee impact the number of fouls made against a player or the pace of a game?
Forecasting
Can the aggregated points scored by and against a team along with their strength of schedule be used to determine their projected winning percentage for the season?
Predicting the Past
For a given game, can games played earlier in the season help determine how a team will perform?
Lots of data elements and possibilities. Let your imagination roam!