https://creativecommons.org/publicdomain/zero/1.0/
The goal of this project was to extract data from an NBA stats website using web scraping techniques and then perform data analysis to create visualizations using Python. The website used was "https://www.basketball-reference.com/", which contains data on players and teams in the NBA. The code for this project can be found on my GitHub repository at "https://github.com/Duggsdaddy/Srihith_I310D.git".
The data was extracted using the BeautifulSoup library in Python, and the data was stored in a Pandas DataFrame. The data was cleaned and processed to remove any unnecessary columns or rows, and the data types of the columns were checked and corrected where necessary.
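The extraction and cleaning steps above can be sketched as follows. This is not the author's code: to keep the example dependency-free it swaps in Python's built-in `html.parser` for BeautifulSoup. It assumes the first `<tr>` of the stats table is the header row and that the site repeats that header row inside long tables, which is the kind of "unnecessary row" the cleaning step removes. `TableParser`, `parse_stats_table`, and `coerce` are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a> player links) is kept as cell text.
        if self._cell is not None:
            self._cell.append(data)


def parse_stats_table(html):
    """Return the table as a list of dicts, skipping repeated header rows."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    # Assumption: mid-table rows identical to the header are layout repeats.
    return [dict(zip(header, row)) for row in body if row != header]


def coerce(value):
    """Convert a cell string to int or float where possible; '' becomes None."""
    if value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value
```

Applying `coerce` to every value of every record gives typed rows; `pandas.DataFrame(records)` then produces the DataFrame the project worked with.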
The data was analyzed using various Python libraries such as Matplotlib, Seaborn, and Plotly to create visualizations like bar graphs, line graphs, and box plots. The visualizations were used to identify trends and patterns in the data.
The project follows ethical web scraping practices by not overwhelming the website with too many requests and by giving proper attribution to the website as the source of the data.
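One way to enforce "not overwhelming the website" in code is a minimum delay between requests plus a `robots.txt` check. The sketch below uses only the standard library; `PoliteFetcher`, `allowed`, and the 3-second default interval are assumptions for illustration, not a documented limit of basketball-reference.com, so the site's own policy should decide the real value.

```python
import time
import urllib.request
import urllib.robotparser


class PoliteFetcher:
    """Hypothetical helper: fetch pages no faster than one per `min_interval` seconds."""

    def __init__(self, min_interval=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed delay; check the site's policy
        self._clock = clock               # injectable so the timing can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

    def get(self, url):
        """Throttled GET returning the response body as text."""
        self.wait()
        with urllib.request.urlopen(urllib.request.Request(url), timeout=30) as resp:
            return resp.read().decode("utf-8")


def allowed(url, robots_url, user_agent="*"):
    """Check the site's robots.txt before fetching `url`."""
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()
    return rp.can_fetch(user_agent, url)
```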
Overall, this project demonstrates how web scraping and data analysis techniques can be used to extract meaningful insights from data available on the internet.
Here's a data dictionary for the table:
Player: string - name of the player
Pos (Position): string - position played by the player
Age: integer - age of the player as of February 1, 2023
Tm (Team): string - team the player belongs to
G (Games Played): integer - number of games played by the player
GS (Games Started): integer - number of games started by the player
MP (Minutes Played): integer - total minutes played by the player
FG (Field Goals): integer - number of field goals made by the player
FGA (Field Goal Attempts): integer - number of field goal attempts by the player
FG% (Field Goal Percentage): float - percentage of the player's field goal attempts that were made
3P (3-Point Field Goals): integer - number of 3-point field goals made by the player
3PA (3-Point Field Goal Attempts): integer - number of 3-point field goal attempts by the player
3P% (3-Point Field Goal Percentage): float - percentage of the player's 3-point attempts that were made
2P (2-Point Field Goals): integer - number of 2-point field goals made by the player
2PA (2-Point Field Goal Attempts): integer - number of 2-point field goal attempts by the player
2P% (2-Point Field Goal Percentage): float - percentage of the player's 2-point attempts that were made
eFG% (Effective Field Goal Percentage): float - field goal percentage adjusted for the extra value of a made 3-point field goal
FT (Free Throws): integer - number of free throws made by the player
FTA (Free Throw Attempts): integer - number of free throw attempts by the player
FT% (Free Throw Percentage): float - percentage of the player's free throw attempts that were made
ORB (Offensive Rebounds): integer - number of offensive rebounds by the player
DRB (Defensive Rebounds): integer - number of defensive rebounds by the player
TRB (Total Rebounds): integer - total rebounds by the player
AST (Assists): integer - number of assists made by the player
STL (Steals): integer - number of steals made by the player
BLK (Blocks): integer - number of blocks made by the player
TOV (Turnovers): integer - number of turnovers committed by the player
PF (Personal Fouls): integer - number of personal fouls committed by the player
PTS (Points): integer - total points scored by the player
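Several columns in the dictionary are derived from others, which makes a useful consistency check after cleaning. The sketch below recomputes the percentage columns (eFG% uses the standard (FG + 0.5 × 3P) / FGA definition) and tests identities such as TRB = ORB + DRB. It assumes each row holds the integer season totals described above, keyed by the column abbreviations; per-game averages would only satisfy these identities up to rounding. The function names are hypothetical.

```python
def shooting_percentages(row):
    """Recompute the percentage columns from the counting stats in one row."""

    def pct(made, att):
        return made / att if att else None  # undefined when there were no attempts

    return {
        "FG%": pct(row["FG"], row["FGA"]),
        "3P%": pct(row["3P"], row["3PA"]),
        "2P%": pct(row["2P"], row["2PA"]),
        "FT%": pct(row["FT"], row["FTA"]),
        # eFG% credits a made 3-pointer with 1.5 times the value of a made 2-pointer.
        "eFG%": pct(row["FG"] + 0.5 * row["3P"], row["FGA"]),
    }


def check_totals(row):
    """Identities that should hold exactly for integer season totals."""
    return {
        "FG = 2P + 3P": row["FG"] == row["2P"] + row["3P"],
        "FGA = 2PA + 3PA": row["FGA"] == row["2PA"] + row["3PA"],
        "TRB = ORB + DRB": row["TRB"] == row["ORB"] + row["DRB"],
        "PTS = 2*2P + 3*3P + FT": row["PTS"] == 2 * row["2P"] + 3 * row["3P"] + row["FT"],
    }
```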
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
This dataset contains two CSV files with information about the 2018 NBA regular season:
game_results_2018.csv - Contains results for each game played in the 2018 NBA regular season.
player_stats_2018.csv - Contains average per-game stats for every starter in the 2018 NBA season.
Data Source
The data was scraped from the official NBA website and sports reference sites that track NBA stats and results.
Data Fields
game_results_2018.csv:… See the full description on the dataset page: https://huggingface.co/datasets/Hatman/NBA-Players-Results-2018.
This data was scraped from basketball-reference.com with the intended purpose of analyzing how NBA prospect performance in the NCAA and international league play translates to the NBA. The data is not complete, as it is limited to the information that was available on basketball-reference.com. To uniquely identify a player, use the combination of player name and date of birth, since multiple players have shared the same name.
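The advice to identify players by name plus date of birth amounts to using a composite key. A minimal sketch, assuming hypothetical record fields `name` and `birth_date` (the dataset's actual column names may differ):

```python
from collections import defaultdict


def player_key(record):
    """Name alone is ambiguous, so key each player on (name, date of birth).

    `name` and `birth_date` are hypothetical field names.
    """
    return (record["name"].strip(), record["birth_date"])


def group_by_player(records):
    """Group NCAA, international, and NBA rows under one composite player key."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[player_key(rec)].append(rec)
    return dict(grouped)
```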
You can find 3 datasets:
Thank you to basketball-reference.com for having so much great data in one interconnected site.
To better understand the statistical relationship between draft prospect performance and future NBA performance.
All information retrieved from basketball-reference.com
Rk -- Rank
Pos -- Position
Age -- Player's age on February 1 of the season
Tm -- Team
G -- Games
MP -- Minutes Played
PER -- Player Efficiency Rating: a measure of per-minute production standardized such that the league average is 15.
TS% -- True Shooting Percentage: a measure of shooting efficiency that takes into account 2-point field goals, 3-point field goals, and free throws.
3PAr -- 3-Point Attempt Rate: percentage of FG attempts from 3-point range.
FTr -- Free Throw Attempt Rate: number of FT attempts per FG attempt.
ORB% -- Offensive Rebound Percentage: an estimate of the percentage of available offensive rebounds a player grabbed while they were on the floor.
DRB% -- Defensive Rebound Percentage: an estimate of the percentage of available defensive rebounds a player grabbed while they were on the floor.
TRB% -- Total Rebound Percentage: an estimate of the percentage of available rebounds a player grabbed while they were on the floor.
AST% -- Assist Percentage: an estimate of the percentage of teammate field goals a player assisted while they were on the floor.
STL% -- Steal Percentage: an estimate of the percentage of opponent possessions that end with a steal by the player while they were on the floor.
BLK% -- Block Percentage: an estimate of the percentage of opponent two-point field goal attempts blocked by the player while they were on the floor.
TOV% -- Turnover Percentage: an estimate of turnovers committed per 100 plays.
USG% -- Usage Percentage: an estimate of the percentage of team plays used by a player while they were on the floor.
OWS -- Offensive Win Shares: an estimate of the number of wins contributed by a player due to offense.
DWS -- Defensive Win Shares: an estimate of the number of wins contributed by a player due to defense.
WS -- Win Shares: an estimate of the number of wins contributed by a player.
WS/48 -- Win Shares Per 48 Minutes: an estimate of the number of wins contributed by a player per 48 minutes (league average is approximately .100).
OBPM -- Offensive Box Plus/Minus: a box score estimate of the offensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
DBPM -- Defensive Box Plus/Minus: a box score estimate of the defensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
BPM -- Box Plus/Minus: a box score estimate of the points per 100 possessions a player contributed above a league-average player, translated to an average team.
VORP -- Value over Replacement Player: a box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season.
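A few of the rate statistics in this glossary can be computed directly from box-score totals. The sketch below covers 3PAr, FTr, and WS/48 as defined above, plus TS% using the widely published formula PTS / (2 × (FGA + 0.44 × FTA)), where 0.44 approximates the share of a possession a free throw attempt uses. The composite metrics (PER, BPM, VORP, win shares themselves) need league-level inputs and are not reproduced here.

```python
def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); None if there were no shooting attempts."""
    tsa = fga + 0.44 * fta  # "true shooting attempts"
    return pts / (2 * tsa) if tsa else None


def three_point_attempt_rate(fg3a, fga):
    """3PAr: share of field goal attempts taken from 3-point range."""
    return fg3a / fga if fga else None


def free_throw_rate(fta, fga):
    """FTr: free throw attempts per field goal attempt."""
    return fta / fga if fga else None


def win_shares_per_48(ws, mp):
    """WS/48: win shares scaled to a 48-minute game (league average is about .100)."""
    return ws * 48 / mp if mp else None
```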
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains action/event labels for the 462 analyzed sequences used in the paper. Using the gameIDs and times (the last 8 digits of the real-time entries), the labels can be linked to the raw SportVU data available online: https://github.com/linouk23/NBA-Player-Movements/tree/master/data/2016.NBA.Raw.SportVU.Game.Logs. (ZIP)
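The linking step described above is a join on (gameID, time suffix). A minimal sketch, assuming the real-time entries are integers and that labels and SportVU moments have been loaded into dicts with hypothetical fields `game_id`, `time`, and `real_time`; the actual file layouts may differ.

```python
def time_suffix(real_time, digits=8):
    """Last `digits` digits of a SportVU real-time entry, as stored in the label file."""
    return int(str(int(real_time))[-digits:])


def link_labels(labels, moments):
    """Pair each label with the moment sharing its game ID and time suffix.

    Field names (`game_id`, `time`, `real_time`) are hypothetical.
    Labels with no matching moment are paired with None.
    """
    index = {(m["game_id"], time_suffix(m["real_time"])): m for m in moments}
    return [(lab, index.get((lab["game_id"], int(lab["time"])))) for lab in labels]
```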
GNU Lesser General Public License v3.0 http://www.gnu.org/licenses/lgpl-3.0.html
This data is obtained from basketball-reference.com using a self-written webcrawler. It contains detailed game data and player specific stats for each game of the respective season.
Data for each season is arranged in two CSV files. The first file, season_XXXX_basic.csv, contains basic data for each game of the season, such as the date, time, scores, and attendance. The second file, season_XXXX_detailed.csv, contains additional statistics for each player participating in a specific game, such as the minutes played, field goals made, and field goals attempted. A lot of data is missing for older seasons, since it was not recorded and is not listed on basketball-reference.com.
It would be interesting to see which statistics changed over time as the game evolved, for example as teams focused more on 3PT shots.
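As a starting point for the 3PT question, one could compute the share of all field goal attempts that were 3-pointers in each season's detailed file. The sketch assumes the files follow the season_XXXX_detailed.csv naming above and that the detailed files have columns named `FGA` and `3PA`; those column names are an assumption. Rows with blank values (common in older seasons, and expected before the NBA introduced the 3-point line in 1979-80) are skipped.

```python
import csv
import glob
import re


def three_point_share(rows):
    """Fraction of field goal attempts that were 3-pointers; None if no usable rows."""
    fga = fg3a = 0.0
    for row in rows:
        if row.get("FGA") in (None, "") or row.get("3PA") in (None, ""):
            continue  # missing data, e.g. older seasons
        fga += float(row["FGA"])
        fg3a += float(row["3PA"])
    return fg3a / fga if fga else None


def share_by_season(pattern="season_*_detailed.csv"):
    """Map each season's XXXX label to its league-wide 3-point attempt share."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        season = re.search(r"season_(\d+)_detailed", path).group(1)
        with open(path, newline="") as f:
            result[season] = three_point_share(csv.DictReader(f))
    return result
```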
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset is based on box score and standing statistics from the NBA 2016-2017 season.
Calculations such as number of possessions, floor impact counter, strength of schedule, and simple rating system are performed.
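Two of these calculations can be sketched in a few lines. The possession estimate below is one common simple form (FGA - ORB + TOV + 0.44 × FTA); the dataset's own formula comes from the sources its author used and may differ. The simple rating system solves rating = margin of victory + average opponent rating (the second term is the strength of schedule) by damped fixed-point iteration, keeping the league average at 0. The input layout of `games` is a hypothetical choice for this example.

```python
from collections import defaultdict


def possessions(fga, orb, tov, fta):
    """Simple possession estimate: FGA - ORB + TOV + 0.44 * FTA."""
    return fga - orb + tov + 0.44 * fta


def simple_rating_system(games, iterations=1000, tol=1e-9):
    """games: hypothetical list of (team, opponent, points_for, points_against).

    Solves rating[t] = MOV[t] + mean(rating[o] for each opponent o of t).
    """
    margins, opponents = defaultdict(list), defaultdict(list)
    for team, opp, pf, pa in games:
        for a, b, m in ((team, opp, pf - pa), (opp, team, pa - pf)):
            margins[a].append(m)
            opponents[a].append(b)
    mov = {t: sum(ms) / len(ms) for t, ms in margins.items()}
    rating = dict(mov)
    for _ in range(iterations):
        target = {t: mov[t] + sum(rating[o] for o in opponents[t]) / len(opponents[t])
                  for t in mov}
        # Averaging old and new values (damping) prevents the oscillation a plain
        # fixed-point iteration shows on schedules such as a single A-vs-B series.
        new = {t: 0.5 * (rating[t] + target[t]) for t in mov}
        shift = sum(new.values()) / len(new)  # pin the league average at 0
        new = {t: r - shift for t, r in new.items()}
        converged = max(abs(new[t] - rating[t]) for t in mov) < tol
        rating = new
        if converged:
            break
    return rating
```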
Finally, extracts are created based on a perspective:
teamBoxScore.csv communicates game data from each team's perspective
officialBoxScore.csv communicates game data from each official's perspective
playerBoxScore.csv communicates game data from each player's perspective
standing.csv communicates standings data for each team every day during the season
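A "perspective" extract means each game appears once per participant. For the team perspective, one game becomes two rows with the roles of team and opponent swapped. A minimal sketch with hypothetical field names (`home`, `away`, `home_pts`, `away_pts`), not the actual teamBoxScore.csv columns:

```python
def team_perspective(game):
    """Split one game record into two rows, one from each team's point of view.

    Input and output field names are hypothetical.
    """
    home, away = game["home"], game["away"]
    hp, ap = game["home_pts"], game["away_pts"]
    return [
        {"date": game["date"], "team": home, "opponent": away, "location": "Home",
         "pts": hp, "opp_pts": ap, "result": "Win" if hp > ap else "Loss"},
        {"date": game["date"], "team": away, "opponent": home, "location": "Away",
         "pts": ap, "opp_pts": hp, "result": "Win" if ap > hp else "Loss"},
    ]
```

The same idea extends to the official and player extracts: one row per official or player per game, each carrying that game's shared data.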
Data Sources
Box score and standing statistics were obtained by a Java application using RESTful APIs provided by xmlstats.
Calculation Sources
Another Java application performs advanced calculations on the box score and standing data.
Formulas for these calculations were primarily obtained from these sources:
Favoritism
Does a referee impact the number of fouls made against a player or the pace of a game?
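A first pass at the favoritism question is to compare each official's average fouls per game with the league average. The sketch assumes one hypothetical row per (official, game) with fields `official` and `total_fouls`; a raw difference like this does not control for the teams or pace involved, so it is only a starting point.

```python
from collections import defaultdict
from statistics import mean


def fouls_vs_league_average(rows):
    """Each official's mean fouls per game minus the overall mean.

    `official` and `total_fouls` are hypothetical field names.
    """
    fouls = defaultdict(list)
    for row in rows:
        fouls[row["official"]].append(row["total_fouls"])
    league_avg = mean(f for fs in fouls.values() for f in fs)
    return {off: mean(fs) - league_avg for off, fs in fouls.items()}
```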
Forecasting
Can the aggregated points scored by and against a team along with their strength of schedule be used to determine their projected winning percentage for the season?
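One well-known way to turn points scored and allowed into a projected winning percentage is the Pythagorean expectation, PF^k / (PF^k + PA^k). This is a swapped-in technique, not something the dataset defines; exponents around 14 are commonly used for the NBA, and the default below is an assumption. Strength of schedule could be folded in afterwards, for example by adjusting margins with a rating system such as SRS.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Pythagorean expectation: PF^k / (PF^k + PA^k), k assumed to be about 14."""
    pf_k = points_for ** exponent
    return pf_k / (pf_k + points_against ** exponent)
```

Using per-game averages keeps the powers small; multiplying the result by 82 gives projected wins for a full season.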
Predicting the Past
For a given game, can games played earlier in the season help determine how a team will perform?
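The key constraint in "predicting the past" is to use only information available before each game. A minimal sketch of such a feature: the average point margin over a team's previous games, computed in date order so the current game never leaks into its own prediction. The `pts`/`opp_pts` field names and the window of 10 games are hypothetical choices.

```python
from collections import deque


def prior_game_averages(team_games, window=10):
    """For each game (in date order), the mean margin over the previous `window` games.

    The first game has no history and gets None.
    """
    history = deque(maxlen=window)
    features = []
    for game in team_games:
        features.append(sum(history) / len(history) if history else None)
        history.append(game["pts"] - game["opp_pts"])  # added only after use
    return features
```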
Lots of data elements and possibilities. Let your imagination roam!