100+ datasets found
  1. MLB players on opening day rosters 2013-2024

    • statista.com
    Updated Jun 24, 2024
    Cite
    Statista (2024). MLB players on opening day rosters 2013-2024 [Dataset]. https://www.statista.com/statistics/639334/major-league-baseball-players-on-opering-day-rosters/
    Dataset updated
    Jun 24, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    North America
    Description

    There were a total of 949 players on opening day rosters of Major League Baseball teams ahead of the 2024 season. Of these players, almost 28 percent were from countries and territories outside the United States, with the Dominican Republic being the most represented nation.

  2. Players in the MLB in 2023, by ethnicity

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Players in the MLB in 2023, by ethnicity [Dataset]. https://www.statista.com/statistics/1310428/racial-diversity-mlb-players/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    North America
    Description

    Major League Baseball (MLB) is a professional sports league in North America made up of 30 teams that compete in the American League and the National League. In 2023, just over ** percent of players within the league were Hispanic or Latino.

  3. Regular season home attendance of Major League Baseball teams 2024

    • statista.com
    Updated Aug 8, 2025
    Cite
    Christina Gough (2025). Regular season home attendance of Major League Baseball teams 2024 [Dataset]. https://www.statista.com/topics/968/major-league-baseball/
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Christina Gough
    Description

    This graph depicts the total regular season home attendance of all Major League Baseball teams in the 2024 season. The Los Angeles Dodgers took the top spot with a total season home attendance of 3.94 million people. Conversely, the team with the lowest total season home attendance was the Oakland Athletics, who registered an attendance of around 0.92 million fans in 2024.

  4. MLB Batting Data (2015-2024)

    • kaggle.com
    zip
    Updated Sep 29, 2025
    Cite
    Josue FernandezC (2025). MLB Batting Data (2015-2024) [Dataset]. https://www.kaggle.com/datasets/josuefernandezc/mlb-hitting-data-2015-2024
    Available download formats: zip (272240 bytes)
    Dataset updated
    Sep 29, 2025
    Authors
    Josue FernandezC
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MLB Batting Stats (2015–2024)

    📝Description

    This dataset contains scraped Major League Baseball (MLB) batting statistics from Baseball Reference for the seasons 2015 through 2024. It was collected using a custom Python scraping script and then cleaned and processed in SQL for use in analytics and machine learning workflows.

    The data provides a rich view of offensive player performance across a decade of MLB history. Each row represents a player’s season, with key batting metrics such as Batting Average (BA), On-Base Percentage (OBP), Slugging (SLG), OPS, RBI, and Games Played (G). This dataset is ideal for sports analytics, predictive modeling, and trend analysis.

    ⚙️Data Collection (Python)

    Data was scraped directly from Baseball Reference using a Python script that:

    • Sent HTTP requests with browser-like headers to avoid request blocking.
    • Parsed HTML tables with pandas.read_html().
    • Added a Year column for each season.
    • Cleaned player names by removing symbols (#, *).
    • Kept summary rows for players who appeared on multiple teams/leagues.
    • Converted numeric fields and filled missing values with zeros.
    • Exported both raw and cleaned CSVs for each year.
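
    The per-row cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the author's script: the function name clean_row and the NUMERIC_FIELDS tuple are hypothetical, and the network request and HTML parsing steps are left out.

```python
import re

# Hypothetical subset of numeric columns; the real scraped tables may differ.
NUMERIC_FIELDS = ("G", "AB", "H", "HR", "RBI", "BA", "OBP", "SLG", "OPS")


def clean_row(raw: dict, year: int) -> dict:
    """Return a cleaned copy of one scraped batting row."""
    row = dict(raw)
    row["Year"] = year  # add a Year column for the season
    # Remove the '#' and '*' symbols that follow some player names.
    row["Player"] = re.sub(r"[#*]", "", row.get("Player", "")).strip()
    for field in NUMERIC_FIELDS:
        try:
            # Missing or empty values become 0.0, mirroring "filled with zeros".
            row[field] = float(row.get(field) or 0)
        except ValueError:
            row[field] = 0.0
    return row
```

    Filling missing values with zeros keeps every column numeric, but it also makes a missing stat indistinguishable from a true zero, which is worth keeping in mind when averaging.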

    🧹Data Cleaning (SQL)

    After scraping, the raw batting tables were uploaded into BigQuery and further cleaned:

    • Null values removed – Rows missing key fields (Player, BA, OBP, SLG, OPS, Pos) were excluded.
    • Duplicate records handled – Identified duplicate player–year–league entries and kept only one instance.
    • Minimum playing threshold applied – Players with fewer than 100 at-bats were removed to focus on meaningful season-long contributions.

    The final cleaned table (cleaned_batting_stats) provides consistent, duplicate-free player summaries suitable for analytics.
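
    The original cleaning was done in SQL on BigQuery and the queries are not shown. As a rough Python equivalent of the three filters described above, assuming rows are dictionaries keyed by the documented column names (the function name clean_batting is hypothetical):

```python
KEY_FIELDS = ("Player", "BA", "OBP", "SLG", "OPS", "Pos")
MIN_AT_BATS = 100


def clean_batting(rows):
    """Apply the null, duplicate, and minimum at-bat filters in that order."""
    seen = set()
    cleaned = []
    for row in rows:
        # 1. Drop rows that are missing any key field.
        if any(row.get(f) in (None, "") for f in KEY_FIELDS):
            continue
        # 2. Keep only the first player-year-league entry.
        key = (row["Player"], row.get("Year"), row.get("Lg"))
        if key in seen:
            continue
        seen.add(key)
        # 3. Drop players with fewer than 100 at-bats.
        if row.get("AB", 0) < MIN_AT_BATS:
            continue
        cleaned.append(row)
    return cleaned
```

    Which duplicate the SQL version kept is not stated; this sketch simply keeps the first one it encounters.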

    📊Dataset Structure

    Columns include:

    • Player – Name of the player
    • Year – Season year
    • Age – Age during the season
    • Team – Team code (2TM for multiple teams)
    • Lg – League (AL, NL, or 2LG)
    • G – Games played
    • AB, H, 2B, 3B, HR, RBI – Core batting stats
    • BA, OBP, SLG, OPS – Rate statistics
    • Pos – Primary fielding position

    🚀Potential Uses

    • League Trends: Compare batting averages and OPS across seasons.
    • Top Performer Analysis: Identify the best hitters in different eras.
    • Predictive Modeling: Forecast future player stats using regression or ML.
    • Clustering: Group players into offensive archetypes.
    • Sports Dashboards: Build interactive Tableau/Plotly dashboards for fans and analysts.

    📌Acknowledgments

    Raw data sourced from Baseball Reference.

    Inspired by open baseball datasets and community-driven sports analytics.

  5. MLB interest level in the U.S. 2023, by ethnicity

    • statista.com
    Cite
    Statista, MLB interest level in the U.S. 2023, by ethnicity [Dataset]. https://www.statista.com/statistics/1100127/interest-level-baseball-ethnicity/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 17, 2023 - Mar 19, 2023
    Area covered
    United States
    Description

    Major League Baseball is one of the most popular professional sports leagues in North America. The survey measured the level of interest in MLB in the United States and found that 36 percent of Hispanic respondents were avid fans of the league.

  6. Major League Baseball's Most Cost-Effective

    • kaggle.com
    zip
    Updated Nov 25, 2022
    Cite
    The Devastator (2022). Major League Baseball's Most Cost-Effective [Dataset]. https://www.kaggle.com/datasets/thedevastator/major-league-baseball-s-most-cost-effective-play/suggestions
    Available download formats: zip (757938 bytes)
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    Description

    Major League Baseball's Most Cost-Effective Players of 2019

    Hitting, Pitching, and Overall Statistics

    By Andy Kriebel [source]

    About this dataset

    This dataset contains MLB hitting statistics for the 2013 season. The original source of the data is Lahman’s Baseball Database. The original visualization can be found here.

    This dataset is interesting because it allows us to see which players were the most cost effective in terms of salary and production. For example, we can see that Miguel Cabrera was the highest paid player in 2013, but he was also one of the most productive hitters in terms of runs batted in (RBIs). On the other hand, we can see that players like Mike Trout and Clayton Kershaw were among the league leaders in production but they were not among the highest paid players.

    There are a number of ways to measure a player's cost-effectiveness, but one simple method is to compare their salary to their production (measured by runs created, or RC). Players who create a lot of runs while being paid relatively little are more cost-effective than players who are paid more but produce less. By this metric, some of the most cost-effective players in 2013 were Delmon Young, Wilson Ramos, and Shane Victorino.
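
    As a rough illustration of the salary-versus-production comparison, the sketch below uses the basic runs created formula, RC = (H + BB) × TB / (AB + BB). The description does not say which RC variant the dataset uses, so that choice is an assumption, as are the function names and the sample numbers in the usage note.

```python
def total_bases(h, doubles, triples, hr):
    """Total bases from hits: singles count 1, doubles 2, triples 3, home runs 4."""
    return h + doubles + 2 * triples + 3 * hr


def runs_created(ab, h, bb, doubles, triples, hr):
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    denom = ab + bb
    if denom == 0:
        return 0.0
    return (h + bb) * total_bases(h, doubles, triples, hr) / denom


def salary_per_run(salary, rc):
    """Dollars paid per run created; lower means more cost-effective."""
    return salary / rc if rc > 0 else float("inf")
```

    For a made-up player with 500 at-bats, 150 hits, 50 walks, 30 doubles, 5 triples, and 20 home runs, runs_created gives about 90.9, so a 1,000,000 dollar salary works out to about 11,000 dollars per run created.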

    How to use the dataset

    https://www.kaggle.com/andrewmvd/most-cost-effective-players-of-2019

    This dataset consists of Major League Baseball's most cost effective players of 2019, as measured by WAR per dollar of salary (wWAR/$). WAR is a metric that attempts to measure a player's overall contributions to their team, and includes both offense and defense. You can read more about it here. The dataset includes each player's name, position, team, salary, and wWAR/$.

    To use this dataset, you may want to consider the following questions:

    • Who are the most cost effective players in baseball?
    • What positions do these players tend to play?
    • Which teams have the most cost effective players?

    Research Ideas

    • finding the most cost-effective baseball players
    • comparing different salary structures among teams
    • improving player performance through analytics

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: Dataset copyright by authors

    You are free to:

    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:

    • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    • ShareAlike - You must distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: MLB Stats.csv

    | Column name | Description                                               |
    |:------------|:----------------------------------------------------------|
    | Player Name | The player's name. (String)                               |
    | weight      | The player's weight in pounds. (Numeric)                  |
    | height      | The player's height in inches. (Numeric)                  |
    | bats        | The player's batting handedness. (String)                 |
    | throws      | The player's throwing handedness. (String)                |
    | Season      | The season in which the statistics were accrued. (String) |
    | League      | The league in which the player played. (String)           |
    | Team        | The team for which the player played. (String)            |
    | Franchise   | The franchise to which the team belongs. (String)         |
    | G           | The number of games the player played. (Numeric)          |
    | AB          | The number of at-bats the player had. (Numeric)           |
    | R           | The number of runs the player scored. (Numeric)           |
    | H           | The number of hits the player had. (Numeric)              |
    | 2B          | The number of doubles the player hit. (Numeric)           |
    ...

  7. Grant Giving Statistics for Major League Baseball Youth Foundation

    • instrumentl.com
    Updated Mar 1, 2021
    Cite
    (2021). Grant Giving Statistics for Major League Baseball Youth Foundation [Dataset]. https://www.instrumentl.com/990-report/major-league-baseball-youth-foundation
    Dataset updated
    Mar 1, 2021
    Variables measured
    Total Assets, Total Giving, Average Grant Amount
    Description

    Financial overview and grant giving statistics of Major League Baseball Youth Foundation

  8. Global Baseball League Market Research Report: By League Type (Professional,...

    • wiseguyreports.com
    Updated Aug 19, 2025
    Cite
    (2025). Global Baseball League Market Research Report: By League Type (Professional, Amateur, Semipro), By Player Demographics (Age, Gender, Skill Level), By Fan Engagement (In-Person Attendance, Broadcast Viewership, Online Streaming), By Revenue Stream (Ticket Sales, Merchandising, Sponsorship) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/baseball-league-market
    Dataset updated
    Aug 19, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 13.1 (USD Billion)
    MARKET SIZE 2025: 13.5 (USD Billion)
    MARKET SIZE 2035: 18.4 (USD Billion)
    SEGMENTS COVERED: League Type, Player Demographics, Fan Engagement, Revenue Stream, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increased fan engagement, rising sponsorship deals, expanding youth participation, technological advancements, global broadcasting rights
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Baseball Factory, Adidas, Nike, Mizuno, Marucci Sports, Under Armour, Major League Baseball, ProMounds, Axe Bat, Easton Sports, Franklin Sports, Wilson Sporting Goods, Dudley Sports, Louisville Slugger, Rawlings Sporting Goods
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Youth baseball programs expansion, Digital streaming partnerships, Global fan engagement initiatives, Enhanced sports merchandise sales, Advanced analytics integration.
    COMPOUND ANNUAL GROWTH RATE (CAGR): 3.2% (2025 - 2035)

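    The growth rate can be sanity-checked against the listed market sizes with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The sketch below assumes the 2025 and 2035 figures are the endpoints of the ten-year forecast period; the function name cagr is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.03 means 3 percent)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billion, taken from the listing above.
rate = cagr(13.5, 18.4, 2035 - 2025)
print(f"{rate:.2%}")
```

    With these rounded inputs the result is roughly 3.1 percent, close to but not exactly the listed 3.2%; the gap is plausibly due to rounding in the published market sizes.
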
  9. Average total home attendance per team in Major League Baseball 2024

    • statista.com
    Updated Aug 8, 2025
    Cite
    Christina Gough (2025). Average total home attendance per team in Major League Baseball 2024 [Dataset]. https://www.statista.com/topics/968/major-league-baseball/
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Christina Gough
    Description

    The average total regular season home attendance per team in Major League Baseball remained relatively stable from 2005 to 2024 with the notable exception of a considerable reduction in 2021 as a result of the coronavirus (COVID-19) containment measures. In 2024, the average total home attendance was 2.38 million.

  10. ⚾ Major League Baseball Hitting ⚾

    • kaggle.com
    zip
    Updated Oct 14, 2023
    Cite
    Shane Simon (2023). ⚾ Major League Baseball Hitting ⚾ [Dataset]. https://www.kaggle.com/datasets/m000sey/major-league-baseball-hitting-data
    Available download formats: zip (99765 bytes)
    Dataset updated
    Oct 14, 2023
    Authors
    Shane Simon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Who doesn't love over-analyzing baseball data? Which hitters performed the best? What's the distribution of 'OBP' this year? What hitters over-performed relative to their StatCast underlying data? Let's dig in.

    This year, the MLB instituted a bunch of interesting baseball rules, including bigger bases, a pitch clock, and limited shifts, among others. This undoubtedly changed the offensive environment...

    I got the raw data from www.fangraphs.com and then cleaned it up for everyone to analyze. Happy EDA and let me know if you find any cool trends. Note: I only listed qualified hitters with at least 100 plate appearances.

    I want to add position and sidedness to the datasheet for each hitter. Stay tuned

    Feature descriptions:

    • Name = hitter's name
    • Team = hitter's team (or last team they were on)
    • G = games played
    • AB = # of at bats
    • PA = plate appearances
    • H = hits
    • 1B = singles
    • 2B = doubles
    • 3B = triples
    • HR = home runs
    • R = runs scored
    • RBI = runs batted in
    • BB = bases on balls
    • IBB = intentional bases on balls
    • SO = strike outs
    • HBP = hit by pitch
    • SF = sacrifice fly
    • SH = sacrifice hit
    • GDP = ground into a double play
    • SB = stolen base
    • CS = caught stealing
    • AVG = batting average
    • BB% = BB / PA
    • K% = SO / PA
    • BB/K = BB / SO
    • OBP = on base percentage
    • SLG = slugging percentage
    • OPS = OBP + SLG
    • ISO = SLG - AVG
    • Spd = running speed score
    • BABIP = AVG on balls in play
    • UBR = ultimate base running in runs above average
    • wGDP = weighted ground into double play runs above average
    • wSB = SB and CS runs above average
    • wRC = weighted runs created based on wOBA
    • wRAA = weighted runs above average based on wOBA
    • wOBA = weighted on-base average
    • wRC+ = wRC plus, whereby additional factors are taken into consideration like ball park or era
    • GB/FB = ground ball to fly ball ratio
    • LD% = line drive % (LB / balls in play)
    • GB% = ground ball % (GB / balls in play)
    • Flyball% = flyball %, also commonly known as FB% (Flyball / balls in play)
    • IFFB% = infield flyball % (infield flyball / flyballs)
    • HR/FB = home run / Flyball
    • IFH = infield hits
    • IFH% = IFH / GB
    • BUH = bunt hits
    • BUH% = BUH / bunts
    • Pull% = % of balls that were pulled by hitter
    • Oppo% = % of balls that were pushed by hitter
    • Cent% = % of balls that were hit to CF by hitter
    • Soft% = % of balls hit in play that were classified as hit with soft speed
    • Med% = % of balls hit in play that were classified as hit with medium speed
    • Hard% = % of balls hit in play that were classified as hit with hard speed
    • Batted ball = PA - SO - BB - HBP
    • EV = average exit velocity of Batted ball
    • maxEV = maximum exit velocity of Batted ball
    • LA = Launch angle
    • Barrels = a batted ball with an exit velocity of at least 98 mph and LA between 26-30 degrees. For every mph of EV over 98, the LA range gets higher by 1 degree
    • Barrel% = % of Batted balls that are classified as barrels
    • HardHit = # of Batted balls with an EV of 95 or higher
    • HardHit% = % of Batted balls with an EV of 95 or higher
    • xBA = expected batting average
    • xSLG = expected slugging percentage
    • xwOBA = expected weighted on base average
    • Clutch = (Win Probability Added / a hitter's Leverage index for all game events) - (Win Probability Added / Leverage index), which essentially measures how much better a player does in a high leverage situation compared to a neutral situation.
    • O-Swing% = % of pitches a batter swings at outside of the strike zone
    • Z-Swing% = % of pitches a batter swings at inside of the strike zone
    • Swing% = % of total pitches a batter swings at
    • O-Contact% = % of times a batter makes contact with the ball when swinging at pitches thrown outside of the zone
    • Z-Contact% = % of times a batter makes contact with the ball when swinging at pitches thrown inside of the zone
    • Contact% = total percentage of contact made when swinging at all pitches
    • Zone% = % of pitches seen inside the strike zone
    • F-Strike% = First pitch strike percentage
    • SwStr% = Swinging strike %
    • CStr% = Called strike %
    • CSW% = SwStr% + CStr%
    • wFB = How well does the batter do vs fastballs? Using pitch types linear weights
    • wSL = How well does the batter do vs sliders? Using pitch types linear weights
    • wCT = How well does the batter do vs cutters? Using pitch types linear weights
    • wCB = How well does the batter do vs curves? Using pitch types linear weights
    • wCH = How well does the batter do vs change-ups? Using pitch types linear weights
    • wSB = How well does the batter do vs splitters? Using pitch types linear weights
    • wFB/C = How well does the batter do vs fastballs per 100 pitches?
    • wSL = How well does the batter do vs sliders per 100 pitches?
    • wCT = How ...
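
    Several of the features described above are plain ratios or sums of the counting stats. A small sketch of how they relate, using only the definitions given in the feature list (the function name derived_rates and the sample inputs in the usage note are illustrative, not part of the dataset):

```python
def derived_rates(pa, ab, h, bb, so, hbp, obp, slg):
    """Recompute a few derived hitting stats from their stated definitions."""
    avg = h / ab if ab else 0.0
    return {
        "AVG": avg,
        "BB%": bb / pa if pa else 0.0,              # BB / PA
        "K%": so / pa if pa else 0.0,               # SO / PA
        "BB/K": bb / so if so else float("inf"),    # BB / SO
        "OPS": obp + slg,                           # OBP + SLG
        "ISO": slg - avg,                           # SLG - AVG
        "Batted ball": pa - so - bb - hbp,          # PA - SO - BB - HBP
    }
```

    For example, a hitter with 600 PA, 520 AB, 156 H, 60 BB, 120 SO, 5 HBP, a .370 OBP, and a .500 SLG has a BB% of 0.10, a K% of 0.20, and 415 batted balls under these definitions. Recomputing the ratios this way is a cheap check that a downloaded file matches its documentation.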

  11. 🏟️ Negro League Database

    • kaggle.com
    zip
    Updated Oct 8, 2024
    Cite
    mexwell (2024). 🏟️ Negro League Database [Dataset]. https://www.kaggle.com/datasets/mexwell/negro-league-database
    Available download formats: zip (16198067 bytes)
    Dataset updated
    Oct 8, 2024
    Authors
    mexwell
    Description

    About

    The Negro leagues were United States professional baseball leagues comprising teams of African Americans. The term may be used broadly to include professional black teams outside the leagues and it may be used narrowly for the seven relatively successful leagues beginning in 1920 that are sometimes termed "Negro Major Leagues".

    To date, Retrosheet has compiled data on 6,116 Negro League games which were played in 337 different ballparks in 259 cities across 33 states, the District of Columbia, and two foreign countries (Mexico and Canada). We have compiled at least some statistics for 2,759 players who participated in one or more of these games. These games include not only regular-season Negro League games, but also all-star games, playoff games, and exhibition games between major-league caliber teams. This latter set includes several hundred games played between White and Black major-league baseball players (so those 2,759 players include players such as Dizzy Dean, Bob Feller, Lefty Grove, Babe Ruth, and Ted Williams, among others).

    The centerpiece of the Negro League data is a set of .csv files which summarize game-level data for all (5,255) Negro League games for which Retrosheet has compiled data. There are five such .csv files.

    • gameinfo.csv - contains game-level information such as teams, attendance, umpires, etc.
    • teamstats.csv - contains team-level statistics - line scores, lineups, and team statistics (batting, pitching, fielding)
    • batting.csv - batting statistics by player by game
    • pitching.csv - pitching statistics by player by game
    • fielding.csv - fielding statistics by player by position by game

    The columns are labeled and should be mostly self-explanatory. But, in case not, the columns are defined in the document context.txt which is included in the zip file.

    The level of detail at which Negro League data can be determined is highly variable across games and the data "known" is highly uncertain in many cases. For example, for many games, we have no box score but may have a reference to the fact that a particular player had at least one hit in the game. To attempt to convey this uncertainty in our data, teams and players may be given up to three sets of statistical lines for each game within the data files which are available for download. These are identified within the .csv files by the variable 'stattype'.

    • stattype 'value' is Retrosheet's best estimate of the relevant statistical total
    • stattype 'lower' is the lower bound on a player's total
    • stattype 'upper' is the upper bound on a player's total

    All teams and players will have lines with stattype 'value' regardless of how little information may be known. Data for which Retrosheet has no information will be blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added. As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bound for the pitcher's innings pitched would be 4 and 4.2, respectively, and the lower and upper bound for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
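
    One way to work with the three stattype lines is to group them per game and player so that each statistic carries its estimate and its bounds together. The sketch below is an assumption-heavy illustration: 'stattype' comes from the description above, but the column names 'gid' and 'id' are hypothetical (the real names are defined in context.txt), and whether innings are stored in the "4.2" notation or as outs is not stated.

```python
from collections import defaultdict


def innings_to_outs(ip: str) -> int:
    """Convert innings notation ('4.2' means 4 innings plus 2 outs) to outs."""
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + int(frac or 0)


def bounds_by_player(rows, field):
    """Collect the value/lower/upper entries of one field per (game, player)."""
    grouped = defaultdict(dict)
    for row in rows:
        if row.get(field) in (None, ""):
            continue  # blank means no information is available
        grouped[(row["gid"], row["id"])][row["stattype"]] = row[field]
    return dict(grouped)
```

    Applied to the pitcher in the example above, the innings bounds of 4 and 4.2 correspond to 12 and 14 outs, which is easier to compare and sum than the decimal-looking notation.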

    In addition to these five files which aggregate all Negro League games, we also have compiled separate logs by team (subsets of teamstats.csv divided by team-season), by ballpark (subsets of gameinfo.csv) and by player (subsets of batting.csv, pitching.csv, and fielding.csv). For ballparks and players, these aggregate across all seasons.

    In addition to these .csvs, Retrosheet has also compiled event files (.evx files) and box-score files (.ebx files) for games for which sufficient data is available. Games are compiled into a single file for each season for which we have compiled games of the relevant type. In the former case, event files are included both for games for which we have found play-by-play accounts as well as games which have been deduced. The latter are identified within the files via a comment at the start of the play-by-play portion of the file.

    Finally, the zip file here includes roster files for all teams for whom Retrosheet has compiled rosters as well as our master files for people (biofile.csv), ballparks (ballparks.csv), and teams (teams.csv). These files include data for all people, teams, and sites across all Retrosheet games, not just Negro League games.

    Read more about the dataset here.

    Notice

    The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties ma...

  12. Data from: Simplicity Versus WAR: Examining Salary Determinations in Major...

    • dataverse.harvard.edu
    Updated Sep 10, 2019
    Cite
    Joshua Studnitzer (2019). Simplicity Versus WAR: Examining Salary Determinations in Major League Baseball's Arbitration and Free Agent Markets [Dataset]. http://doi.org/10.7910/DVN/28782
    Dataset updated
    Sep 10, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Joshua Studnitzer
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/28782

    Description

    This paper examines salaries given to arbitration eligible players in Major League Baseball from 2008-2013 and compares them to free agent contracts from the same period. Anecdotal evidence suggests that simpler statistics are more successful in Major League Baseball's final offer arbitration setting as legal experts tasked with handling the league's cases may not have a deep knowledge of player valuation. I examine the effects of wins above replacement, a complex but comprehensive metric, and traditional statistics, such as runs batted in, on salaries decided in both settings. Wins above replacement is significant in each case, but with a much higher coefficient in free agency suggesting a greater impact. There is no evidence of individual traditional statistics being especially significant in arbitration; I attribute this to parties framing their offers with whichever statistics portray them in the most favorable light. Finally, I look to statistics in the season following contracts to determine if either market is more effective in getting value at a low cost, but results are similar in each case and limitations with the data restrict the efficacy of conclusions in the section.

  13. Basic statistics for the MLB and NBA Twitter networks using mathematica.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Emily J. Evans; Rebecca Jones; Joseph Leung; Benjamin Z. Webb (2023). Basic statistics for the MLB and NBA Twitter networks using mathematica. [Dataset]. http://doi.org/10.1371/journal.pone.0268619.t013
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Emily J. Evans; Rebecca Jones; Joseph Leung; Benjamin Z. Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Basic statistics for the MLB and NBA Twitter networks using mathematica.

  14. Grant Giving Statistics for Major League Baseball Equipment Managers...

    • instrumentl.com
    Updated Feb 13, 2022
    Cite
    (2022). Grant Giving Statistics for Major League Baseball Equipment Managers Association [Dataset]. https://www.instrumentl.com/990-report/major-league-baseball-clubhouse-managers-association
    Dataset updated
    Feb 13, 2022
    Variables measured
    Total Assets, Total Giving, Average Grant Amount
    Description

    Financial overview and grant giving statistics of Major League Baseball Equipment Managers Association

  15. The Most Cost-Effective MLB Hitters

    • kaggle.com
    zip
    Updated Dec 4, 2022
    Cite
    The Devastator (2022). The Most Cost-Effective MLB Hitters [Dataset]. https://www.kaggle.com/thedevastator/uncovering-the-most-cost-effective-mlb-hitters-o
    Available download formats: zip (757938 bytes)
    Dataset updated
    Dec 4, 2022
    Authors
    The Devastator
    Description

    The Most Cost-Effective MLB Hitters

    Analyzing Performance and Salary Impact

    By Andy Kriebel [source]

    About this dataset

    This 2013 Major League Baseball hitting statistics dataset compiles the data from Lahman’s Baseball Database and includes salary, team and a variety of other stats for each player. The data covers all levels from amateur to professional, and provides a wealth of information about the past year's performance in baseball. With this dataset, you can analyze batting averages for home runs, RBIs, stolen bases and more—as well as average salaries across players. It is an invaluable resource for anyone looking to get insight into the very best in baseball performance over the last year. Whether you're an avid fantasy league enthusiast or just curious about major league stats this statistic set is sure to help you see who was making waves on or off the field in 2013!

    How to use the dataset

    This Kaggle dataset consists of all the 2013 Major League Baseball (MLB) hitting statistics for each player, including their salary, team, and other stats. The main objective of this dataset is to uncover the most cost-effective MLB hitters of 2013 by analyzing their stats in relation to how much they are paid. This data can be used by baseball fans looking to gain insights into the performance and salaries of MLB players in 2013, as well as by fantasy baseball owners trying to identify value-for-money players for their teams.

    In order to make use of this dataset, you will need some knowledge on commonly used baseball stats like runs batted in (RBI), runs scored (R), batting average (AVG), on base percentage (OBP) etc. These stats provide information on players' offensive contributions to the game while fielding and pitching statistics will not be included in this specific dataset. You can then analyse these individual player statistics in comparison with each other or against league averages or trends across various franchises and different leagues such as American League or National League teams over a range of seasons such as 2009 - 2019 season.

    Some interesting analyses you could draw from this data include: correlations between higher salaries and the number of home runs hit per season; whether ‘big-hitting’ superstars really are paid more than consistent players who play important roles but do not hit many home runs; which franchises have more cost-effective hitters, set against their type or style of play; and whether performance and salary differ by handedness, i.e. left- versus right-handed batters. There is plenty of potential in this dataset.

    Research Ideas

    • Creating an interactive visualization allowing users to see the top 10 most cost-effective MLB batters of 2013 based on a number of criteria such as salary, batting stats, or games played.
    • Comparing how teams’ payrolls shifted after particular seasons and seeing how budget changes affected different player groups (e.g., high-salary vs low-salary players).
    • Using this data to develop a predictive model that estimates future salaries for current MLB players by analyzing the historical performance of similar players in relation to their salaries.
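
    The cost-effectiveness idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`player`, `salary`, `HR`, `RBI`) and every value below are hypothetical and are not taken from MLB Stats.csv, whose full column list is not shown here. The sketch ranks hitters by salary paid per unit of a chosen stat, which is one possible definition of "cost-effective" among many.

```python
# Hypothetical rows: names, salaries, and stat lines are made up for illustration
# and do not come from the dataset.
players = [
    {"player": "Hitter A", "salary": 500_000, "HR": 20, "RBI": 70},
    {"player": "Hitter B", "salary": 15_000_000, "HR": 30, "RBI": 100},
    {"player": "Hitter C", "salary": 2_000_000, "HR": 10, "RBI": 40},
]


def cost_per_stat(row, stat):
    """Salary paid per unit of `stat`; None when the player recorded zero."""
    return row["salary"] / row[stat] if row[stat] else None


def most_cost_effective(rows, stat, top_n=10):
    """Return (player, cost) pairs sorted from cheapest to most expensive per unit."""
    rated = [(r["player"], cost_per_stat(r, stat)) for r in rows]
    rated = [(name, cost) for name, cost in rated if cost is not None]
    return sorted(rated, key=lambda pair: pair[1])[:top_n]


ranking = most_cost_effective(players, "HR")
for name, cost in ranking:
    print(f"{name}: {cost:,.0f} per home run")
```

    Swapping `"HR"` for another counting stat changes the criterion; a real analysis would probably also filter on games played so that small samples do not dominate the ranking.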

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: Dataset copyright by authors

    You are free to:
    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    • ShareAlike - You must distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: MLB Stats.csv

    | Column name | Description |
    |:------------|:------------|
    | Player Name | Name of the player. (String) |
    | weight | Weight of the player in pounds. (Integer) |
    | height ...

  16. Major League Baseball Game Logs

    • kaggle.com
    zip
    Updated Dec 20, 2023
    The Devastator (2023). Major League Baseball Game Logs [Dataset]. https://www.kaggle.com/thedevastator/major-league-baseball-game-logs
    Explore at:
    zip (24569430 bytes)
    Dataset updated
    Dec 20, 2023
    Authors
    The Devastator
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Major League Baseball Game Logs

    Historical MLB Game Logs and Player Statistics from 1871-2016

    By Dataquest [source]

    About this dataset

    This comprehensive dataset provides a historical record of Major League Baseball (MLB) games dating back to its inception. It offers an in-depth look into the game's significant aspects, encompassing detailed statistics, player performance information, and game outcomes across multiple seasons.

    The MLB Game Logs dataset is a rich repository of structured records. Sourced from Retrosheet, it was originally published as 127 separate CSV files, which have been merged into a single consolidated file to make analysis easier.

    The records range from fundamental game statistics, such as the date and venue of each game and team names and IDs, to finer attributes such as the day or night game flag and completion information. More granular fields, like game length measured in outs and attendance figures, add further detail.

    The player performance data is equally exhaustive, covering hits, home runs, stolen bases, sacrifices, extra-base hits, and runs batted in (RBIs), as well as winning, losing, and saving pitchers, all listed alongside their respective player IDs for easy cross-referencing.

    In addition to the raw data, this dataset uses detailed column names based on Retrosheet's field explanations, so that each field is clear and the risk of misinterpretation is kept low.

    Although explanations of the columns are included in the data dictionary that accompanies the files, we recommend referring directly to the Retrosheet field explanation for complete details on specific fields if required.

    In keeping with Retrosheet's copyright terms, we state:

    The information used here was obtained free of charge from and is copyrighted by Retrosheet.
    Interested parties may contact Retrosheet at www.retrosheet.org.
    
    

    We hope that enthusiasts, researchers, statisticians and other users find value in this rich resource of baseball history.

    How to use the dataset

    This dataset is a treasure trove of history and statistics, offering detailed player and game information for MLB games from 1871 to 2016. The following guide is meant for beginners and others who are not familiar with the dataset format.

    • Understand columns: Familiarize yourself with the numerous columns in this data set. Each column offers distinct information about each game, such as player performance, location of the match, number of spectators and more. It's okay if you do not understand everything right away.

    • Read documentation: You'll find a 'Retrosheet field explanation' link in the description provided above which explains each column in detail. Do make sure to go through it to get a better understanding.

    • Define your objective: Are you looking to predict future game outcomes? Or trying to find patterns between attendance and team performance? A defined objective helps you focus on the relevant columns and greatly reduces needless exploration.

    • Clean the data: A few data points might have missing values or illegible entries; identifying and handling them first helps keep later analysis accurate.

    • Perform initial EDA (Exploratory Data Analysis): EDA involves inspecting, cleaning, transforming, and visualizing raw datasets to understand their underlying structure, which can inform the selection or creation of statistical models later on:

      • Histograms: Could provide frequency distributions for numeric variables.

      • Box plots: A good way of quickly visualizing where most data points lie.

      • Pivot tables: Aggregating specific groups can give you comprehensive insights into large sets.

    • Statistical Analysis & Machine Learning Models: With clear objectives and a prepared dataset at hand, try various machine learning models for prediction, such as a Logistic Regression model for binary outcome prediction (win/lose), a Multiple Linear Regression model when the outcome variable is numerical (score), decision trees for data segmentation, and so on.

    • Visualize: It always helps to build charts, graphs or tables that present the final insights to others.
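
    The cleaning and EDA steps above might look roughly like the following sketch, which uses only the Python standard library. The field names (`day_night`, `attendance`, `home_score`, `visitor_score`) and all values are assumptions made for illustration; the real column names should be taken from the dataset's data dictionary or the Retrosheet field explanation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical game-log rows; field names and values are assumptions,
# not the dataset's real schema.
games = [
    {"year": 2015, "day_night": "D", "attendance": 30000, "home_score": 5, "visitor_score": 3},
    {"year": 2015, "day_night": "N", "attendance": 42000, "home_score": 2, "visitor_score": 4},
    {"year": 2016, "day_night": "N", "attendance": None, "home_score": 7, "visitor_score": 1},
    {"year": 2016, "day_night": "D", "attendance": 28000, "home_score": 0, "visitor_score": 6},
]

# Cleaning: set aside rows with a missing attendance value.
clean = [g for g in games if g["attendance"] is not None]

# Pivot-style aggregation: mean attendance per day/night flag.
by_flag = defaultdict(list)
for g in clean:
    by_flag[g["day_night"]].append(g["attendance"])
mean_attendance = {flag: mean(values) for flag, values in by_flag.items()}

# Histogram: count games per 10,000-wide attendance bin.
histogram = defaultdict(int)
for g in clean:
    histogram[(g["attendance"] // 10_000) * 10_000] += 1

# Binary target for a win/lose model: 1 when the home team won.
home_win = [int(g["home_score"] > g["visitor_score"]) for g in games]

print(mean_attendance, dict(histogram), home_win)
```

    The `home_win` list is the kind of binary target a logistic regression would be fitted against; the model itself is left out here because it depends on which library and features you choose.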

    This datas...

  17. 🏆 Baseball Hall of Fame Position Players

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Kris Bruurs (2025). 🏆 Baseball Hall of Fame Position Players [Dataset]. https://www.kaggle.com/datasets/krisbruurs/baseball-hall-of-fame-position-players
    Explore at:
    zip (40297 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Kris Bruurs
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description:

    This dataset features 212 Major League Baseball (MLB) Hall of Fame position players, excluding pitchers. It includes key career statistics such as batting average (BA), home runs (HR), RBI, stolen bases (SB), WAR, OPS, and OPS+, as well as career timeline information (debut year, last year, and total active years). This allows for historical comparisons of player performance across different baseball eras.

    Columns:

    • Name – Player's full name
    • Debut_Year – The year the player made their MLB debut
    • Last_Year – The final year the player played in MLB
    • Active_Years – Total number of years the player was active (Last_Year - Debut_Year + 1)
    • Positions – Primary positions played
    • Bats/Throws – Batting and throwing hand
    • Height/Weight – Physical attributes
    • WAR (Wins Above Replacement) – Overall player value
    • AB (At-Bats), H (Hits), HR (Home Runs), RBI (Runs Batted In), R (Runs) – Traditional batting stats
    • BA (Batting Average), OBP (On-Base Percentage), SLG (Slugging Percentage), OPS (On-base + Slugging), OPS+ – Performance metrics
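
    Two of the derived columns above follow directly from their definitions: Active_Years is Last_Year - Debut_Year + 1, and OPS is the sum of OBP and SLG. A minimal sketch, with hypothetical function names and made-up input values:

```python
def active_years(debut_year: int, last_year: int) -> int:
    """Active_Years as defined in the column list: Last_Year - Debut_Year + 1."""
    if last_year < debut_year:
        raise ValueError("last_year must not be earlier than debut_year")
    return last_year - debut_year + 1


def ops(obp: float, slg: float) -> float:
    """OPS is on-base percentage plus slugging percentage."""
    return obp + slg


# Made-up career: debut in 1990, final season in 2005.
print(active_years(1990, 2005))
print(round(ops(0.400, 0.500), 3))
```

    The + 1 counts both the debut season and the final season, so a player whose debut and last year are the same has one active year.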

    Use Cases:

    • Era Comparisons – Analyze how Hall of Fame players’ stats have changed over time.
    • Career Longevity Analysis – Study how long Hall of Famers played and whether longer careers correlated with higher performance.
    • Position-Based Trends – Compare performance across different positions (e.g., outfielders vs. shortstops).
    • WAR & OPS+ Analysis – Examine how advanced metrics evolved over different baseball generations.

    This dataset is perfect for baseball analysts, sports data scientists, and fans looking to explore Hall of Fame career statistics over time.

  18. Twitter data for "Remapping and visualizing baseball labor"

    • iro.uiowa.edu
    zip
    Updated Dec 13, 2017
    Katherine Walden (2017). Twitter data for "Remapping and visualizing baseball labor" [Dataset]. https://iro.uiowa.edu/esploro/outputs/dataset/Twitter-data-for-Remapping-and-visualizing/9983736668802771
    Explore at:
    zip (470983 bytes)
    Dataset updated
    Dec 13, 2017
    Dataset provided by
    University of Iowa
    Authors
    Katherine Walden
    License

    Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Time period covered
    2019
    Description

    Recent baseball scholarship has drawn attention to U.S. professional baseball’s complex twentieth century labor dynamics and expanding global presence. From debates around desegregation to discussions about the sport’s increasingly multicultural identity and global presence, the cultural politics of U.S. professional baseball is connected to the problem of baseball labor. However, most scholars address these topics by focusing on Major League Baseball (MLB), ignoring other teams and leagues—Minor League Baseball (MiLB)—that develop players for Major League teams. Considering Minor League Baseball is critical to understanding the professional game in the United States, since players who populate Major League rosters constitute a fraction of U.S. professional baseball’s entire labor force. As a digital humanities dissertation on baseball labor and globalization, this project uses digital humanities approaches and tools to analyze and visualize a quantitative data set, exploring how Minor League Baseball relates to and complicates MLB-dominated narratives around globalization and diversity in U.S. professional baseball labor. This project addresses how MiLB demographics and global dimensions shifted over time, as well as how the timeline and movement of foreign-born players through the Minor Leagues differs from their U.S.-born counterparts. This project emphasizes the centrality and necessity of including MiLB data in studies of baseball’s labor and ideological significance or cultural meaning, making that argument by drawing on data analysis, visualization, and mapping to address how MiLB labor complicates or supplements existing understandings of the relationship between U.S. professional baseball’s global reach and “national pastime” claims.

  19. Average age of players in Major League Baseball by club 2023

    • statista.com
    Updated Sep 12, 2023
    Statista (2023). Average age of players in Major League Baseball by club 2023 [Dataset]. https://www.statista.com/statistics/236223/major-league-baseball-clubs-by-average-age-of-players/
    Explore at:
    Dataset updated
    Sep 12, 2023
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, the average age of players on each MLB team was between about 26 and 30 years. This is considered to be the prime of a player's career, as players are typically at their peak physical and athletic ability at this age.

    Who is the oldest player in the MLB?

    In 2023, the average age of the players on the New York Yankees' roster was 28.3 years. Out of all the teams in MLB, the Los Angeles Dodgers had the highest average player age. In the same year, the Toronto Blue Jays' average player age was 29.6 years.

    What is Major League Baseball?

    Major League Baseball (MLB) is the highest level of professional baseball in the United States and Canada. It comprises 30 teams, 29 of which are located in the United States and one in Canada. The teams are divided into two leagues, the American League (AL) and the National League (NL), and each league is further divided into three divisions: East, Central, and West. The teams play a 162-game regular season schedule, with the goal of earning a spot in the postseason, which includes the AL and NL Championship Series and the World Series. The team that wins the World Series is declared the MLB champion.

    Fans watch at home and live in the stadiums

    There are many ways to enjoy MLB games, whether you are a die-hard fan, a casual viewer, or a player yourself. You can watch games on TV, or stream them live online. In 2022, the average TV viewership of MLB World Series games stood at 11.8 million. Additionally, many teams have their own websites, social media accounts, and mobile apps that allow fans to stay up-to-date with the latest news, scores, and player stats. It is also possible to purchase tickets to games and watch the action live at the stadium. In 2022, the average attendance at the games in the MLB was 26,808.

  20. 🧢 MLB Home Plate Umpires

    • kaggle.com
    zip
    Updated Oct 21, 2024
    mexwell (2024). 🧢 MLB Home Plate Umpires [Dataset]. https://www.kaggle.com/datasets/mexwell/mlb-home-plate-umpires
    Explore at:
    zip (8363 bytes)
    Dataset updated
    Oct 21, 2024
    Authors
    mexwell
    Description

    Motivation

    Baseball is a popular American sport played on a diamond-shaped field. Games are 9 innings long and each inning has two halves: the visiting team bats in the first and the home team bats in the second. Each half-inning ends after three outs. An out occurs when a player from the hitting team is removed from play for that half of the inning, which can happen for various reasons. Batters aim to get on base by hitting a ball pitched to them by the pitcher. Batters can get to first, second, or third base depending on how far they hit the ball and how fast they run. If a batter hits the ball past the outfield fences, they, along with any runners on base, automatically score; this is called a home run. Runners can also score if another player hits the ball and they then reach home. The team with the most runs wins the game.

    There are 9 defensive positions in baseball. The layout of these positions is labeled in the diagram below.

    https://data.scorenetwork.org/_prep/mlb_umpires_2008-2023/images/images.png

    SOURCE: https://en.wikipedia.org/wiki/Baseball_positions

    Behind the catcher at home plate is an official known as the home plate umpire. The umpire’s role is to enforce the rules and make decisions during a game. Many of these decisions involve calling balls and strikes. Pitches that are considered strikes are pitched within the zone outlined below. Anything outside of that zone is called a ball. If a batter gets 3 strikes, they are out on a strikeout. If the batter gets 4 balls, they go to first base on what is called a walk.

    https://data.scorenetwork.org/_prep/mlb_umpires_2008-2023/images/5bd08351ae57fd50e3c91538_Dimensions-Guide-Sports-Baseball-Strike-Zone-Dimensions.svg

    SOURCE: https://www.dimensions.com/element/strike-zone

    Major League Baseball (MLB) is a professional baseball league with 30 teams and a 162-game season. MLB has 76 umpires in total, with four umpires in each game. Umpires are stationed at 1st, 2nd, and 3rd base in addition to home plate, but the home plate umpire is the only one who makes calls on pitches.

    The mlb_umpires.csv dataset looks at cumulative data from MLB home plate umpires dating as far back as 2008. The boost statistics in the dataset show how individual umpires compare to the “average” Major League Baseball umpire. The dataset provides insight into whether umpires favor defensive or offensive players.

    Data

    The data set has 954 rows with 11 variables. Each row is an MLB home plate umpire combined with a boost_stat ranking how they compare with the average umpire. There are 159 umpires in the dataset with 6 rows per umpire. The data is cumulative from 2008 until 2024.

    Variable Description

    • Umpire The name of the umpire.
    • Games The number of games the umpire has umpired since 2008.
    • k_pct The strike out percentage of batters and pitchers when the umpire is umpiring. (Career Strike Outs Called/Career Plate Appearances Umpired)
    • bb_pct The walk percentage of batters when the umpire is umpiring. (Career Walks Called/Career Plate Appearances Umpired)
    • RPG The career runs scored per game when the umpire is umpiring. (Career Runs While Umpiring/Career Games Umpired)
    • BA The batting average of batters in games when the umpire is umpiring. (Career Hits While Umpiring/Career Plate Appearances Umpired)
    • OBP The on base percentage of batters when the umpire is umpiring. ((Career Hits While Umpiring + Career Walks While Umpiring + Career Hit by Pitches While Umpiring)/(Career At Bats Umpired + Career Walks While Umpiring + Career Hit by Pitches While Umpiring + Career Sacrifice Flies While Umpiring))
    • SLG The slugging percentage of batters when the umpire is umpiring. ((Singles While Umpiring + (Doubles While Umpiring * 2) + (Triples While Umpiring * 3) + (Home Runs While Umpiring * 4))/Career At Bats Umpired)
    • boost_stat The statistic being “boosted” by the umpire when they are behind home plate. This can be strikeouts (K), walks (BB), runs (R), batting average (BA), on base percentage (OBP), and slugging percentage (SLG).
    • boost_pct The percentage that the boost_stat is being boosted. In other words how much the umpire is above or below the average umpire in calling that statistic.
    • Rating Whether or not the umpire favors offensive or defensive players in that statistic. The Rating is Defensive if the umpire has a boost_pct above zero and the boost_stat is K or if the boost_pct is below zero and the stat is BB, R, BA, OBP, or SLG. The Rating is Offensive if the umpire has a boost_pct below zero and the boost_stat is K or if the boost_pct is above zero and the stat is BB, R, BA, OBP, or SLG. It will be Neither if the boost_pct is zero.
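
    The OBP, SLG, and Rating definitions above translate directly into code. The sketch below follows the formulas exactly as they are written in the variable descriptions; the function and parameter names are hypothetical, and the sample numbers are made up rather than taken from mlb_umpires.csv.

```python
OFFENSIVE_STATS = {"BB", "R", "BA", "OBP", "SLG"}


def obp(hits, walks, hbp, at_bats, sac_flies):
    """On base percentage as defined in the variable list."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slg(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases divided by at bats."""
    return (singles + 2 * doubles + 3 * triples + 4 * home_runs) / at_bats


def rating(boost_stat, boost_pct):
    """Classify an umpire row as Defensive, Offensive, or Neither."""
    if boost_pct == 0:
        return "Neither"
    if boost_stat == "K":
        # More strikeouts than average favors the defense.
        return "Defensive" if boost_pct > 0 else "Offensive"
    if boost_stat in OFFENSIVE_STATS:
        # More walks, runs, or hitting than average favors the offense.
        return "Offensive" if boost_pct > 0 else "Defensive"
    raise ValueError(f"unknown boost_stat: {boost_stat!r}")


# Made-up inputs for illustration only.
print(rating("K", 2.5), rating("BB", -1.0), rating("SLG", 0))
print(slg(singles=10, doubles=5, triples=1, home_runs=4, at_bats=100))
```

    Raising an error for an unrecognized `boost_stat` is a design choice of this sketch, not something the dataset description specifies.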

    Questions

    • Describe the distribution of k_pct based on a histogram.
    • What is the mean k_pct for all umpires?
    • What is the standard deviation of k_pct for all umpi...
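
    For the mean and standard deviation questions, one detail from the Data section matters: each umpire appears in 6 rows (one per boost_stat), so k_pct is presumably repeated across those rows and should be counted once per umpire. A minimal sketch under that assumption, using made-up umpire names and values and the sample standard deviation:

```python
from statistics import mean, stdev

# Made-up rows in the shape described above: one row per umpire per boost_stat.
rows = [
    {"Umpire": "Umpire A", "k_pct": 20.0, "boost_stat": "K"},
    {"Umpire": "Umpire A", "k_pct": 20.0, "boost_stat": "BB"},
    {"Umpire": "Umpire B", "k_pct": 22.0, "boost_stat": "K"},
    {"Umpire": "Umpire B", "k_pct": 22.0, "boost_stat": "BB"},
    {"Umpire": "Umpire C", "k_pct": 24.0, "boost_stat": "K"},
    {"Umpire": "Umpire C", "k_pct": 24.0, "boost_stat": "BB"},
]

# Keep one k_pct per umpire so repeated rows do not distort the spread.
k_by_umpire = {row["Umpire"]: row["k_pct"] for row in rows}
values = list(k_by_umpire.values())

mean_k = mean(values)
sd_k = stdev(values)  # sample standard deviation (n - 1 denominator)

print(mean_k, sd_k)
```

    Use `statistics.pstdev` instead if the umpires are treated as the whole population rather than a sample.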
Statista (2024). MLB players on opening day rosters 2013-2024 [Dataset]. https://www.statista.com/statistics/639334/major-league-baseball-players-on-opering-day-rosters/

MLB players on opening day rosters 2013-2024

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 24, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
North America
Description

There were a total of 949 players on opening day rosters of Major League Baseball teams ahead of the 2024 season. Of these players, almost 28 percent were from countries and territories outside the United States, with the Dominican Republic being the most represented nation.
