100+ datasets found
  1. MLB players on opening day rosters 2013-2024

    • statista.com
    Updated Jun 24, 2024
    Cite
    Statista (2024). MLB players on opening day rosters 2013-2024 [Dataset]. https://www.statista.com/statistics/639334/major-league-baseball-players-on-opering-day-rosters/
    Dataset updated
    Jun 24, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    North America
    Description

    There were a total of 949 players on opening day rosters of Major League Baseball teams ahead of the 2024 season. Of these players, almost 28 percent were from countries and territories outside the United States, with the Dominican Republic being the most represented nation.

  2. Compare Baseball Player Statistics using Visualiza

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Cite
    Abdelaziz Sami (2024). Compare Baseball Player Statistics using Visualiza [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/compare-baseball-player-statistics-using-visualiza
    Available download formats: zip (1,030,978 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.

    1. Load the Data

    First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.

    2. Explore the Data

    Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.

    3. Visualization

    We can create various visualizations, such as:

    • A bar chart to compare the average release speed of different pitch types.
    • A line plot to visualize trends over time based on game dates.
    • A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).

    Example Code

    Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Load the data
    df = pd.read_csv('judge.csv')
    
    # Display the first few rows of the DataFrame
    print(df.head())
    
    # Set the Seaborn theme (set_theme supersedes the older sns.set alias)
    sns.set_theme(style="whitegrid")
    
    # 1. Average release speed by pitch type
    plt.figure(figsize=(12, 6))
    avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
    # Assigning the categories to hue lets the palette apply without the
    # deprecation warning seaborn >= 0.13 raises for palette-without-hue
    sns.barplot(x=avg_speed.values, y=avg_speed.index, hue=avg_speed.index,
                palette="viridis", legend=False)
    plt.title('Average Release Speed by Pitch Type')
    plt.xlabel('Average Release Speed (mph)')
    plt.ylabel('Pitch Type')
    plt.show()
    
    # 2. Trends in release speed over time
    # Convert 'game_date' to datetime so dates are ordered chronologically
    df['game_date'] = pd.to_datetime(df['game_date'])
    
    plt.figure(figsize=(14, 7))
    # errorbar=None replaces the deprecated ci=None argument (seaborn >= 0.12)
    sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', errorbar=None)
    plt.title('Trends in Release Speed Over Time')
    plt.xlabel('Game Date')
    plt.ylabel('Average Release Speed (mph)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    # 3. Release speed vs. events
    # In pitch-level data, 'events' is usually empty except on the pitch that
    # ends a plate appearance, so keep only rows with a recorded outcome
    outcomes = df.dropna(subset=['events'])
    plt.figure(figsize=(12, 6))
    sns.scatterplot(data=outcomes, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
    plt.title('Release Speed vs. Events')
    plt.xlabel('Release Speed (mph)')
    plt.ylabel('Event Type')
    plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
    # Leave room for the legend placed outside the axes
    plt.tight_layout()
    plt.show()
    

    Explanation of the Code

    • Data Loading: The CSV file is loaded into a Pandas DataFrame.
    • Average Release Speed: A bar chart shows the average release speed for each pitch type.
    • Trends Over Time: A line plot illustrates the trend in release speed over time, which can indicate changes in performance or strategy.
    • Scatter Plot: A scatter plot visualizes the relationship between release speed and different events, providing insight into performance outcomes.

    Conclusion

    These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering the data for specific players or seasons.

  3. Players in the MLB in 2023, by ethnicity

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). Players in the MLB in 2023, by ethnicity [Dataset]. https://www.statista.com/statistics/1310428/racial-diversity-mlb-players/
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    North America
    Description

    Major League Baseball (MLB) is a professional sports league in North America made up of 30 teams that compete in the American League and the National League. In 2023, just over ** percent of players within the league were Hispanic or Latino.

  4. Lahman Baseball Database

    • kaggle.com
    zip
    Updated Jul 20, 2025
    Cite
    Dalya S (2025). Lahman Baseball Database [Dataset]. https://www.kaggle.com/datasets/dalyas/lahman-baseball-database
    Available download formats: zip (9,971,692 bytes)
    Dataset updated
    Jul 20, 2025
    Authors
    Dalya S
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
    License information was derived automatically

    Description

    The Lahman Baseball Database is a comprehensive, open-source compilation of statistics and player data for Major League Baseball (MLB). It contains relational data from the 19th century through the most recent complete season, including batting, pitching, and fielding statistics, player demographics, awards, team performance, and managerial records.

    This dataset is widely used for exploratory data analysis, statistical modeling, predictive analysis, machine learning, and sports performance forecasting.

    This dataset is the latest CSV release of the Lahman Baseball Database, downloaded directly from https://sabr.org/lahman-database/. It includes historical MLB data spanning 1871 to 2024, organized across 27 structured tables such as:

    • Batting: Player-level batting stats per year
    • Pitching: Season-level pitching metrics
    • People: Biographical data (birth/death, handedness, debut/finalGame)
    • Teams, Managers: Team and managerial records
    • BattingPost, PitchingPost, FieldingPost: Postseason stats
    • AllstarFull: All-Star Game stats
    • HallOfFame: Historical awards and recognitions

    Items to explore:

    • Track league-wide trends in home runs, strikeouts, or batting averages over time
    • Compare player performance by era, position, or handedness (righty/lefty)
    • Create a timeline showing changes in a team's win-loss record
    • Map birthplace distributions of MLB players over time
    • Estimate the impact of rule changes on player stats (pitch clock, DH)
    • Model factors that influence MVP or Cy Young Award wins
    • Predict a player's future performance based on historical stats
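    As a minimal sketch of the first item above, the snippet below totals home runs by season from the Batting table. It assumes the Lahman column names yearID and HR, and it uses a small made-up frame in place of pd.read_csv("Batting.csv") (the file name is also an assumption), so the numbers are illustrative only.

```python
import pandas as pd


def league_hr_by_season(batting: pd.DataFrame) -> pd.Series:
    """Total home runs per season across all players.

    Assumes Lahman-style columns 'yearID' (season) and 'HR' (home runs).
    A player traded mid-season has one row per stint, so summing every
    row counts each home run exactly once.
    """
    return batting.groupby("yearID")["HR"].sum().sort_index()


# Made-up rows standing in for pd.read_csv("Batting.csv")
batting = pd.DataFrame(
    {
        "playerID": ["playea01", "playea01", "playeb01", "playeb01"],
        "yearID": [2023, 2023, 2023, 2024],
        "stint": [1, 2, 1, 1],
        "HR": [10, 5, 20, 30],
    }
)

hr_trend = league_hr_by_season(batting)
print(hr_trend)
```

    Raw totals depend on schedule length and the number of teams, so for cross-era comparisons it is worth normalizing (for example, per team-game) before plotting the trend.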

    📘 License

    This dataset is released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. Attribution is required. Derivative works must be shared under the same license.

    📝 Official source: https://sabr.org/lahman-database/
    📥 Direct data page: https://www.seanlahman.com/baseball-archive/statistics/
    🖊️ R package documentation: https://cran.r-project.org/web/packages/Lahman/Lahman.pdf

    0.1 Copyright Notice & Limited Use License

    This database is copyright 1996-2025 by SABR, via a generous donation from Sean Lahman. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details, see:
    http://creativecommons.org/licenses/by-sa/3.0/
    For licensing or other information, contact Scott Bush at: sbush@sabr.org

    0.2 Contact Information

    Web site: https://sabr.org/lahman-database/
    E-Mail: jpomrenke@sabr.org

  5. MLB Batting Data (2015-2024)

    • kaggle.com
    zip
    Updated Sep 29, 2025
    Cite
    Josue FernandezC (2025). MLB Batting Data (2015-2024) [Dataset]. https://www.kaggle.com/datasets/josuefernandezc/mlb-hitting-data-2015-2024
    Available download formats: zip (272,240 bytes)
    Dataset updated
    Sep 29, 2025
    Authors
    Josue FernandezC
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    MLB Batting Stats (2015–2024)

    📝 Description

    This dataset contains scraped Major League Baseball (MLB) batting statistics from Baseball Reference for the seasons 2015 through 2024. It was collected using a custom Python scraping script and then cleaned and processed in SQL for use in analytics and machine learning workflows.

    The data provides a rich view of offensive player performance across a decade of MLB history. Each row represents a player’s season, with key batting metrics such as Batting Average (BA), On-Base Percentage (OBP), Slugging (SLG), OPS, RBI, and Games Played (G). This dataset is ideal for sports analytics, predictive modeling, and trend analysis.

    ⚙️ Data Collection (Python)

    Data was scraped directly from Baseball Reference using a Python script that:

    • Sent HTTP requests with browser-like headers to avoid request blocking.
    • Parsed HTML tables with pandas.read_html().
    • Added a Year column for each season.
    • Cleaned player names by removing symbols (#, *).
    • Kept summary rows for players who appeared on multiple teams/leagues.
    • Converted numeric fields and filled missing values with zeros.
    • Exported both raw and cleaned CSVs for each year.
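    The per-season cleaning steps above can be sketched in pandas as follows. This is not the author's script: the column names ('Player', 'Team', 'Lg', 'Pos') are assumptions about the Baseball Reference table layout, the HTTP request and the pd.read_html() call are left out, and the handling of multi-team summary rows is not shown. A tiny made-up table stands in for one parsed season.

```python
import pandas as pd

# Columns treated as text; every other column is assumed to hold a statistic
TEXT_COLUMNS = ("Player", "Team", "Lg", "Pos", "Year")


def clean_season_table(raw: pd.DataFrame, year: int) -> pd.DataFrame:
    """Apply the per-season cleaning steps listed above to one parsed table.

    `raw` stands in for one table returned by pandas.read_html(); the column
    names used here are assumptions about the Baseball Reference layout.
    """
    df = raw.copy()
    # Add a Year column for the season
    df["Year"] = year
    # Remove the # and * symbols from player names
    df["Player"] = df["Player"].str.replace(r"[#*]", "", regex=True).str.strip()
    # Convert stat columns to numbers; unparseable or missing cells become 0
    stat_cols = [c for c in df.columns if c not in TEXT_COLUMNS]
    df[stat_cols] = df[stat_cols].apply(pd.to_numeric, errors="coerce").fillna(0)
    return df


# Made-up rows standing in for one scraped season
raw = pd.DataFrame(
    {
        "Player": ["Player A*", "Player B#"],
        "Team": ["AAA", "BBB"],
        "Lg": ["AL", "NL"],
        "G": ["150", ""],
        "AB": ["520", "300"],
        "BA": [".290", None],
    }
)

cleaned = clean_season_table(raw, 2024)
print(cleaned)
```

    In a live scraper, `raw` would be one element of the list that pd.read_html() returns for the fetched page; filling missing values with zero mirrors the description but can hide genuinely unknown stats, so it is a choice to revisit for modeling.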

    🧹 Data Cleaning (SQL)

    After scraping, the raw batting tables were uploaded into BigQuery and further cleaned:

    • Null values removed – Rows missing key fields (Player, BA, OBP, SLG, OPS, Pos) were excluded.
    • Duplicate records handled – Duplicate player–year–league entries were identified and only one instance was kept.
    • Minimum playing threshold applied – Players with fewer than 100 at-bats were removed to focus on meaningful season-long contributions.

    The final cleaned table (cleaned_batting_stats) provides consistent, duplicate-free player summaries suitable for analytics.
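    The three BigQuery steps can also be expressed as a pandas sketch. The author's actual SQL is not shown in the source, so this is a translation under assumptions: which duplicate is kept (here, the first) is a guess, and the sample rows are invented.

```python
import pandas as pd

KEY_FIELDS = ["Player", "BA", "OBP", "SLG", "OPS", "Pos"]


def clean_batting(df: pd.DataFrame, min_ab: int = 100) -> pd.DataFrame:
    """pandas translation of the three cleaning steps described above."""
    # 1. Drop rows missing any key field
    out = df.dropna(subset=KEY_FIELDS)
    # 2. Keep one row per player-year-league (which copy is kept is an assumption)
    out = out.drop_duplicates(subset=["Player", "Year", "Lg"], keep="first")
    # 3. Require at least `min_ab` at-bats
    out = out[out["AB"] >= min_ab]
    return out.reset_index(drop=True)


# Invented rows: a duplicate, a low-AB season, and a row with a missing BA
stats = pd.DataFrame(
    {
        "Player": ["Player A", "Player A", "Player B", "Player C"],
        "Year": [2024, 2024, 2024, 2024],
        "Lg": ["AL", "AL", "NL", "NL"],
        "AB": [500, 500, 80, 300],
        "BA": [0.280, 0.280, 0.250, None],
        "OBP": [0.350, 0.350, 0.300, 0.320],
        "SLG": [0.450, 0.450, 0.400, 0.410],
        "OPS": [0.800, 0.800, 0.700, 0.730],
        "Pos": ["RF", "RF", "C", "SS"],
    }
)

cleaned_batting_stats = clean_batting(stats)
print(cleaned_batting_stats)
```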

    📊 Dataset Structure

    Columns include:

    • Player – Name of the player
    • Year – Season year
    • Age – Age during the season
    • Team – Team code (2TM for multiple teams)
    • Lg – League (AL, NL, or 2LG)
    • G – Games played
    • AB, H, 2B, 3B, HR, RBI – Core batting stats
    • BA, OBP, SLG, OPS – Rate statistics
    • Pos – Primary fielding position
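    Several rate columns can be recomputed from the counting columns, which is a useful sanity check after scraping. The standard definitions are BA = H / AB, SLG = TB / AB with total bases TB = H + 2B + 2×3B + 3×HR (since singles = H − 2B − 3B − HR), and OPS = OBP + SLG. OBP itself cannot be rebuilt from the listed columns because it also needs walks, hit-by-pitches, and sacrifice flies. The sketch below assumes the column names listed above and uses an invented season line.

```python
import pandas as pd


def add_recomputed_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Recompute BA, SLG, and OPS from the other columns for a sanity check."""
    out = df.copy()
    # Total bases: singles count 1, so TB = H + 2B + 2*3B + 3*HR
    total_bases = out["H"] + out["2B"] + 2 * out["3B"] + 3 * out["HR"]
    out["BA_calc"] = (out["H"] / out["AB"]).round(3)
    out["SLG_calc"] = (total_bases / out["AB"]).round(3)
    out["OPS_calc"] = (out["OBP"] + out["SLG"]).round(3)
    return out


# Invented season line
season = pd.DataFrame(
    {"Player": ["Player A"], "AB": [500], "H": [150], "2B": [30], "3B": [5],
     "HR": [25], "OBP": [0.370], "SLG": [0.530]}
)

checked = add_recomputed_rates(season)
print(checked[["BA_calc", "SLG_calc", "OPS_calc"]])
```

    Rows where a recomputed value differs from the scraped one by more than rounding would point to a parsing or merging problem.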

    🚀 Potential Uses

    • League Trends: Compare batting averages and OPS across seasons.
    • Top Performer Analysis: Identify the best hitters in different eras.
    • Predictive Modeling: Forecast future player stats using regression or ML.
    • Clustering: Group players into offensive archetypes.
    • Sports Dashboards: Build interactive Tableau/Plotly dashboards for fans and analysts.

    📌 Acknowledgments

    Raw data sourced from Baseball Reference.

    Inspired by open baseball datasets and community-driven sports analytics.

  6. MLB interest level in the U.S. 2023, by ethnicity

    • statista.com
    Cite
    Statista, MLB interest level in the U.S. 2023, by ethnicity [Dataset]. https://www.statista.com/statistics/1100127/interest-level-baseball-ethnicity/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 17, 2023 - Mar 19, 2023
    Area covered
    United States
    Description

    Major League Baseball is one of the most popular professional sports leagues in North America. A survey on the level of interest in MLB in the United States found that 36 percent of Hispanic respondents were avid fans of the league.

  7. 🏆 Baseball Hall of Fame Position Players

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kris Bruurs (2025). 🏆 Baseball Hall of Fame Position Players [Dataset]. https://www.kaggle.com/datasets/krisbruurs/baseball-hall-of-fame-position-players
    Available download formats: zip (40,297 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Kris Bruurs
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset Description:

    This dataset features 212 Major League Baseball (MLB) Hall of Fame position players, excluding pitchers. It includes key career statistics such as batting average (BA), home runs (HR), RBI, stolen bases (SB), WAR, OPS, and OPS+, as well as career timeline information (debut year, last year, and total active years). This allows for historical comparisons of player performance across different baseball eras.

    Columns:

    • Name – Player's full name
    • Debut_Year – The year the player made their MLB debut
    • Last_Year – The final year the player played in MLB
    • Active_Years – Total number of years the player was active (Last_Year - Debut_Year + 1)
    • Positions – Primary positions played
    • Bats/Throws – Batting and throwing hand
    • Height/Weight – Physical attributes
    • WAR (Wins Above Replacement) – Overall player value
    • AB (At-Bats), H (Hits), HR (Home Runs), RBI (Runs Batted In), R (Runs) – Traditional batting stats
    • BA (Batting Average), OBP (On-Base Percentage), SLG (Slugging Percentage), OPS (On-base + Slugging), OPS+ – Performance metrics

    Use Cases:

    • Era Comparisons – Analyze how Hall of Fame players’ stats have changed over time.
    • Career Longevity Analysis – Study how long Hall of Famers played and whether longer careers correlated with higher performance.
    • Position-Based Trends – Compare performance across different positions (e.g., outfielders vs. shortstops).
    • WAR & OPS+ Analysis – Examine how advanced metrics evolved over different baseball generations.
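    A minimal sketch of the era and longevity analyses, assuming the column names listed above (Debut_Year, Last_Year, OPS+) and using invented sample rows. Note that Last_Year - Debut_Year + 1 measures the calendar span of a career, so it also counts any seasons a player missed.

```python
import pandas as pd


def add_career_fields(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Calendar span of the career, matching the Active_Years definition above
    out["Active_Years"] = out["Last_Year"] - out["Debut_Year"] + 1
    # Bucket players by the decade of their debut for era comparisons
    out["Debut_Decade"] = (out["Debut_Year"] // 10) * 10
    return out


# Invented rows; the real file has 212 players
hof = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C"],
        "Debut_Year": [1952, 1958, 1991],
        "Last_Year": [1970, 1977, 2010],
        "OPS+": [150, 140, 130],
    }
)

hof = add_career_fields(hof)
era_summary = hof.groupby("Debut_Decade")[["Active_Years", "OPS+"]].mean()
print(era_summary)
```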

    This dataset is perfect for baseball analysts, sports data scientists, and fans looking to explore Hall of Fame career statistics over time.

  8. Raw MLB Player Data

    • kaggle.com
    zip
    Updated May 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chris Coxen (2024). Raw MLB Player Data [Dataset]. https://www.kaggle.com/datasets/chriscoxen/raw-mlb-player-data
    Available download formats: zip (2,097,546 bytes)
    Dataset updated
    May 14, 2024
    Authors
    Chris Coxen
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Offensive statistics on MLB players between 1947 and 2017 were used to develop a prediction model for MLB Hall of Fame selection.

    Baseball-Reference.com - https://stathead.com/tiny/4tEG2

  9. Global Baseball League Market Research Report: By League Type (Professional,...

    • wiseguyreports.com
    Updated Aug 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Baseball League Market Research Report: By League Type (Professional, Amateur, Semipro), By Player Demographics (Age, Gender, Skill Level), By Fan Engagement (In-Person Attendance, Broadcast Viewership, Online Streaming), By Revenue Stream (Ticket Sales, Merchandising, Sponsorship) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/baseball-league-market
    Dataset updated
    Aug 19, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 13.1 (USD Billion)
    MARKET SIZE 2025: 13.5 (USD Billion)
    MARKET SIZE 2035: 18.4 (USD Billion)
    SEGMENTS COVERED: League Type, Player Demographics, Fan Engagement, Revenue Stream, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increased fan engagement, rising sponsorship deals, expanding youth participation, technological advancements, global broadcasting rights
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Baseball Factory, Adidas, Nike, Mizuno, Marucci Sports, Under Armour, Major League Baseball, ProMounds, Axe Bat, Easton Sports, Franklin Sports, Wilson Sporting Goods, Dudley Sports, Louisville Slugger, Rawlings Sporting Goods
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Youth baseball programs expansion, Digital streaming partnerships, Global fan engagement initiatives, Enhanced sports merchandise sales, Advanced analytics integration
    COMPOUND ANNUAL GROWTH RATE (CAGR): 3.2% (2025 - 2035)
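    The stated CAGR can be checked against the 2025 and 2035 market sizes with the standard formula CAGR = (V_end / V_start)^(1/n) - 1, here with n = 10 years. Using the rounded figures above gives just over 3.1 percent, slightly below the stated 3.2%; the gap is plausibly due to rounding in the published market sizes.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes from the summary above, in USD billion
rate = cagr(start_value=13.5, end_value=18.4, years=10)  # 2025 -> 2035
print(f"CAGR 2025-2035: {rate:.2%}")
```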
  10. Baseball participation in the U.S. 2010-2024

    • statista.com
    Updated Nov 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Baseball participation in the U.S. 2010-2024 [Dataset]. https://www.statista.com/statistics/191626/participants-in-baseball-in-the-us-since-2006/
    Dataset updated
    Nov 26, 2025
    Dataset authored and provided by
    Statista
    Area covered
    United States
    Description

    In 2024, the number of people aged six and older who played baseball in the United States peaked at **** million, an increase over the previous year's figure of **** million.

  11. African American representation in the MLB 2005-2023

    • statista.com
    Updated Nov 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). African American representation in the MLB 2005-2023 [Dataset]. https://www.statista.com/statistics/1168026/mlb-african-american-players/
    Dataset updated
    Nov 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    North America
    Description

    Major League Baseball (MLB) is a professional sports league in North America made up of ** teams that compete in the American League and the National League. In 2023, only *** percent of MLB players were African American.

  12. Baseball Databank

    • kaggle.com
    zip
    Updated Nov 17, 2019
    Cite
    Open Source Sports (2019). Baseball Databank [Dataset]. https://www.kaggle.com/open-source-sports/baseball-databank
    Available download formats: zip (7,126,856 bytes)
    Dataset updated
    Nov 17, 2019
    Dataset authored and provided by
    Open Source Sports
    Description

    Baseball Databank is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.

    This version of the Baseball Databank was downloaded from Sean Lahman's website.

    Note that as of v1, this dataset is missing a few tables because of a restriction on the number of individual files that can be added. This is in the process of being fixed. The missing tables are Parks, HomeGames, CollegePlaying, Schools, Appearances, and FieldingPost.

    The Data

    The design follows these general principles. Each player is assigned a unique number (playerID). All of the information relating to that player is tagged with his playerID. The playerIDs are linked to names and birthdates in the MASTER table.
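    The playerID linkage described above is an ordinary key join. A minimal pandas sketch, assuming the MASTER columns nameFirst and nameLast and using invented rows and IDs in place of the real CSV files:

```python
import pandas as pd

# Invented stand-ins for MASTER.csv and Batting.csv
master = pd.DataFrame(
    {
        "playerID": ["playea01", "playeb01"],
        "nameFirst": ["Alpha", "Bravo"],
        "nameLast": ["Player", "Player"],
    }
)
batting = pd.DataFrame(
    {
        "playerID": ["playea01", "playea01", "playeb01"],
        "yearID": [2013, 2014, 2014],
        "HR": [12, 20, 7],
    }
)

# Tag every batting row with the player's name through the shared playerID.
# validate="many_to_one" raises MergeError if a playerID repeats in MASTER.
named = batting.merge(master, on="playerID", how="left", validate="many_to_one")

# Career home runs per player, labelled by name
career_hr = named.groupby(["nameFirst", "nameLast"])["HR"].sum()
print(career_hr)
```

    Grouping by playerID rather than by name is safer on the full data, since different players can share a name; names are used here only to make the output readable.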

    The database consists of the following main tables:

    • MASTER - Player names, DOB, and biographical info
    • Batting - batting statistics
    • Pitching - pitching statistics
    • Fielding - fielding statistics

    It is supplemented by these tables:

    • AllStarFull - All-Star appearances
    • HallofFame - Hall of Fame voting data
    • Managers - managerial statistics
    • Teams - yearly stats and standings
    • BattingPost - post-season batting statistics
    • PitchingPost - post-season pitching statistics
    • TeamFranchises - franchise information
    • FieldingOF - outfield position data
    • FieldingPost - post-season fielding data
    • ManagersHalf - split season data for managers
    • TeamsHalf - split season data for teams
    • Salaries - player salary data
    • SeriesPost - post-season series information
    • AwardsManagers - awards won by managers
    • AwardsPlayers - awards won by players
    • AwardsShareManagers - award voting for manager awards
    • AwardsSharePlayers - award voting for player awards
    • Appearances - details on the positions a player appeared at
    • Schools - list of colleges that players attended
    • CollegePlaying - list of players and the colleges they attended

    Descriptions of each of these tables can be found attached to their associated files, below.

    Acknowledgments

    This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/

    Person identification and demographics data are provided by Chadwick Baseball Bureau (http://www.chadwick-bureau.com), from its Register of baseball personnel.

    Player performance data for 1871 through 2014 is based on the Lahman Baseball Database, version 2015-01-24, which is Copyright (C) 1996-2015 by Sean Lahman.

    The tables Parks.csv and HomeGames.csv are based on the game logs and park code table published by Retrosheet. This information is available free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.

  13. Average age of players in Major League Baseball by club 2023

    • statista.com
    Updated Sep 12, 2023
    Cite
    Statista (2023). Average age of players in Major League Baseball by club 2023 [Dataset]. https://www.statista.com/statistics/236223/major-league-baseball-clubs-by-average-age-of-players/
    Dataset updated
    Sep 12, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    United States
    Description

    In 2023, the average age of players on each MLB team was between roughly 26 and 30 years. This range is considered the prime of a player's career, as players are typically at their peak physical and athletic ability at these ages.

    Who is the oldest player in the MLB?

    In 2023, the average age of the players on the New York Yankees' roster was 28.3 years. Out of all the teams in MLB, the Los Angeles Dodgers had the highest average player age. In the same year, the Toronto Blue Jays' average player age was 29.6 years.

    What is Major League Baseball?

    Major League Baseball (MLB) is the highest level of professional baseball in the United States and Canada. It comprises 30 teams, 29 of which are located in the United States and one in Canada. The teams are divided into two leagues, the American League (AL) and the National League (NL), and each league is further divided into three divisions: East, Central, and West. The teams play a 162-game regular season, with the goal of earning a spot in the postseason, which culminates in the AL and NL Championship Series and the World Series. The team that wins the World Series is crowned MLB champion.

    Fans watch at home and live in the stadiums

    There are many ways to enjoy MLB games, whether you are a die-hard fan, a casual viewer, or a player yourself. You can watch games on TV, or stream them live online. In 2022, the average TV viewership of MLB World Series games stood at 11.8 million. Additionally, many teams have their own websites, social media accounts, and mobile apps that allow fans to stay up-to-date with the latest news, scores, and player stats. It is also possible to purchase tickets to games and watch the action live at the stadium. In 2022, the average attendance at MLB games was 26,808.

  14. Major League Baseball average player salary 2003-2025

    • statista.com
    Updated Apr 15, 2025
    Cite
    Statista (2025). Major League Baseball average player salary 2003-2025 [Dataset]. https://www.statista.com/statistics/236213/mean-salaray-of-players-in-majpr-league-baseball/
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    As one of the biggest sports leagues in the United States, with TV viewers reaching into the millions, Major League Baseball can afford to pay its players handsomely. The average salary for a player in the MLB stood at 5.16 million U.S. dollars in 2025, double the average salary in 2005.

    Highs and lows of MLB salaries

    While the stars of every MLB team take home millions every year, there is still a minimum player salary in place to ensure that all players are compensated for their efforts. The 2025 MLB minimum player salary was set at 760,000 U.S. dollars, a significant increase over the minimum of 300,000 U.S. dollars in 2003.

    The league's top earners

    The highest earner in the MLB in 2025 was the starting pitcher for the Los Angeles Dodgers, Shohei Ohtani. The three-time All-Star took home an annual base salary of 70 million U.S. dollars in the 2025 season. Given their prominent role on the team, it is unsurprising that a majority of the top earners in the MLB were pitchers.

  15. 🏟️ Negro League Database

    • kaggle.com
    zip
    Updated Oct 8, 2024
    Cite
    mexwell (2024). 🏟️ Negro League Database [Dataset]. https://www.kaggle.com/datasets/mexwell/negro-league-database
    Available download formats: zip (16,198,067 bytes)
    Dataset updated
    Oct 8, 2024
    Authors
    mexwell
    Description

    About

    The Negro leagues were United States professional baseball leagues comprising teams of African Americans. The term may be used broadly to include professional black teams outside the leagues and it may be used narrowly for the seven relatively successful leagues beginning in 1920 that are sometimes termed "Negro Major Leagues".

    To date, Retrosheet has compiled data on 6,116 Negro League games which were played in 337 different ballparks in 259 cities across 33 states, the District of Columbia, and two foreign countries (Mexico and Canada). We have compiled at least some statistics for 2,759 players who participated in one or more of these games. These games include not only regular-season Negro League games, but also all-star games, playoff games, and exhibition games between major-league caliber teams. This latter set includes several hundred games played between White and Black major-league baseball players (so those 2,759 players include players such as Dizzy Dean, Bob Feller, Lefty Grove, Babe Ruth, and Ted Williams, among others).

    The centerpiece of the Negro League data is a set of .csv files that summarize game-level data for all (5,255) Negro League games for which Retrosheet has compiled data. There are five such .csv files.

    • gameinfo.csv - game-level information such as teams, attendance, umpires, etc.
    • teamstats.csv - team-level statistics: line scores, lineups, and team statistics (batting, pitching, fielding)
    • batting.csv - batting statistics by player by game
    • pitching.csv - pitching statistics by player by game
    • fielding.csv - fielding statistics by player by position by game

    The columns are labeled and should be mostly self-explanatory. But, in case not, the columns are defined in the document context.txt which is included in the zip file.

    The level of detail at which Negro League data can be determined is highly variable across games and the data "known" is highly uncertain in many cases. For example, for many games, we have no box score but may have a reference to the fact that a particular player had at least one hit in the game. To attempt to convey this uncertainty in our data, teams and players may be given up to three sets of statistical lines for each game within the data files which are available for download. These are identified within the .csv files by the variable 'stattype'.

    • stattype 'value' is Retrosheet's best estimate of the relevant statistical total
    • stattype 'lower' is the lower bound on a player's total
    • stattype 'upper' is the upper bound on a player's total

    All teams and players have lines with stattype 'value', regardless of how little information may be known. Data for which Retrosheet has no information is left blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added. As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bounds for the pitcher's innings pitched would be 4 and 4.2 (four innings plus two outs), respectively, and the lower and upper bounds for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
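    The stattype convention and the innings-pitched notation in the example above can be sketched as follows. Only the 'stattype' column and its three values come from the description; the other column names ('player', 'ip', 'r') and the 'value' row's numbers are hypothetical, and the real pitching.csv may store innings differently (for example, as outs), so check context.txt before relying on this.

```python
import pandas as pd


def ip_to_outs(ip: float) -> int:
    """Convert innings-pitched notation to outs recorded.

    The digit after the point counts outs, not tenths of an inning,
    so 4.2 means four full innings plus two outs, i.e. 14 outs.
    """
    whole = int(ip)
    partial_outs = round((ip - whole) * 10)
    return whole * 3 + partial_outs


# Hypothetical lines for the pitcher in the example above; only 'stattype'
# comes from the description, the other names and the 'value' row are invented.
pitching = pd.DataFrame(
    {
        "player": ["pitcher1", "pitcher1", "pitcher1"],
        "stattype": ["value", "lower", "upper"],
        "ip": [4.1, 4.0, 4.2],
        "r": [2, 0, 4],
    }
)

# Best-estimate lines only, as used for ordinary totals
best = pitching[pitching["stattype"] == "value"]

# Bounds on outs recorded, from the 'lower' and 'upper' lines
by_type = pitching.set_index("stattype")
outs_range = (ip_to_outs(by_type.loc["lower", "ip"]), ip_to_outs(by_type.loc["upper", "ip"]))
print(best)
print(outs_range)
```

    Converting to outs before summing matters: adding 4.2 + 4.2 as decimals gives 8.4, whereas the correct total is 28 outs, or 9.1 innings.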

    In addition to these five files which aggregate all Negro League games, we also have compiled separate logs by team (subsets of teamstats.csv divided by team-season), by ballpark (subsets of gameinfo.csv) and by player (subsets of batting.csv, pitching.csv, and fielding.csv). For ballparks and players, these aggregate across all seasons.

    In addition to these .csvs, Retrosheet has also compiled event files (.evx files) and box-score files (.ebx files) for games for which sufficient data is available. Games are compiled into a single file for each season for which we have compiled games of the relevant type. In the former case, event files are included both for games for which we have found play-by-play accounts as well as games which have been deduced. The latter are identified within the files via a comment at the start of the play-by-play portion of the file.

    Finally, the zip file here includes roster files for all teams for which Retrosheet has compiled rosters, as well as our master files for people (biofile.csv), ballparks (ballparks.csv), and teams (teams.csv). These files include data for all people, teams, and sites across all Retrosheet games, not just Negro League games.


    Notice

    The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties ma...

  16. Grant Giving Statistics for Negro League Baseball Players Association Inc

    • instrumentl.com
    Updated Apr 20, 2025
    (2025). Grant Giving Statistics for Negro League Baseball Players Association Inc [Dataset]. https://www.instrumentl.com/990-report/negro-league-baseball-players-association-inc
    Dataset updated
    Apr 20, 2025
    Description

    Financial overview and grant giving statistics of Negro League Baseball Players Association Inc

  17. Major League Baseball's Most Cost-Effective

    • kaggle.com
    zip
    Updated Nov 25, 2022
    The Devastator (2022). Major League Baseball's Most Cost-Effective [Dataset]. https://www.kaggle.com/datasets/thedevastator/major-league-baseball-s-most-cost-effective-play/suggestions
    Available download formats: zip (757938 bytes)
    Dataset updated
    Nov 25, 2022
    Authors
    The Devastator
    Description

    Major League Baseball's Most Cost-Effective Players of 2019

    Hitting, Pitching, and Overall Statistics

    By Andy Kriebel [source]

    About this dataset

    This dataset contains MLB hitting statistics for the 2013 season. The original source of the data is Lahman’s Baseball Database. The original visualization can be found here.

    This dataset is interesting because it shows which players were the most cost-effective in terms of salary and production. For example, Miguel Cabrera was the highest-paid player in 2013, but he was also one of the most productive hitters in terms of runs batted in (RBIs). On the other hand, players like Mike Trout and Clayton Kershaw were among the league leaders in production but were not among the highest-paid players.

    There are a number of ways to measure a player's cost-effectiveness, but one simple method is to compare their salary to their production (measured by runs created, or RC). Players who create many runs while being paid relatively little are more cost-effective than players who are paid more but produce less. By this metric, some of the most cost-effective players in 2013 were Delmon Young, Wilson Ramos, and Shane Victorino.
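    The salary-versus-runs-created comparison described above can be sketched in Python. This sketch uses Bill James's basic runs created formula, RC = (H + BB) * TB / (AB + BB), which may differ from the variant used to build the dataset. The `Batter` record, its field names, and the two players are hypothetical; walk, triple, home-run and salary fields are assumed to exist beyond the truncated column list below.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    # Hypothetical record; field names are illustrative, not the CSV headers.
    name: str
    salary: float  # dollars
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int


def total_bases(b: Batter) -> int:
    # 1B + 2*2B + 3*3B + 4*HR, written in terms of total hits (H counts every hit type)
    return b.h + b.doubles + 2 * b.triples + 3 * b.hr


def runs_created(b: Batter) -> float:
    # Basic runs created: (H + BB) * TB / (AB + BB)
    opportunities = b.ab + b.bb
    if opportunities == 0:
        return 0.0
    return (b.h + b.bb) * total_bases(b) / opportunities


def dollars_per_run(b: Batter) -> float:
    # Lower is more cost-effective; players with no runs created sort last.
    rc = runs_created(b)
    return b.salary / rc if rc > 0 else float("inf")


def most_cost_effective(batters: list[Batter]) -> list[Batter]:
    return sorted(batters, key=dollars_per_run)


players = [
    Batter("Player B", 20_000_000, ab=550, h=170, doubles=35, triples=1, hr=40, bb=80),
    Batter("Player A", 500_000, ab=400, h=100, doubles=20, triples=2, hr=10, bb=40),
]
for p in most_cost_effective(players):
    print(f"{p.name}: RC={runs_created(p):.1f}, ${dollars_per_run(p):,.0f} per run created")
```

    Dividing salary by runs created (rather than the reverse) keeps the ranking in familiar units of dollars per run, where smaller is better.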

    How to use the dataset

    https://www.kaggle.com/andrewmvd/most-cost-effective-players-of-2019

    This dataset consists of Major League Baseball's most cost-effective players of 2019, as measured by WAR per dollar of salary (wWAR/$). WAR is a metric that attempts to measure a player's overall contribution to their team, including both offense and defense. You can read more about it here. The dataset includes each player's name, position, team, salary, and wWAR/$.

    To use this dataset, you may want to consider the following questions:

    • Who are the most cost-effective players in baseball?
    • What positions do these players tend to play?
    • Which teams have the most cost-effective players?
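    A minimal Python sketch of the three questions above: rank players by WAR per million dollars of salary, then count positions and teams among the top of the list. The `Player` record and its field names are hypothetical stand-ins for the dataset's columns, the rows are invented for illustration, and salaries are assumed to be positive.

```python
from collections import Counter
from typing import NamedTuple


class Player(NamedTuple):
    # Hypothetical stand-in for one row of the dataset.
    name: str
    position: str
    team: str
    salary: float  # dollars, assumed > 0
    war: float


def war_per_million(p: Player) -> float:
    return p.war / (p.salary / 1_000_000)


def rank(players):
    # Most cost-effective (highest WAR per $1M) first.
    return sorted(players, key=war_per_million, reverse=True)


def top_counts(players, n, field):
    # How often each position or team appears among the n most cost-effective players.
    return Counter(getattr(p, field) for p in rank(players)[:n])


PLAYERS = [
    Player("Player A", "SS", "Team X", 600_000, 4.0),
    Player("Player B", "SP", "Team Y", 30_000_000, 6.0),
    Player("Player C", "CF", "Team X", 1_000_000, 3.0),
    Player("Player D", "1B", "Team Z", 15_000_000, 1.5),
]

print([p.name for p in rank(PLAYERS)])
print(top_counts(PLAYERS, 2, "position"))
print(top_counts(PLAYERS, 2, "team"))
```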

    Research Ideas

    • finding the most cost-effective baseball players
    • comparing different salary structures among teams
    • improving player performance through analytics

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: Dataset copyright by authors.

    You are free to:
    • Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt: remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit, provide a link to the license, and indicate if changes were made.
    • ShareAlike: distribute your contributions under the same license as the original.
    • Keep intact all notices that refer to this license, including copyright notices.

    Columns

    File: MLB Stats.csv

    | Column name | Description |
    |:------------|:------------|
    | Player Name | The player's name. (String) |
    | weight | The player's weight in pounds. (Numeric) |
    | height | The player's height in inches. (Numeric) |
    | bats | The player's batting handedness. (String) |
    | throws | The player's throwing handedness. (String) |
    | Season | The season in which the statistics were accrued. (String) |
    | League | The league in which the player played. (String) |
    | Team | The team for which the player played. (String) |
    | Franchise | The franchise to which the team belongs. (String) |
    | G | The number of games the player played. (Numeric) |
    | AB | The number of at-bats the player had. (Numeric) |
    | R | The number of runs the player scored. (Numeric) |
    | H | The number of hits the player had. (Numeric) |
    | 2B | The number of doubles the player hit. (Numeric) |
    ...

  18. Data from: Simplicity Versus WAR: Examining Salary Determinations in Major...

    • dataverse.harvard.edu
    Updated Sep 10, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joshua Studnitzer (2019). Simplicity Versus WAR: Examining Salary Determinations in Major League Baseball's Arbitration and Free Agent Markets [Dataset]. http://doi.org/10.7910/DVN/28782
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 10, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Joshua Studnitzer
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/28782

    Description

    This paper examines salaries given to arbitration-eligible players in Major League Baseball from 2008 to 2013 and compares them to free-agent contracts from the same period. Anecdotal evidence suggests that simpler statistics are more successful in Major League Baseball's final-offer arbitration setting, as the legal experts tasked with handling the league's cases may not have deep knowledge of player valuation. I examine the effects of wins above replacement (WAR), a complex but comprehensive metric, and of traditional statistics, such as runs batted in, on salaries decided in both settings. Wins above replacement is significant in each case, but with a much higher coefficient in free agency, suggesting a greater impact. There is no evidence of individual traditional statistics being especially significant in arbitration; I attribute this to parties framing their offers with whichever statistics portray them in the most favorable light. Finally, I look at statistics in the season following each contract to determine whether either market is more effective at getting value at a low cost, but the results are similar in each case, and limitations of the data restrict the strength of the conclusions in that section.
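    The core comparison in the abstract, whether WAR carries a larger salary coefficient in free agency than in arbitration, can be illustrated with a one-variable least-squares fit per market. This is a deliberately simplified sketch, not the paper's actual specification (which also considers traditional statistics); the WAR and salary figures are invented so the arithmetic is easy to check.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented data: WAR and salary (millions of dollars) for two markets.
arbitration = ([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
free_agency = ([1.0, 2.0, 3.0, 4.0], [3.0, 7.0, 11.0, 15.0])

for market, (war, salary) in [("arbitration", arbitration), ("free agency", free_agency)]:
    slope, intercept = ols_fit(war, salary)
    print(f"{market}: ${slope:.2f}M per WAR (intercept {intercept:.2f})")
```

    A larger slope in one market means each additional win above replacement is associated with a larger salary there, which is the sense in which the abstract reports a "much higher coefficient in free agency".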

  19. Basic statistics for the MLB and NBA Twitter networks using mathematica.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Emily J. Evans; Rebecca Jones; Joseph Leung; Benjamin Z. Webb (2023). Basic statistics for the MLB and NBA Twitter networks using mathematica. [Dataset]. http://doi.org/10.1371/journal.pone.0268619.t013
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Emily J. Evans; Rebecca Jones; Joseph Leung; Benjamin Z. Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Basic statistics for the MLB and NBA Twitter networks using mathematica.

  20. Demographic data, description of visual disability and general health...

    • plos.figshare.com
    xls
    Updated Jun 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Daniela Mirandola; Marco Monaci; Guido Miccinesi; Alessia Vannuzzi; Eleonora Sgambati; Mirko Manetti; Mirca Marini (2023). Demographic data, description of visual disability and general health conditions of visually impaired subjects playing baseball and visually impaired sedentary individuals. [Dataset]. http://doi.org/10.1371/journal.pone.0218124.t002
    Available download formats: xls
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Daniela Mirandola; Marco Monaci; Guido Miccinesi; Alessia Vannuzzi; Eleonora Sgambati; Mirko Manetti; Mirca Marini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic data, description of visual disability and general health conditions of visually impaired subjects playing baseball and visually impaired sedentary individuals.

Statista (2024). MLB players on opening day rosters 2013-2024 [Dataset]. https://www.statista.com/statistics/639334/major-league-baseball-players-on-opering-day-rosters/

MLB players on opening day rosters 2013-2024

2 scholarly articles cite this dataset (View in Google Scholar)
