There were a total of 949 players on opening day rosters of Major League Baseball teams ahead of the 2024 season. Of these players, almost 28 percent were from countries and territories outside the United States, with the Dominican Republic being the most represented nation.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.
First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.
Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.
We can create various visualizations, such as:
- A bar chart to compare the average release speed of different pitch types.
- A line plot to visualize trends over time based on game dates.
- A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).
Here is sample code that demonstrates how to create these visualizations using Matplotlib and Seaborn:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
df = pd.read_csv('judge.csv')
# Display the first few rows of the dataframe
print(df.head())
# Set the style of seaborn
sns.set(style="whitegrid")
# 1. Average Release Speed by Pitch Type
plt.figure(figsize=(12, 6))
avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
plt.title('Average Release Speed by Pitch Type')
plt.xlabel('Average Release Speed (mph)')
plt.ylabel('Pitch Type')
plt.show()
# 2. Trends in Release Speed Over Time
# First, convert the 'game_date' to datetime
df['game_date'] = pd.to_datetime(df['game_date'])
plt.figure(figsize=(14, 7))
# errorbar=None turns off the confidence band (seaborn >= 0.12; older releases used ci=None)
sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', errorbar=None)
plt.title('Trends in Release Speed Over Time')
plt.xlabel('Game Date')
plt.ylabel('Average Release Speed (mph)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# 3. Scatter Plot of Release Speed vs. Events
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
plt.title('Release Speed vs. Events')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Event Type')
plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!
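Filtering by season, as mentioned above, could look like the following minimal sketch. It assumes the same pitch_type, release_speed, and game_date columns; the sample rows and the helper name filter_by_season are made up for illustration and are not part of judge.csv.

```python
import pandas as pd

# Hypothetical sample rows standing in for judge.csv
sample = pd.DataFrame({
    "pitch_type": ["FF", "SL", "FF", "CH"],
    "release_speed": [95.1, 86.4, 96.3, 84.0],
    "game_date": ["2016-08-13", "2016-09-01", "2017-04-05", "2017-06-10"],
})
sample["game_date"] = pd.to_datetime(sample["game_date"])


def filter_by_season(df: pd.DataFrame, season: int) -> pd.DataFrame:
    """Return only the rows whose game_date falls in the given calendar year."""
    return df[df["game_date"].dt.year == season]


# Average release speed per pitch type for a single season
season_2017 = filter_by_season(sample, 2017)
avg_speed_2017 = season_2017.groupby("pitch_type")["release_speed"].mean()
print(avg_speed_2017)
```

The filtered frame can be passed to any of the plotting calls above in place of df.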
Major League Baseball (MLB) is a professional sports league in North America made up of 30 teams that compete in the American League and the National League. In 2023, just over ** percent of players within the league were Hispanic or Latino.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
License information was derived automatically
The Lahman Baseball Database is a comprehensive, open-source compilation of statistics and player data for Major League Baseball (MLB). It contains relational data from the 19th century through the most recent complete season, including batting, pitching, and fielding statistics, player demographics, awards, team performance, and managerial records.
This dataset is widely used for exploratory data analysis, statistical modeling, predictive analysis, machine learning, and sports performance forecasting.
This dataset is the latest CSV release of the Lahman Baseball Database, downloaded directly from https://sabr.org/lahman-database/. It includes historical MLB data spanning from 1871 to 2024, organized across 27 structured tables such as:
- Batting: Player-level batting stats per year
- Pitching: Season-level metrics
- People: Biographical data (birth/death, handedness, debut/finalGame)
- Teams, Managers: Team records
- BattingPost, PitchingPost, FieldingPost: Post-season stats
- AllstarFull: All-Star Game stats
- HallOfFame: Historical awards and recognitions
Items to explore:
- Track league-wide trends in home runs, strikeouts, or batting averages over time
- Compare player performance by era, position, or righty/lefty
- Create a timeline showing changes in a team's win-loss record
- Map birthplace distributions of MLB players over time
- Estimate the impact of rule changes on player stats (pitch clock, DH)
- Model factors that influence MVP or Cy Young award wins
- Predict a player's future performance based on historical stats
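As a starting point for the first item, league-wide trends can be obtained by summing the Batting table by season. The sketch below uses a tiny made-up stand-in for the table; yearID, AB, H, HR, and SO are Lahman Batting column names, while the rows themselves are invented.

```python
import pandas as pd

# Tiny stand-in for Lahman's Batting table (invented rows)
batting = pd.DataFrame({
    "playerID": ["a01", "b02", "a01", "b02"],
    "yearID": [2022, 2022, 2023, 2023],
    "AB": [500, 400, 520, 380],
    "H": [150, 100, 130, 114],
    "HR": [30, 10, 25, 20],
    "SO": [120, 90, 140, 100],
})

# Sum counting stats per season, then derive the league batting average
league = batting.groupby("yearID")[["AB", "H", "HR", "SO"]].sum()
league["BA"] = league["H"] / league["AB"]
print(league)
```

With the real data, batting would instead come from pd.read_csv on the Batting CSV file, and the league frame could be plotted directly with league["HR"].plot().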
📘 License
This dataset is released under the Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license. Attribution is required. Derivative works must be shared under the same license.
📝 Official source: https://sabr.org/lahman-database/
📥 Direct data page: https://www.seanlahman.com/baseball-archive/statistics/
🖊️ R-Package Documentation: https://cran.r-project.org/web/packages/Lahman/Lahman.pdf
0.1 Copyright Notice & Limited Use License
This database is copyright 1996-2025 by SABR, via generous donation from Sean Lahman. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
For licensing information or further information, contact Scott Bush at: sbush@sabr.org
0.2 Contact Information
Web site: https://sabr.org/lahman-database/
E-Mail: jpomrenke@sabr.org
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains scraped Major League Baseball (MLB) batting statistics from Baseball Reference for the seasons 2015 through 2024. It was collected using a custom Python scraping script and then cleaned and processed in SQL for use in analytics and machine learning workflows.
The data provides a rich view of offensive player performance across a decade of MLB history. Each row represents a player’s season, with key batting metrics such as Batting Average (BA), On-Base Percentage (OBP), Slugging (SLG), OPS, RBI, and Games Played (G). This dataset is ideal for sports analytics, predictive modeling, and trend analysis.
Data was scraped directly from Baseball Reference using a Python script that:
Columns include:
- Player: Name of the player
- Year: Season year
- Age: Age during the season
- Team: Team code (2TM for multiple teams)
- Lg: League (AL, NL, or 2LG)
- G: Games played
- AB, H, 2B, 3B, HR, RBI: Core batting stats
- BA, OBP, SLG, OPS: Rate statistics
- Pos: Primary fielding position
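The rate statistics can be cross-checked against the other columns, since BA is hits divided by at-bats and OPS is OBP plus SLG. The sketch below assumes the column names listed above; the rows are made up, and the 0.0015 tolerance is an arbitrary choice meant to absorb rounding in published three-decimal values.

```python
import pandas as pd

# Made-up rows using the column names listed above
seasons = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "Year": [2023, 2023],
    "AB": [500, 400],
    "H": [150, 100],
    "OBP": [0.380, 0.320],
    "SLG": [0.520, 0.410],
    "OPS": [0.900, 0.730],
})

# BA is hits per at-bat; OPS is on-base percentage plus slugging
seasons["BA_check"] = (seasons["H"] / seasons["AB"]).round(3)
seasons["OPS_check"] = (seasons["OBP"] + seasons["SLG"]).round(3)

# Flag rows where the published OPS differs from the recomputed value
seasons["ops_mismatch"] = (seasons["OPS"] - seasons["OPS_check"]).abs() > 0.0015
print(seasons[["Player", "BA_check", "OPS_check", "ops_mismatch"]])
```

A check like this is a cheap way to catch scraping or cleaning errors before modeling.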
Raw data sourced from Baseball Reference.
Inspired by open baseball datasets and community-driven sports analytics.
Major League Baseball is one of the most popular professional sports leagues in North America. The survey gauged the level of interest in MLB in the United States and showed that 36 percent of Hispanic respondents were avid fans of the league.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset features 212 Major League Baseball (MLB) Hall of Fame position players, excluding pitchers. It includes key career statistics such as batting average (BA), home runs (HR), RBI, stolen bases (SB), WAR, OPS, and OPS+, as well as career timeline information (debut year, last year, and total active years). This allows for historical comparisons of player performance across different baseball eras.
This dataset is perfect for baseball analysts, sports data scientists, and fans looking to explore Hall of Fame career statistics over time.
License: https://creativecommons.org/publicdomain/zero/1.0/
Offensive statistics on MLB Players between 1947 and 2017 were used to develop a prediction model for MLB Hall of Fame selection.
Baseball-Reference.com - https://stathead.com/tiny/4tEG2
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 13.1 (USD Billion) |
| MARKET SIZE 2025 | 13.5 (USD Billion) |
| MARKET SIZE 2035 | 18.4 (USD Billion) |
| SEGMENTS COVERED | League Type, Player Demographics, Fan Engagement, Revenue Stream, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | increased fan engagement, rising sponsorship deals, expanding youth participation, technological advancements, global broadcasting rights |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Baseball Factory, Adidas, Nike, Mizuno, Marucci Sports, Under Armour, Major League Baseball, ProMounds, Axe Bat, Easton Sports, Franklin Sports, Wilson Sporting Goods, Dudley Sports, Louisville Slugger, Rawlings Sporting Goods |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Youth baseball programs expansion, Digital streaming partnerships, Global fan engagement initiatives, Enhanced sports merchandise sales, Advanced analytics integration. |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 3.2% (2025 - 2035) |
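The growth rate in the table can be sanity-checked from the two market-size figures with the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1. The helper name cagr below is chosen for illustration. Applied to the rounded 2025 and 2035 figures it gives roughly 3.1 percent, close to the 3.2% the report states; the small gap may come from rounding in the published values.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the table above (USD billion)
size_2025 = 13.5
size_2035 = 18.4

rate = cagr(size_2025, size_2035, years=10)
print(f"Implied CAGR 2025-2035: {rate:.1%}")
```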
In 2024, the number of people above the age of six years old that played baseball in the United States peaked at **** million. This represented an increase over the previous year's figure of **** million.
Major League Baseball (MLB) is a professional sports league in North America made up of ** teams that compete in the American League and the National League. In 2023, only *** percent of MLB players were African American.
Baseball Databank is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.
This version of the Baseball databank was downloaded from Sean Lahman's website.
Note that as of v1, this dataset is missing a few tables because of a restriction on the number of individual files that can be added. This is in the process of being fixed. The missing tables are Parks, HomeGames, CollegePlaying, Schools, Appearances, and FieldingPost.
The design follows these general principles. Each player is assigned a unique number (playerID). All of the information relating to that player is tagged with his playerID. The playerIDs are linked to names and birthdates in the MASTER table.
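The playerID design means any statistics table can be joined to the MASTER table to recover names. A minimal sketch with pandas follows; the two frames are small stand-ins rather than the actual files, and the columns shown (nameFirst, nameLast, yearID, HR) are an assumed subset of what the tables contain.

```python
import pandas as pd

# Stand-in for the MASTER table: one row per person, keyed by playerID
master = pd.DataFrame({
    "playerID": ["ruthba01", "aaronha01"],
    "nameFirst": ["Babe", "Hank"],
    "nameLast": ["Ruth", "Aaron"],
})

# Stand-in for a statistics table whose rows are tagged with playerID
batting = pd.DataFrame({
    "playerID": ["ruthba01", "ruthba01", "aaronha01"],
    "yearID": [1927, 1928, 1957],
    "HR": [60, 54, 44],
})

# Attach names to each season line via the shared playerID key;
# validate raises an error if MASTER ever holds duplicate playerIDs
named = batting.merge(master, on="playerID", how="left", validate="many_to_one")
print(named[["nameFirst", "nameLast", "yearID", "HR"]])
```

A left join keeps every statistics row even if a playerID is missing from the names table, which makes gaps easy to spot as empty name fields.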
The database is comprised of the following main tables:
It is supplemented by these tables:
Descriptions of each of these tables can be found attached to their associated files, below.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
Person identification and demographics data are provided by Chadwick Baseball Bureau (http://www.chadwick-bureau.com), from its Register of baseball personnel.
Player performance data for 1871 through 2014 is based on the Lahman Baseball Database, version 2015-01-24, which is Copyright (C) 1996-2015 by Sean Lahman.
The tables Parks.csv and HomeGames.csv are based on the game logs and park code table published by Retrosheet. This information is available free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.
In 2023, the average age of players on each MLB team was between roughly 26 and 30 years. This is considered to be the prime of a player's career, as they are typically at their peak physical and athletic ability at this age.
Who is the oldest player in the MLB?
In 2023, the average age of the players on the New York Yankees' roster was 28.3 years. Out of all the teams in MLB, the Los Angeles Dodgers had the highest average player age. In the same year, the Toronto Blue Jays' average player age was 29.6 years.
What is Major League Baseball?
Major League Baseball (MLB) is the highest level of professional baseball in the United States and Canada. It comprises 30 teams, 29 of which are located in the United States and one in Canada. The teams are divided into two leagues: the American League (AL) and the National League (NL), and each league is further divided into three divisions: East, Central, and West. The teams play a 162-game regular season schedule, with the goal of earning a spot in the postseason, which consists of the AL and NL Championship Series, and the World Series. The team that wins the World Series is declared the champion of the MLB.
Fans watch at home and live in the stadiums
There are many ways to enjoy MLB games, whether you are a die-hard fan, a casual viewer, or a player yourself. You can watch games on TV, or stream them live online. In 2022, the average TV viewership of MLB World Series games stood at 11.8 million. Additionally, many teams have their own websites, social media accounts, and mobile apps that allow fans to stay up-to-date with the latest news, scores, and player stats. It is also possible to purchase tickets to games and watch the action live at the stadium. In 2022, the average attendance at the games in the MLB was 26,808.
As one of the biggest sports leagues in the United States, with TV viewers reaching into the millions, Major League Baseball can afford to pay its players handsomely. The average salary for a player in the MLB stood at 5.16 million U.S. dollars in 2025. This marked a twofold increase on the average salary in 2005.
Highs and lows of MLB salaries
While the stars of every MLB team take home millions every year, there is still a minimum player salary in place to ensure that all players are compensated for their efforts. The 2025 MLB minimum player salary was set at 760,000 U.S. dollars, a significant increase on the minimum of 300,000 U.S. dollars in 2003.
The league’s top earners
The highest earner in the MLB in 2025 was the starting pitcher for the Los Angeles Dodgers, Shohei Ohtani. The three-time All-Star took home an annual base salary of 70 million U.S. dollars in the 2025 season. Due to their prominent role on the team, it is unsurprising that a majority of the top earners in the MLB were pitchers.
The Negro leagues were United States professional baseball leagues comprising teams of African Americans. The term may be used broadly to include professional black teams outside the leagues and it may be used narrowly for the seven relatively successful leagues beginning in 1920 that are sometimes termed "Negro Major Leagues".
To date, Retrosheet has compiled data on 6,116 Negro League games which were played in 337 different ballparks in 259 cities across 33 states, the District of Columbia, and two foreign countries (Mexico and Canada). We have compiled at least some statistics for 2,759 players who participated in one or more of these games. These games include not only regular-season Negro League games, but also all-star games, playoff games, and exhibition games between major-league caliber teams. This latter set includes several hundred games played between White and Black major-league baseball players (so those 2,759 players include players such as Dizzy Dean, Bob Feller, Lefty Grove, Babe Ruth, and Ted Williams, among others).
The centerpiece of the Negro League data is a set of .csv files which summarize game-level data for all (5,255) Negro League games for which Retrosheet has compiled data. There are five such .csv files.
gameinfo.csv - contains game-level information such as teams, attendance, umpires, etc.
teamstats.csv - contains team-level statistics - line scores, lineups, and team statistics (batting, pitching, fielding)
batting.csv - batting statistics by player by game
pitching.csv - pitching statistics by player by game
fielding.csv - fielding statistics by player by position by game
The columns are labeled and should be mostly self-explanatory. But, in case not, the columns are defined in the document context.txt which is included in the zip file.
The level of detail at which Negro League data can be determined is highly variable across games and the data "known" is highly uncertain in many cases. For example, for many games, we have no box score but may have a reference to the fact that a particular player had at least one hit in the game. To attempt to convey this uncertainty in our data, teams and players may be given up to three sets of statistical lines for each game within the data files which are available for download. These are identified within the .csv files by the variable 'stattype'.
All teams and players will have lines with stattype 'value' regardless of how little information may be known. Data for which Retrosheet has no information will be blank. In most cases where we have some information, Retrosheet has attempted to make its best estimate of player statistics and has assigned these totals to the stattype 'value'. In cases where there is some uncertainty, additional lines with stattype 'lower' or 'upper' may be added. As an example of 'upper' and 'lower' stattypes, we may know that a pitcher was knocked out of the game in the 5th inning and that the opposing team scored 4 runs in the 5th inning. In this case, the lower and upper bound for the pitcher's innings pitched would be 4 and 4.2, respectively, and the lower and upper bound for the pitcher's runs allowed would be 0 and 4 (plus whatever we know the pitcher allowed in his first four innings pitched).
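One way to work with these lines is to keep only the 'value' rows for headline totals and to pivot the three stattypes side by side when the uncertainty matters. The sketch below is an illustration under assumptions: only the stattype column and its three labels come from the description above, while game_id, player_id, runs_allowed, and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical pitching lines for one pitcher in one game; only 'stattype'
# is a column name taken from the description, the rest are illustrative.
lines = pd.DataFrame({
    "game_id": ["G1", "G1", "G1"],
    "player_id": ["p001", "p001", "p001"],
    "stattype": ["value", "lower", "upper"],
    "runs_allowed": [2, 0, 4],
})

# Best estimates only: keep the 'value' line for each player-game
best = lines[lines["stattype"] == "value"]

# One row per player-game with the estimate and its bounds side by side
wide = lines.pivot(index=["game_id", "player_id"], columns="stattype", values="runs_allowed")
wide["spread"] = wide["upper"] - wide["lower"]
print(wide)
```

Player-games that carry no 'lower' or 'upper' line would show missing values in those columns after the pivot, which distinguishes them from cases with explicit bounds.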
In addition to these five files which aggregate all Negro League games, we also have compiled separate logs by team (subsets of teamstats.csv divided by team-season), by ballpark (subsets of gameinfo.csv) and by player (subsets of batting.csv, pitching.csv, and fielding.csv). For ballparks and players, these aggregate across all seasons.
In addition to these .csvs, Retrosheet has also compiled event files (.evx files) and box-score files (.ebx files) for games for which sufficient data is available. Games are compiled into a single file for each season for which we have compiled games of the relevant type. In the former case, event files are included both for games for which we have found play-by-play accounts as well as games which have been deduced. The latter are identified within the files via a comment at the start of the play-by-play portion of the file.
Finally, the zip file here includes roster files for all teams for whom Retrosheet has compiled rosters as well as our master files for people (biofile.csv), ballparks (ballparks.csv), and teams (teams.csv). These files include data for all people, teams, and sites across all Retrosheet games, not just Negro League games.
Read more about the dataset here.
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties ma...
Financial overview and grant-giving statistics of Negro League Baseball Players Association Inc
By Andy Kriebel [source]
About this dataset
This dataset contains MLB hitting statistics for the 2013 season. The original source of the data is Lahman’s Baseball Database. The original visualization can be found here.
This dataset is interesting because it allows us to see which players were the most cost effective in terms of salary and production. For example, we can see that Miguel Cabrera was the highest paid player in 2013, but he was also one of the most productive hitters in terms of runs batted in (RBIs). On the other hand, we can see that players like Mike Trout and Clayton Kershaw were among the league leaders in production but they were not among the highest paid players.
There are a number of ways to measure a player's cost effectiveness, but one simple method is to compare their salary to their production (measured by runs created, or RC). Players who create a lot of runs while being paid relatively little are more cost effective than players who are paid more but produce less. By this metric, some of the most cost effective players in 2013 were Delmon Young, Wilson Ramos, and Shane Victorino.
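The comparison can be sketched as dollars of salary per run created. The description does not say which runs-created formula was used, so the example assumes the basic version, RC = (H + BB) * TB / (AB + BB); the two hitters and their figures are made up and are not 2013 data.

```python
def runs_created(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def salary_per_run(salary: float, rc: float) -> float:
    """Dollars paid per run created; lower means more cost effective."""
    return salary / rc


# Two made-up hitters, not real 2013 figures
players = {
    "Hitter A": {"h": 150, "bb": 50, "tb": 250, "ab": 550, "salary": 1_000_000},
    "Hitter B": {"h": 180, "bb": 70, "tb": 320, "ab": 580, "salary": 20_000_000},
}

cost = {}
for name, p in players.items():
    rc = runs_created(p["h"], p["bb"], p["tb"], p["ab"])
    cost[name] = salary_per_run(p["salary"], rc)

# Rank from most to least cost effective
for name in sorted(cost, key=cost.get):
    print(f"{name}: ${cost[name]:,.0f} per run created")
```

In this toy example Hitter B creates more runs in total, but Hitter A is far cheaper per run, which is the sense of cost effectiveness the paragraph above describes.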
https://www.kaggle.com/andrewmvd/most-cost-effective-players-of-2019
How to Use This Dataset
This dataset consists of Major League Baseball's most cost effective players of 2019, as measured by WAR per dollar of salary (wWAR/$). WAR is a metric that attempts to measure a player's overall contributions to their team, and includes both offense and defense. You can read more about it here. The dataset includes each player's name, position, team, salary, and wWAR/$.
To use this dataset, you may want to consider the following questions:
* Who are the most cost effective players in baseball?
* What positions do these players tend to play?
* Which teams have the most cost effective players?
- finding the most cost-effective baseball players
- comparing different salary structures among teams
- improving player performance through analytics
If you use this dataset in your research, please credit the original authors.
License
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: Provide a link to the license, and indicate if changes were made.
  - ShareAlike: You must distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.
File: MLB Stats.csv
| Column name | Description |
|:----------------|:---------------------------------------------------------------|
| Player Name | The player's name. (String) |
| weight | The player's weight in pounds. (Numeric) |
| height | The player's height in inches. (Numeric) |
| bats | The player's batting handedness. (String) |
| throws | The player's throwing handedness. (String) |
| Season | The season in which the statistics were accrued. (String) |
| League | The league in which the player played. (String) |
| Team | The team for which the player played. (String) |
| Franchise | The franchise to which the team belongs. (String) |
| G | The number of games the player played. (Numeric) |
| AB | The number of at-bats the player had. (Numeric) |
| R | The number of runs the player scored. (Numeric) |
| H | The number of hits the player had. (Numeric) |
| 2B | The number of doubles the player hit. (Numeric) ...
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.7910/DVN/28782
This paper examines salaries given to arbitration eligible players in Major League Baseball from 2008-2013 and compares them to free agent contracts from the same period. Anecdotal evidence suggests that simpler statistics are more successful in Major League Baseball's final offer arbitration setting as legal experts tasked with handling the league's cases may not have a deep knowledge of player valuation. I examine the effects of wins above replacement, a complex but comprehensive metric, and traditional statistics, such as runs batted in, on salaries decided in both settings. Wins above replacement is significant in each case, but with a much higher coefficient in free agency suggesting a greater impact. There is no evidence of individual traditional statistics being especially significant in arbitration; I attribute this to parties framing their offers with whichever statistics portray them in the most favorable light. Finally, I look to statistics in the season following contracts to determine if either market is more effective in getting value at a low cost, but results are similar in each case and limitations with the data restrict the efficacy of conclusions in the section.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Basic statistics for the MLB and NBA Twitter networks, computed using Mathematica.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Demographic data, description of visual disability and general health conditions of visually impaired subjects playing baseball and visually impaired sedentary individuals.