4 datasets found
  1. Baseball Databank

    • kaggle.com
    Updated Nov 17, 2019
    Cite
    Open Source Sports (2019). Baseball Databank [Dataset]. https://www.kaggle.com/datasets/open-source-sports/baseball-databank/discussion
    Dataset updated
    Nov 17, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Open Source Sports
    Description

    Baseball Databank is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.

    This version of the Baseball databank was downloaded from Sean Lahman's website.

    Note that as of v1, this dataset is missing a few tables because of a restriction on the number of individual files that can be added. This is in the process of being fixed. The missing tables are Parks, HomeGames, CollegePlaying, Schools, Appearances, and FieldingPost.

    The Data

    The design follows these general principles. Each player is assigned a unique identifier (playerID). All of the information relating to that player is tagged with his playerID. The playerIDs are linked to names and birthdates in the MASTER table.
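The playerID design described above amounts to a keyed join between MASTER and any statistics table. The following is a minimal sketch in Python, assuming the tables have already been loaded as lists of dictionaries (for example with `csv.DictReader`). The sample rows, the `name` field, and the helper `attach_names` are hypothetical illustrations, not part of the dataset.

```python
def attach_names(master_rows, stat_rows):
    """Left-join stat rows to MASTER rows on playerID.

    Rows whose playerID is missing from MASTER are kept with name=None.
    """
    # Build a lookup table keyed by playerID (assumed unique in MASTER).
    names_by_id = {row["playerID"]: row["name"] for row in master_rows}
    return [{**row, "name": names_by_id.get(row["playerID"])} for row in stat_rows]


# Hypothetical rows for illustration only; these IDs and names are made up.
master = [
    {"playerID": "examp01", "name": "Example Player"},
    {"playerID": "sampl01", "name": "Sample Player"},
]
batting = [
    {"playerID": "examp01", "yearID": 2000, "H": 10},
    {"playerID": "unkno01", "yearID": 2000, "H": 3},
]

joined = attach_names(master, batting)
```

A dictionary lookup keeps the join linear in the number of rows, and the left-join behaviour makes orphaned playerIDs easy to spot.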

    The database comprises the following main tables:

    • MASTER - Player names, DOB, and biographical info
    • Batting - batting statistics
    • Pitching - pitching statistics
    • Fielding - fielding statistics

    It is supplemented by these tables:

    • AllStarFull - All-Star appearances
    • HallofFame - Hall of Fame voting data
    • Managers - managerial statistics
    • Teams - yearly stats and standings
    • BattingPost - post-season batting statistics
    • PitchingPost - post-season pitching statistics
    • TeamFranchises - franchise information
    • FieldingOF - outfield position data
    • FieldingPost - post-season fielding data
    • ManagersHalf - split season data for managers
    • TeamsHalf - split season data for teams
    • Salaries - player salary data
    • SeriesPost - post-season series information
    • AwardsManagers - awards won by managers
    • AwardsPlayers - awards won by players
    • AwardsShareManagers - award voting for manager awards
    • AwardsSharePlayers - award voting for player awards
    • Appearances - details on the positions a player appeared at
    • Schools - list of colleges that players attended
    • CollegePlaying - list of players and the colleges they attended

    Descriptions of each of these tables can be found attached to their associated files, below.

    Acknowledgments

    This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/

    Person identification and demographics data are provided by Chadwick Baseball Bureau (http://www.chadwick-bureau.com), from its Register of baseball personnel.

    Player performance data for 1871 through 2014 is based on the Lahman Baseball Database, version 2015-01-24, which is Copyright (C) 1996-2015 by Sean Lahman.

    The tables Parks.csv and HomeGames.csv are based on the game logs and park code table published by Retrosheet. This information is available free of charge from, and is copyrighted by, Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.

  2. Lahman Baseball Batting Data

    • kaggle.com
    zip
    Updated Aug 12, 2020
    Cite
    Minnie Liang (2020). Lahman Baseball Batting Data [Dataset]. https://www.kaggle.com/minnieliang/lahman-batting-data
    Available download formats: zip (1948337 bytes)
    Dataset updated
    Aug 12, 2020
    Authors
    Minnie Liang
    Description

    Content

    A baseball batting data frame with 107429 observations on the following 22 variables.

    • playerID - Player ID code
    • yearID - Year
    • stint - Player's stint (order of appearances within a season)
    • teamID - Team; a factor
    • lgID - League; a factor with levels AA AL FL NL PL UA
    • G - Games: number of games in which a player played
    • AB - At bats
    • R - Runs
    • H - Hits: times reached base because of a batted, fair ball without error by the defense
    • X2B - Doubles: hits on which the batter reached second base safely
    • X3B - Triples: hits on which the batter reached third base safely
    • HR - Home runs
    • RBI - Runs batted in
    • SB - Stolen bases
    • CS - Caught stealing
    • BB - Base on balls
    • SO - Strikeouts
    • IBB - Intentional walks
    • HBP - Hit by pitch
    • SH - Sacrifice hits
    • SF - Sacrifice flies
    • GIDP - Grounded into double play
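Because a player can have several stints in one season, rate statistics are usually computed after summing the counting columns per player and year. The sketch below assumes rows shaped like the variables listed above; the sample values and the helper names (`season_totals`, `batting_average`, `slugging`) are hypothetical. Batting average is H / AB, and slugging is total bases divided by AB.

```python
from collections import defaultdict

COUNT_FIELDS = ("AB", "H", "X2B", "X3B", "HR")


def season_totals(rows):
    """Sum counting stats over all stints for each (playerID, yearID)."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_FIELDS, 0))
    for row in rows:
        key = (row["playerID"], row["yearID"])
        for field in COUNT_FIELDS:
            totals[key][field] += int(row[field])
    return dict(totals)


def batting_average(t):
    """H / AB, or None when there are no at bats."""
    return t["H"] / t["AB"] if t["AB"] else None


def slugging(t):
    """Total bases / AB, or None when there are no at bats."""
    if not t["AB"]:
        return None
    singles = t["H"] - t["X2B"] - t["X3B"] - t["HR"]
    total_bases = singles + 2 * t["X2B"] + 3 * t["X3B"] + 4 * t["HR"]
    return total_bases / t["AB"]


# Hypothetical two-stint season; the numbers are made up for illustration.
rows = [
    {"playerID": "examp01", "yearID": 2000, "stint": 1,
     "AB": 100, "H": 30, "X2B": 5, "X3B": 1, "HR": 4},
    {"playerID": "examp01", "yearID": 2000, "stint": 2,
     "AB": 100, "H": 20, "X2B": 5, "X3B": 1, "HR": 0},
]

season = season_totals(rows)[("examp01", 2000)]
```

Summing first and dividing afterwards weights each stint by its at bats, which averaging the per-stint rates would not do.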

  3. Batting

    • figshare.com
    txt
    Updated Jun 12, 2023
    Cite
    Chadwick Baseball Bureau; Lahman Baseball Database (2023). Batting [Dataset]. http://doi.org/10.6084/m9.figshare.15502593.v1
    Available download formats: txt
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chadwick Baseball Bureau; Lahman Baseball Database
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Baseball Databank is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.

  4. The History of Baseball

    • kaggle.com
    zip
    Updated Nov 14, 2019
    Cite
    SeanLahman (2019). The History of Baseball [Dataset]. https://www.kaggle.com/seanlahman/the-history-of-baseball
    Available download formats: zip (21463012 bytes)
    Dataset updated
    Nov 14, 2019
    Authors
    SeanLahman
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Baffled why your team traded for that 34-year-old pitcher? Convinced you can create a new and improved version of WAR? Wondering what made the 1907 Cubs great and if they can do it again?

    The History of Baseball is a reformatted version of the famous Lahman’s Baseball Database. It contains Major League Baseball’s complete batting and pitching statistics from 1871 to 2015, plus fielding statistics, standings, team stats, park stats, player demographics, managerial records, awards, post-season data, and more.

    Scripts, Kaggle’s free, in-browser analytics tool, makes it easy to share detailed sabermetrics, predict the next hall of fame inductee, illustrate how speed scores runs, or publish a definitive analysis on why the Los Angeles Dodgers will never win another World Series.

    We have more ideas for analysis than games in a season, but here are a few we’d really love to see:

    • Is there a most error-prone position?
    • When do players at different positions peak?
    • Are the best performers selected for the All-Star Game?
    • How many walks does it take for a starting pitcher to get pulled?
    • Do players who ground into many double plays (GIDP) have a lower batting average?
    • Which players are the most likely to choke during the post-season?
    • Why should or shouldn’t the National League adopt the designated hitter rule?
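As one possible starting point for the GIDP question in the list above, the sketch below computes a Pearson correlation between GIDP and batting average (H / AB). It assumes batting rows with the column names `AB`, `H`, and `GIDP`; the sample rows, the `min_ab` cutoff, and the function names are hypothetical choices, not something the dataset prescribes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def gidp_vs_average(rows, min_ab=1):
    """Correlate GIDP with batting average, skipping rows below min_ab at bats."""
    eligible = [r for r in rows if r["AB"] >= min_ab]
    gidp = [r["GIDP"] for r in eligible]
    avg = [r["H"] / r["AB"] for r in eligible]
    return pearson(gidp, avg)


# Hypothetical rows, made up so the relationship is exactly linear.
rows = [
    {"AB": 100, "H": 30, "GIDP": 2},
    {"AB": 100, "H": 25, "GIDP": 4},
    {"AB": 100, "H": 20, "GIDP": 6},
    {"AB": 0, "H": 0, "GIDP": 0},  # excluded: no at bats
]

r = gidp_vs_average(rows)
```

Raw GIDP counts grow with playing time, so a real analysis would probably raise `min_ab` or use a per-opportunity rate instead of the raw count.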

    See the full SQLite schema.

Baseball Databank

Data on baseball players, teams, and games from 1871 to 2015
