CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. With this data you can effectively replay a game and rebuild basic statistics for players and teams.
games_wide - Every pitch, steal, or lineup event for each at-bat in the 2016 regular season.
games_post_wide - Every pitch, steal, or lineup event for each at-bat in the 2016 postseason.
schedules - The schedule for every team in the regular season.
*The schemas for the games_wide and games_post_wide tables are identical.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.baseball.[TABLENAME]. Fork this kernel to get started and learn how to safely manage analysis of large BigQuery datasets.
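A minimal sketch of a cost-aware query, assuming the `google-cloud-bigquery` package is installed, credentials are configured, and the tables live in the public `bigquery-public-data.baseball` dataset. Because games_wide and games_post_wide share a schema, their rows can be stacked with `UNION ALL`. The sketch first does a dry run to estimate bytes scanned, then runs the query under a `maximum_bytes_billed` cap; the helper names and the 1 GB default cap are illustrative choices, not part of the dataset.

```python
# Sketch: count events per season type, with a dry run and a billing cap.
# Assumes google-cloud-bigquery is installed and credentials are configured.

DATASET = "bigquery-public-data.baseball"


def table_id(name: str) -> str:
    """Return the fully qualified, backtick-quoted BigQuery table name."""
    return f"`{DATASET}.{name}`"


def build_event_count_sql() -> str:
    """Count rows in the regular-season and postseason tables.

    The two tables share a schema, so UNION ALL can stack them directly.
    """
    return (
        f"SELECT 'regular' AS season_type, COUNT(*) AS n_events FROM {table_id('games_wide')}\n"
        "UNION ALL\n"
        f"SELECT 'post' AS season_type, COUNT(*) AS n_events FROM {table_id('games_post_wide')}"
    )


def run_with_cap(sql: str, max_bytes: int = 10**9) -> list[dict]:
    """Dry-run the query to estimate cost, then run it under a byte cap."""
    # Imported lazily so the SQL helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    dry = client.query(
        sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    )
    print(f"Estimated bytes processed: {dry.total_bytes_processed:,}")
    # The job fails instead of billing more than max_bytes.
    config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return [dict(row.items()) for row in client.query(sql, job_config=config).result()]
```

Usage: `run_with_cap(build_event_count_sql())`. The dry run costs nothing, so it is a cheap habit before touching the much wider per-pitch columns.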
Dataset Source: Sportradar LLC
Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Baffled why your team traded for that 34-year-old pitcher? Convinced you can create a new and improved version of WAR? Wondering what made the 1907 Cubs great and whether they can do it again?
The History of Baseball is a reformatted version of the famous Lahman’s Baseball Database. It contains Major League Baseball’s complete batting and pitching statistics from 1871 to 2015, plus fielding statistics, standings, team stats, park stats, player demographics, managerial records, awards, post-season data, and more.
Scripts, Kaggle’s free, in-browser analytics tool, makes it easy to share detailed sabermetrics, predict the next hall of fame inductee, illustrate how speed scores runs, or publish a definitive analysis on why the Los Angeles Dodgers will never win another World Series.
We have more ideas for analysis than games in a season, but here are a few we’d really love to see:
See the full SQLite schema.
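Before writing queries, it helps to inspect the schema from Python. A minimal sketch using the standard `sqlite3` module; the database filename (for example `database.sqlite`) is an assumption, so use whatever file the download actually contains. The helpers take an open connection so they work with any SQLite file.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return one table's column names in declaration order."""
    # pragma_table_info is SQLite's table-valued form of PRAGMA table_info,
    # which lets the table name be passed as a bound parameter.
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()
    return [name for (name,) in rows]
```

Usage: open the file with `conn = sqlite3.connect("database.sqlite")` (hypothetical path), then call `list_tables(conn)` and `table_columns(conn, name)` for each table of interest.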
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Files: https://projects.fivethirtyeight.com/mlb-api/mlb_elo.csv
This file contains links to the data behind The Complete History Of MLB and our MLB Predictions.
mlb_elo.csv contains game-by-game Elo ratings and forecasts back to 1871.
mlb_elo_latest.csv contains game-by-game Elo ratings and forecasts for only the latest season.
The data contains two separate systems for rating teams: the simpler Elo ratings, used for The Complete History Of MLB, and the more involved (and confusingly named) "ratings" used in our MLB Predictions. The main difference is that Elo ratings are reverted to the mean between seasons, while the more involved ratings use preseason team projections from several projection systems and account for starting pitchers. More information can be found in this article.
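For intuition about the simpler system, the standard Elo expected-score formula turns a rating gap into a win probability, and between-season reversion pulls each rating part of the way back toward a league mean. This is a minimal sketch: the 400-point scale is the conventional Elo constant, but the 1500 mean and one-third reversion fraction are placeholder assumptions, and FiveThirtyEight's published probabilities include adjustments (such as home field) that the sketch omits, so it will not reproduce elo_prob1 exactly.

```python
def elo_win_prob(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score (win probability) for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def revert_to_mean(rating: float, mean: float = 1500.0, fraction: float = 1 / 3) -> float:
    """Move a rating `fraction` of the way back toward `mean`.

    The default mean and fraction are placeholders, not FiveThirtyEight's values.
    """
    return rating + fraction * (mean - rating)


# Equal ratings give a coin flip; a 100-point edge is worth roughly 64%.
print(elo_win_prob(1500, 1500))
print(round(elo_win_prob(1600, 1500), 3))
```

Because the formula is symmetric, `elo_win_prob(a, b) + elo_win_prob(b, a)` is always 1, which mirrors how elo_prob1 and elo_prob2 pair up in the table below.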
Column | Definition |
---|---|
date | Date of game |
season | Year of season |
neutral | Whether game was on a neutral site |
playoff | Whether game was in playoffs, and the playoff round if so |
team1 | Abbreviation for home team |
team2 | Abbreviation for away team |
elo1_pre | Home team's Elo rating before the game |
elo2_pre | Away team's Elo rating before the game |
elo_prob1 | Home team's probability of winning according to Elo ratings |
elo_prob2 | Away team's probability of winning according to Elo ratings |
elo1_post | Home team's Elo rating after the game |
elo2_post | Away team's Elo rating after the game |
rating1_pre | Home team's rating before the game |
rating2_pre | Away team's rating before the game |
pitcher1 | Name of home starting pitcher |
pitcher2 | Name of away starting pitcher |
pitcher1_rgs | Home starting pitcher's rolling game score before the game |
pitcher2_rgs | Away starting pitcher's rolling game score before the game |
pitcher1_adj | Home starting pitcher's adjustment to their team's rating |
pitcher2_adj | Away starting pitcher's adjustment to their team's rating |
rating_prob1 | Home team's probability of winning according to team ratings and starting pitchers |
rating_prob2 | Away team's probability of winning according to team ratings and starting pitchers |
rating1_post | Home team's rating after the game |
rating2_post | Away team's rating after the game |
score1 | Home team's score |
score2 | Away team's score |
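A quick sanity check on the forecasts is to count how often the Elo favorite actually won. A minimal sketch using only the elo_prob1, score1, and score2 columns from the table above; it assumes scores are blank for games not yet played, and it skips toss-ups and tied games. The function name is hypothetical.

```python
import csv
from typing import Iterable


def elo_favorite_hit_rate(lines: Iterable[str]) -> tuple[int, float]:
    """Return (games counted, share of them won by the Elo favorite)."""
    games = hits = 0
    for row in csv.DictReader(lines):
        if not row["score1"] or not row["score2"]:
            continue  # assumed: blank scores mean the game has not been played
        p_home = float(row["elo_prob1"])
        s1, s2 = float(row["score1"]), float(row["score2"])
        if p_home == 0.5 or s1 == s2:
            continue  # no favorite, or a tied game
        games += 1
        # A hit when the home team was favored and won, or was the underdog and lost.
        hits += int((p_home > 0.5) == (s1 > s2))
    return games, (hits / games if games else float("nan"))
```

Usage: `with open("mlb_elo.csv", newline="") as f: n, rate = elo_favorite_hit_rate(f)`. Swapping elo_prob1 for rating_prob1 gives the same check for the more involved ratings.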
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
The MLS Players Union releases the salaries of every MLS player each year. This is a collection of salaries from 2007 to 2017.
Each file contains the following fields:
Jeremy Singer-Vine over at Data Is Plural scraped the PDFs released by the MLS Players Union and put the data in a nice little package of CSV files for everyone.
I downloaded this dataset from https://github.com/data-is-plural/mls-salaries (MIT License).
Who in the MLS makes the most money? Are they worth it? I make about $900 bazillion each year, can I afford a soccer team?
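To start answering the first question, a minimal sketch that ranks players in one salary file. The column names (first_name, last_name, guaranteed_compensation) are assumptions about the CSV headers, so check them against the files and adjust the constants; the function name is hypothetical.

```python
import csv
from typing import Iterable

# Assumed column names: adjust to the headers in the files you download.
NAME_COLS = ("first_name", "last_name")
PAY_COL = "guaranteed_compensation"


def top_earners(lines: Iterable[str], n: int = 5) -> list[tuple[str, float]]:
    """Return the n highest-paid players as (name, pay), highest first."""
    players = []
    for row in csv.DictReader(lines):
        # Tolerate "$" and thousands separators in the pay field.
        raw = (row.get(PAY_COL) or "").replace("$", "").replace(",", "").strip()
        if not raw:
            continue  # skip rows with no salary listed
        name = " ".join((row.get(c) or "").strip() for c in NAME_COLS).strip()
        players.append((name, float(raw)))
    return sorted(players, key=lambda p: p[1], reverse=True)[:n]
```

Running it once per year's file makes it easy to see how the top of the pay scale changed from 2007 to 2017.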