janoPig/pmlb dataset hosted on Hugging Face and contributed by the HF Datasets community
Python wrapper for the Penn Machine Learning Benchmarks (PMLB) data repository, a large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Part of PyPI (https://pypi.org/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Penn Machine Learning Benchmarks (PMLB) is a large, curated set of benchmark datasets used to evaluate and compare supervised machine learning algorithms. These datasets cover a broad range of applications, and include binary/multi-class classification problems and regression problems, as well as combinations of categorical, ordinal, and continuous features.
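As a rough illustration of how a benchmark suite like this is typically used, the sketch below applies one scoring procedure to several datasets and collects one score per dataset. It assumes the `pmlb` Python package and its `fetch_data(name, return_X_y=True)` loader; the helpers `benchmark` and `majority_baseline_score` are hypothetical, and the majority-class baseline only stands in for a real supervised model.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Sequence, Tuple


def load_pmlb(name: str) -> Tuple[object, Sequence[Hashable]]:
    """Fetch one PMLB dataset as (X, y). Needs `pip install pmlb` and network access."""
    from pmlb import fetch_data  # imported lazily so the harness runs without pmlb installed

    return fetch_data(name, return_X_y=True)


def majority_baseline_score(X: object, y: Sequence[Hashable],
                            test_fraction: float = 0.25, seed: int = 0) -> float:
    """Hypothetical stand-in model: holdout accuracy of always predicting
    the most common training label. X is ignored; a real study would fit
    and score an actual classifier here."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)  # shuffle so the split does not follow row order
    n_test = max(1, int(len(indices) * test_fraction))
    test, train = indices[:n_test], indices[n_test:]
    if not train:
        raise ValueError("need at least two rows for a train/test split")
    majority = Counter(y[i] for i in train).most_common(1)[0][0]
    return sum(y[i] == majority for i in test) / len(test)


def benchmark(names: Iterable[str],
              load: Callable[[str], Tuple[object, Sequence[Hashable]]],
              score: Callable[[object, Sequence[Hashable]], float]) -> Dict[str, float]:
    """Apply the same scoring procedure to every named dataset."""
    results = {}
    for name in names:
        X, y = load(name)
        results[name] = score(X, y)
    return results
```

With `pmlb` installed, a call such as `benchmark(["iris"], load_pmlb, majority_baseline_score)` would download the dataset and return its baseline accuracy. Passing the loader and scorer as arguments keeps the loop identical across datasets, which is what makes the resulting scores comparable.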
These are the experimental data for the second version (v2) of the paper
Bach, Jakob. "Finding Optimal Diverse Feature Sets with Alternative Feature Selection",
published on arXiv in 2024.
You can find the paper here and the code here.
See the README for details.
The datasets used in our study (which we also provide here) originate from PMLB.
The corresponding GitHub repository is MIT-licensed ((c) 2016 Epistasis Lab at UPenn).
Please see the file LICENSE in the folder datasets/ for the license text.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
michaelmallari/mlb-statcast-pitchers dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Mlb 25 is a dataset for object detection tasks; it contains Baseball annotations for 1,237 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Ahead of the 2023 Major League Baseball season, a pitch clock was introduced to speed up the pace of the game. As a result, an average game during the 2024 MLB season lasted 2 hours and 36 minutes. This was more than 25 minutes shorter than an average game during the 2022 season, when the pitch clock had not yet been introduced.
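The two figures quoted above are enough to bound the 2022 average. A minimal arithmetic check, using a hypothetical helper name: 2 h 36 min is 156 minutes, and adding the stated reduction of more than 25 minutes puts the 2022 average above 181 minutes.

```python
from typing import Tuple


def implied_2022_minimum(avg_2024_hours: int, avg_2024_minutes: int,
                         reduction_minutes: int) -> Tuple[int, int]:
    """Lower bound on the 2022 average game length implied by the quoted figures."""
    total_2024 = avg_2024_hours * 60 + avg_2024_minutes  # 2 h 36 min -> 156 min
    total_2022 = total_2024 + reduction_minutes          # "more than 25 minutes shorter"
    return divmod(total_2022, 60)                        # (hours, minutes)


hours, minutes = implied_2022_minimum(2, 36, 25)
print(f"2022 average: more than {hours} h {minutes:02d} min")  # more than 3 h 01 min
```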
Annual sponsorship revenue for Major League Baseball teams amounted to 1.9 billion U.S. dollars in 2024, an increase of around 400 million U.S. dollars over the previous year's figure.
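Taken together, the two numbers imply the previous year's total and the growth rate. A back-of-the-envelope sketch, working in millions of U.S. dollars; the helper name is hypothetical, and because both inputs are rounded the outputs are approximate.

```python
from typing import Tuple


def previous_year_and_growth(current_millions: int, increase_millions: int) -> Tuple[int, float]:
    """Back out last year's figure and the year-over-year growth rate."""
    previous = current_millions - increase_millions  # 1,900 - 400 = 1,500
    return previous, increase_millions / previous    # 400 / 1,500, roughly 0.267


previous, growth = previous_year_and_growth(1900, 400)
print(f"2023: about {previous:,} million USD, growth of about {growth:.0%}")
```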
This dataset was created by Diana Bergstrom.
Data were obtained from Baseball Reference.
CC0 1.0 Universal (public domain dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides statistical ratings and percentile rankings for players from the Negro Leagues and non-Negro League MLB players, focusing on those who met specific game count criteria. It highlights key performance metrics such as batting average, power, speed, defense, and pitching effectiveness, allowing for a comparative analysis between Negro League legends and recognized MLB players.
By normalizing player statistics relative to their league averages and weighting them by Wins Above Replacement (WAR), the dataset offers a comprehensive look at player performance across different eras. It serves as a valuable resource for evaluating the historical impact of Negro League players and understanding their place in baseball history.
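The description does not spell out its exact formulas, so the sketch below assumes one plausible reading: each season's stat becomes an index relative to that season's league average (100 = average, in the spirit of OPS+), a player's seasons are combined into one rating with WAR as the weight, and percentile ranks are then taken across players. All function names and the `(player_value, league_average, war)` tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, Sequence, Tuple


def league_relative_index(value: float, league_average: float) -> float:
    """Scale a stat so that 100 means exactly league average."""
    return 100.0 * value / league_average


def war_weighted_rating(seasons: Iterable[Tuple[float, float, float]]) -> float:
    """Combine per-season indices into one rating, weighting each season by WAR.

    Seasons with zero or negative WAR get zero weight (an assumption).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for value, league_average, war in seasons:
        weight = max(war, 0.0)
        weighted_sum += weight * league_relative_index(value, league_average)
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no season with positive WAR to weight by")
    return weighted_sum / total_weight


def percentile_rank(score: float, population: Sequence[float]) -> float:
    """Percentage of the population scoring at or below `score`."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


rating = war_weighted_rating([(0.300, 0.250, 6.0), (0.250, 0.250, 2.0)])
print(rating)  # 115.0: index 120 weighted 6:2 against index 100
print(percentile_rank(rating, [90.0, 100.0, 115.0, 130.0]))  # 75.0
```

Normalizing within each league and season before combining is what lets players from different eras and leagues be compared on one scale.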
Major League Baseball (MLB) is a professional sports league in North America made up of ** teams that compete in the American League and the National League. In 2023, there were ** female coaches in the league, representing the highest number ever recorded.
As of November 2024, the player in Major League Baseball (MLB) with the highest cumulative career earnings was Álex Rodríguez, with total earnings of over 455 million U.S. dollars. Nicknamed "A-Rod", Rodríguez played for three different teams during his 22-year career.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MLb-LDLr is software that predicts the pathogenicity of LDLr variants. It is based on the pathogenicity frequency of more than 700 variants annotated in the ClinVar database. This document contains the ClinVar database used, the Pik values of each characteristic, and the predictions for all possible LDLr variants.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Mlb Frames With Detections is a dataset for object detection tasks; it contains Mlb Frames With Detections annotations for 5,991 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
bat tracking
finnnnnnnnnnnn/mlb-play-by-plays-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
These are the experimental data for the paper
Bach, Jakob. "Subgroup Discovery with Small and Alternative Feature Sets"
The paper was accepted at SIGMOD 2025.
You can find the paper here and the code here.
See the README for details.
The datasets used in our study (which we also provide here) originate from PMLB.
The corresponding GitHub repository is MIT-licensed ((c) 2016 Epistasis Lab at UPenn).
Please see the file LICENSE in the folder datasets/ for the license text.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by Paul Beckman.
Released under Attribution 4.0 International (CC BY 4.0)
This public dataset contains pitch-by-pitch data for Major League Baseball (MLB) games in 2016. It includes the following tables:
- games_wide: every pitch, steal, or lineup event for each at-bat in the 2016 regular season
- games_post_wide: every pitch, steal, or lineup event for each at-bat in the 2016 postseason
- schedules: the schedule for every team in the regular season

The schemas of the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams.

Note: This data was built by denormalizing raw game log files, which may contain scoring errors and, in some cases, missing data. For official scoring and statistical information, please consult mlb.com, baseball-reference.com, or sportradar.com.

This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier of 1 TB of processing per month. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
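Because free-tier usage is counted in bytes processed, it can help to estimate a query's scan size before running it. The sketch below builds a query that counts event rows across the regular-season and postseason tables, then uses the `google-cloud-bigquery` client to dry-run and execute it. The fully qualified table IDs are an assumption (check them in the BigQuery console), and the helper names are hypothetical.

```python
from typing import Optional

# Assumed fully qualified table IDs; confirm them in the BigQuery console.
REGULAR_SEASON_TABLE = "bigquery-public-data.baseball.games_wide"
POSTSEASON_TABLE = "bigquery-public-data.baseball.games_post_wide"


def count_events_sql(*table_ids: str) -> str:
    """SQL counting event rows (pitches, steals, lineup events) across tables.

    Rows are stacked with UNION ALL; selecting a constant avoids reading columns.
    """
    if not table_ids:
        raise ValueError("need at least one table ID")
    union = "\n  UNION ALL\n  ".join(f"SELECT 1 AS one FROM `{t}`" for t in table_ids)
    return f"SELECT COUNT(*) AS n_events FROM (\n  {union}\n)"


def estimate_and_run(sql: str, project: Optional[str] = None) -> int:
    """Dry-run the query to report bytes it would scan, then execute it.

    Needs `pip install google-cloud-bigquery` and Google Cloud credentials.
    """
    from google.cloud import bigquery  # imported lazily; not needed to build the SQL

    client = bigquery.Client(project=project)
    dry_run = client.query(
        sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    )
    print(f"Bytes this query would process: {dry_run.total_bytes_processed:,}")
    rows = client.query(sql).result()
    return next(iter(rows)).n_events
```

For example, `estimate_and_run(count_events_sql(REGULAR_SEASON_TABLE, POSTSEASON_TABLE))` would print the estimated scan size and return the combined event count for the 2016 regular season and postseason.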
Data was provided by the Tampa Bay Rays and collected at Tropicana Field (where the Rays play) by Jordan Davis during the summer of 2021.