100+ datasets found
  1. pmlb

    • huggingface.co
    Updated Aug 31, 2024
    Cite
    Jan Pigos (2024). pmlb [Dataset]. https://huggingface.co/datasets/janoPig/pmlb
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 31, 2024
    Authors
    Jan Pigos
    Description

    janoPig/pmlb dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Penn machine learning benchmark repository

    • rrid.site
    Updated Sep 28, 2025
    Cite
    (2025). Penn machine learning benchmark repository [Dataset]. http://identifiers.org/RRID:SCR_017138
    Explore at:
    Dataset updated
    Sep 28, 2025
    Description

    Python wrapper for Penn Machine Learning Benchmark data repository. Large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Part of PyPI https://pypi.org/

  3. PMLB (Penn Machine Learning Benchmarks)

    • opendatalab.com
    zip
    Updated Jul 1, 2023
    Cite
    University of Pennsylvania (2023). PMLB (Penn Machine Learning Benchmarks) [Dataset]. https://opendatalab.com/OpenDataLab/PMLB
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    University of Pennsylvania
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Penn Machine Learning Benchmarks (PMLB) is a large, curated set of benchmark datasets used to evaluate and compare supervised machine learning algorithms. These datasets cover a broad range of applications, and include binary/multi-class classification problems and regression problems, as well as combinations of categorical, ordinal, and continuous features.

  4. Experimental Data for the Paper "Finding Optimal Diverse Feature Sets with...

    • radar.kit.edu
    • radar-service.eu
    tar
    Updated Feb 13, 2024
    + more versions
    Cite
    Jakob Bach (2024). Experimental Data for the Paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection" (Version 2) [Dataset]. http://doi.org/10.35097/1920
    Explore at:
    Available download formats: tar (34243584 bytes)
    Dataset updated
    Feb 13, 2024
    Dataset provided by
    Karlsruhe Institute of Technology
    Authors
    Jakob Bach
    Description

    These are the experimental data for the second version (v2) of the paper: Bach, Jakob. "Finding Optimal Diverse Feature Sets with Alternative Feature Selection", published on arXiv in 2024. You can find the paper here and the code here. See the README for details. The datasets used in our study (which we also provide here) originate from PMLB. The corresponding GitHub repository is MIT-licensed ((c) 2016 Epistasis Lab at UPenn). Please see the file LICENSE in the folder datasets/ for the license text.

  5. mlb-statcast-pitchers

    • huggingface.co
    Updated Feb 7, 2024
    + more versions
    Cite
    Michael Mallari (2024). mlb-statcast-pitchers [Dataset]. https://huggingface.co/datasets/michaelmallari/mlb-statcast-pitchers
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 7, 2024
    Authors
    Michael Mallari
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    michaelmallari/mlb-statcast-pitchers dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Mlb 25 Dataset

    • universe.roboflow.com
    zip
    Updated Jul 11, 2025
    Cite
    mlb (2025). Mlb 25 Dataset [Dataset]. https://universe.roboflow.com/mlb-rrbwp/mlb-25
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 11, 2025
    Dataset provided by
    MLB (http://mlb.com/)
    Authors
    mlb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Baseball Bounding Boxes
    Description

    Mlb 25

    ## Overview
    
    Mlb 25 is a dataset for object detection tasks - it contains Baseball annotations for 1,237 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. MLB average game length 2000-2024

    • statista.com
    Updated Aug 8, 2025
    Cite
    Christina Gough (2025). MLB average game length 2000-2024 [Dataset]. https://www.statista.com/topics/968/major-league-baseball/
    Explore at:
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Christina Gough
    Description

    Ahead of the 2023 Major League Baseball season, a pitch clock was introduced to speed up the pace of the game. As a result, an average game during the 2024 MLB season lasted 2 hours and 36 minutes. This was more than 25 minutes shorter than an average game during the 2022 season, when the pitch clock had not yet been introduced.

  8. MLB league sponsorship revenue 2010-2024

    • statista.com
    Updated Apr 25, 2014
    Cite
    Statista (2014). MLB league sponsorship revenue 2010-2024 [Dataset]. https://www.statista.com/statistics/380197/mlb-sponsorship-revenue/
    Explore at:
    Dataset updated
    Apr 25, 2014
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Annual sponsorship revenue for Major League Baseball teams amounted to 1.9 billion U.S. dollars in 2024. This marked an increase of around 400 million U.S. dollars on the previous year's figure.

  9. MLB standings

    • kaggle.com
    Updated Aug 13, 2023
    Cite
    Diana Bergstrom (2023). MLB standings [Dataset]. https://www.kaggle.com/datasets/dianabergstrom/mlbstandings/data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Diana Bergstrom
    Description

    Dataset

    This dataset was created by Diana Bergstrom

    Contents

    data obtained from Baseball Reference

  10. Negro League & MLB Player Ratings

    • kaggle.com
    Updated Feb 16, 2025
    Cite
    Nihal Kumar Sharma (2025). Negro League & MLB Player Ratings [Dataset]. https://www.kaggle.com/datasets/nihalkumarsharma/negro-league-and-mlb-player-ratings
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2025
    Dataset provided by
    Kaggle
    Authors
    Nihal Kumar Sharma
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides statistical ratings and percentile rankings for players from the Negro Leagues and non-Negro League MLB players, focusing on those who met specific game count criteria. It highlights key performance metrics such as batting average, power, speed, defense, and pitching effectiveness, allowing for a comparative analysis between Negro League legends and recognized MLB players.

    By normalizing player statistics relative to their league averages and weighting them by Wins Above Replacement (WAR), the dataset offers a comprehensive look at player performance across different eras. It serves as a valuable resource for evaluating the historical impact of Negro League players and understanding their place in baseball history.

  11. Female coaches in the MLB from 2011 to 2023

    • statista.com
    Updated Jul 18, 2025
    Cite
    Statista (2025). Female coaches in the MLB from 2011 to 2023 [Dataset]. https://www.statista.com/statistics/1310420/mlb-female-coaches/
    Explore at:
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    North America
    Description

    Major League Baseball (MLB) is a professional sports league in North America made up of ** teams that compete in the American League and the National League. In 2023, there were ** female coaches in the league, representing the highest number ever recorded.

  12. Highest overall career earnings of MLB players in North America as of 2024

    • statista.com
    Updated Aug 8, 2025
    Cite
    Christina Gough (2025). Highest overall career earnings of MLB players in North America as of 2024 [Dataset]. https://www.statista.com/topics/968/major-league-baseball/
    Explore at:
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Christina Gough
    Description

    As of November 2024, the player in Major League Baseball (MLB) with the highest cumulative career earnings was Álex Rodríguez, with total earnings of over 455 million U.S. dollars. Nicknamed "A-Rod", Rodríguez played for three different teams during his 22-year career.

  13. MLb-LDLr.xlsx

    • figshare.com
    • research.science.eus
    xlsx
    Updated Jan 18, 2021
    Cite
    Asier Larrea; Cesar Martin Plagaro; Humberto González-Díaz; Asier Benito-Vicente; shifa jebari; Sonia Arrasate; unai galicia (2021). MLb-LDLr.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.13603991.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 18, 2021
    Dataset provided by
    figshare
    Authors
    Asier Larrea; Cesar Martin Plagaro; Humberto González-Díaz; Asier Benito-Vicente; shifa jebari; Sonia Arrasate; unai galicia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MLb-LDLr is software that predicts the pathogenicity of LDLr variants. It is based on the pathogenicity frequency of more than 700 variants annotated in the ClinVar database. This document contains the ClinVar data used, the Pik values of each characteristic, and the predictions for all possible LDLr variants.

  14. Mlb Frames With Detections Dataset

    • universe.roboflow.com
    zip
    Updated Apr 2, 2025
    + more versions
    Cite
    Videocites (2025). Mlb Frames With Detections Dataset [Dataset]. https://universe.roboflow.com/videocites-msbrv/mlb-frames-with-detections-flw2y
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Videocites
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Mlb Frames With Detections Bounding Boxes
    Description

    Mlb Frames With Detections

    ## Overview
    
    Mlb Frames With Detections is a dataset for object detection tasks - it contains Mlb Frames With Detections annotations for 5,991 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://opensource.org/licenses/MIT).
    
  15. MLB pitcher data

    • ieee-dataport.org
    Updated Feb 6, 2025
    Cite
    Wonbyung Lee (2025). MLB pitcher data [Dataset]. https://ieee-dataport.org/documents/mlb-pitcher-data
    Explore at:
    Dataset updated
    Feb 6, 2025
    Authors
    Wonbyung Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    bat tracking

  16. mlb-play-by-plays-v1

    • huggingface.co
    Cite
    Finn Eyles, mlb-play-by-plays-v1 [Dataset]. https://huggingface.co/datasets/finnnnnnnnnnnn/mlb-play-by-plays-v1
    Explore at:
    Authors
    Finn Eyles
    Description

    finnnnnnnnnnnn/mlb-play-by-plays-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. Experimental Data for the Paper "Subgroup Discovery with Small and...

    • resodate.org
    • radar-service.eu
    • +1more
    Updated Jan 1, 2025
    Cite
    Jakob Bach (2025). Experimental Data for the Paper "Subgroup Discovery with Small and Alternative Feature Sets" [Dataset]. http://doi.org/10.35097/nftgaf7w73hy2491
    Explore at:
    Dataset updated
    Jan 1, 2025
    Dataset provided by
    RADAR
    Karlsruhe Institute of Technology
    Authors
    Jakob Bach
    Description

    These are the experimental data for the paper

    Bach, Jakob. "Subgroup Discovery with Small and Alternative Feature Sets"

    The paper was accepted at the conference SIGMOD 2025. You can find the paper here and the code here. See the README for details.

    The datasets used in our study (which we also provide here) originate from PMLB. The corresponding GitHub repository is MIT-licensed ((c) 2016 Epistasis Lab at UPenn). Please see the file LICENSE in the folder datasets/ for the license text.

  18. MLB dyads every player played WITH since 1914

    • kaggle.com
    Updated Mar 3, 2021
    Cite
    Paul Beckman (2021). MLB dyads every player played WITH since 1914 [Dataset]. https://www.kaggle.com/paulbeckman/mlb-dyads-every-player-played-with-since-1914/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Paul Beckman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Paul Beckman

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  19. MLB 2016 Pitch-by-Pitch

    • console.cloud.google.com
    Updated Jul 5, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Sportradar (2020). MLB 2016 Pitch-by-Pitch [Dataset]. https://console.cloud.google.com/marketplace/product/sportradar-public-data/mlb-pitch-by-pitch
    Explore at:
    Dataset updated
    Jul 5, 2020
    Dataset provided by
    Google (http://google.com/)
    Sportradar (http://sportradar.com/)
    Description

    This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016. This dataset contains the following tables: games_wide (every pitch, steal, or lineup event for each at-bat in the 2016 regular season), games_post_wide (every pitch, steal, or lineup event for each at-bat in the 2016 postseason), and schedules (the schedule for every team in the regular season). The schemas for the games_wide and games_post_wide tables are identical. With this data you can effectively replay a game and rebuild basic statistics for players and teams. Note: This data was built via a denormalization process over raw game log files, which may contain scoring errors and, in some cases, missing data. For official scoring and statistical information, please consult mlb.com, baseball-reference.com, or sportradar.com. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  20. Play Sustainaball: An environmental footprint for an MLB team season

    • datadryad.org
    zip
    Updated Jun 8, 2022
    Cite
    Hannah Brady; Gabrielle Barsotti; Jordan Davis; Carly Norris; Eric Shaphran (2022). Play Sustainaball: An environmental footprint for an MLB team season [Dataset]. http://doi.org/10.25349/D9RG87
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 8, 2022
    Dataset provided by
    Dryad
    Authors
    Hannah Brady; Gabrielle Barsotti; Jordan Davis; Carly Norris; Eric Shaphran
    Time period covered
    May 6, 2022
    Description

    Data was provided by the Tampa Bay Rays and collected at Tropicana Field (where the Rays play) by Jordan Davis during the summer of 2021.
