16 datasets found
  1. March Madness | Historical Data | 2012-2023

    • kaggle.com
    Updated Mar 27, 2024
    Arnav Samal (2024). March Madness | Historical Data | 2012-2023 [Dataset]. https://www.kaggle.com/datasets/arnavs19/march-madness-historical-data-2012-2023
    Dataset updated
    Mar 27, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arnav Samal
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This comprehensive CSV dataset compiles historical features of NCAA basketball teams participating in March Madness tournaments from 2012 to 2023. The dataset includes a rich array of performance metrics aimed at analyzing team dynamics and competitiveness. Key features encompass win-loss percentage, advanced metrics like Simple Rating System (SRS), Strength of Schedule (SOS), field goal percentage (FG%), three-point percentage (3P%), free throw percentage (FT%), home and away win rates, conference win rates, and point differential percentage.

    Additionally, advanced statistical insights are provided, such as adjusted efficiency margin (AdjEM), adjusted offensive efficiency (AdjO), adjusted defensive efficiency (AdjD), adjusted tempo (AdjT), luck factor, adjusted strength of schedule (SOS AdjEM), average adjusted offensive efficiency of opposing teams (OppO), average adjusted defensive efficiency of opposing teams (OppD), and non-conference adjusted strength of schedule (NCSOS AdjEM). This dataset serves as a valuable resource for researchers, analysts, and enthusiasts seeking to delve into the intricate performance dynamics of collegiate basketball teams during the March Madness era.

  2. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    Updated Apr 22, 2025
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/discussion?sort=undefined
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jonathan Pilafas
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
    - Dashboard: Dashboard Link
    - Domo blog post describing the dashboard's features: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

    This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. The data, sourced from KenPom, goes back to 2002 and includes the latest 2025 data. It is structured to provide every piece of information I could pull from the site, as an open-source tool for March Madness analysis.

    Key features of the dataset include:
    - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints on KenPom's website. Note that the Height/Experience data only goes back to 2007, while every other source contains data from 2002 onward.
    - Data Granularity: The dataset has one line item for every NCAA Division 1 men's basketball team in every season, containing every KenPom metric you can think of. It can serve as a single source of truth for your March Madness analysis and provides the granularity needed for any type of analysis.
    - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. I will continue to update this dataset with seed and region information for previous tournaments.

    These datasets were created by downloading the raw CSV files for each season from the various sections of KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each previous season are standardized to the current 2025 naming structure so that all of the historical data can be viewed under the same field names. The cleaned datasets are then appended together, and some additional clean-up takes place before the intermediate (INT) datasets uploaded to this Kaggle dataset are created.

    Once all of the INT datasets were created, I joined the tables together on team name and season so that all of the different metrics can be viewed in one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add the full conference name, the acronym it is known by, and the team name that ESPN currently uses. This reference table aggregates every conference a team has belonged to since 2002 and every team name KenPom has used historically, so it is needed both to map all of the teams correctly and to distinguish historical conferences from current ones.

    Next, I joined a reference table of all current NCAAM coaches and their active coaching tenures, because the length of a coach's current tenure typically correlates with a team's success in the March Madness tournament. I also joined a reference table of the historical postseason teams in the March Madness, NIT, CBI, and CIT tournaments, and another reference table that flags the teams ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season. After some additional clean-up, all of this data is exported to the "DEV _ March Madness" file, which contains the consolidated view.
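    A minimal sketch of the standardize-then-join step described above, using only the Python standard library. The real pipeline runs in Domo's Magic ETL; the column names here (`TeamName`, `Season`, `AdjEM`, `Tempo`, `Conference`, and the legacy `Team`, `Year`, `AdjTempo` headers) and the `RENAMES` mapping are hypothetical placeholders, and the inner join on (team, season) is an assumption about how unmatched rows are handled.

```python
# Hypothetical mapping from older per-season headers to the current naming structure.
RENAMES = {"Team": "TeamName", "Year": "Season", "AdjTempo": "Tempo"}


def standardize(rows):
    """Rename legacy column headers so every season shares the same field names."""
    return [{RENAMES.get(k, k): v for k, v in row.items()} for row in rows]


def join_on_team_season(left, right):
    """Inner-join two lists of row dicts on (TeamName, Season)."""
    index = {(r["TeamName"], r["Season"]): r for r in right}
    joined = []
    for row in left:
        match = index.get((row["TeamName"], row["Season"]))
        if match is not None:
            joined.append({**row, **match})
    return joined


def add_conference(rows, mapping):
    """Left-join a (team, season) -> conference mapping; unmatched rows get None.

    Keying by season as well as team lets a team's historical conference
    differ from its current one.
    """
    return [{**r, "Conference": mapping.get((r["TeamName"], r["Season"]))} for r in rows]


# Toy rows for illustration only, not real KenPom values.
efficiency = standardize([{"Team": "Team A", "Year": 2024, "AdjEM": 25.1}])
misc = standardize([{"Team": "Team A", "Year": 2024, "AdjTempo": 68.0}])
combined = add_conference(join_on_team_season(efficiency, misc),
                          {("Team A", 2024): "Example Conference"})
print(combined)
```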

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  3. March Madness Predictions

    • datahub.io
    Updated Sep 25, 2024
    (2024). March Madness Predictions [Dataset]. https://datahub.io/core/five-thirty-eight-datasets/datasets/march-madness-predictions
    Dataset updated
    Sep 25, 2024
    Description

    This folder contains data behind the 2014 NCAA Tournament Predictions.

    This dataset was scraped from FiveThirtyEight - march-madness-predictions ...

  4. college basketball march madness data

    • kaggle.com
    Updated May 24, 2022
    Alec Bensman (2022). college basketball march madness data [Dataset]. https://www.kaggle.com/alecbensman/college-basketball-march-madness-data/discussion
    Dataset updated
    May 24, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alec Bensman
    Description

    Data taken from https://www.kaggle.com/datasets/andrewsundberg/college-basketball-dataset and updated with data from https://barttorvik.com/

    TEAM: The Division I college basketball school

    CONF: The athletic conference in which the school participates (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = Southeastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)

    G: Number of games played

    W: Number of games won

    ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)

    ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)

    BARTHAG: Power Rating (Chance of beating an average Division I team)

    EFG_O: Effective Field Goal Percentage Shot

    EFG_D: Effective Field Goal Percentage Allowed

    TOR: Turnover Percentage Committed (Turnover Rate)

    TORD: Turnover Percentage Forced (Steal Rate)

    ORB: Offensive Rebound Rate

    DRB: Offensive Rebound Rate Allowed

    FTR: Free Throw Rate (how often the given team shoots free throws)

    FTRD: Free Throw Rate Allowed

    2P_O: Two-Point Shooting Percentage

    2P_D: Two-Point Shooting Percentage Allowed

    3P_O: Three-Point Shooting Percentage

    3P_D: Three-Point Shooting Percentage Allowed

    ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)

    WAB: Wins Above Bubble (the bubble refers to the cutoff between making the NCAA March Madness Tournament and not making it)

    POSTSEASON: Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)

    SEED: Seed in the NCAA March Madness Tournament

    YEAR: Season
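    Two of the columns above can be illustrated with small functions. eFG% uses the standard definition (FGM + 0.5 × 3PM) / FGA. For BARTHAG, this sketch assumes the Pythagorean-style formula commonly attributed to barttorvik.com, AdjOE^x / (AdjOE^x + AdjDE^x) with x = 11.5; that exponent is an assumption, not something this dataset's description states.

```python
def effective_fg_pct(fgm: int, fg3m: int, fga: int) -> float:
    """Effective FG% (in percent): a made three counts 1.5 times a made two."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return 100.0 * (fgm + 0.5 * fg3m) / fga


def barthag(adj_oe: float, adj_de: float, exponent: float = 11.5) -> float:
    """Pythagorean estimate of the chance of beating an average D-I team.

    adj_oe: points scored per 100 possessions (ADJOE).
    adj_de: points allowed per 100 possessions (ADJDE).
    The default exponent of 11.5 is an assumption, not taken from the dataset.
    """
    o, d = adj_oe ** exponent, adj_de ** exponent
    return o / (o + d)


print(round(effective_fg_pct(fgm=30, fg3m=10, fga=60), 2))  # 58.33
print(round(barthag(115.0, 95.0), 3))
```

    Because the formula is symmetric, a team whose adjusted offense equals its adjusted defense gets exactly 0.5, which matches the "chance of beating an average Division I team" reading of BARTHAG.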

  5. NCAA Men's March Madness average TV viewership in the U.S. 2025

    • statista.com
    Updated Jun 25, 2025
    Statista (2025). NCAA Men's March Madness average TV viewership in the U.S. 2025 [Dataset]. https://www.statista.com/statistics/251560/ncaa-basketball-march-madness-average-tv-viewership-per-game/
    Dataset updated
    Jun 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    During the 2025 edition of the NCAA Division I Men's Basketball Championship, the average TV viewership in the United States stood at **** million viewers. This represented an increase of ***** percent from the previous year.

  6. ‘March Madness 2018’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Mar 15, 2018
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘March Madness 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-march-madness-2018-6118/e0be253b/?iid=013-045&v=presentation
    Dataset updated
    Mar 15, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘March Madness 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/march-madness-2018e on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    This file contains links to the data behind our 2018 March Madness Predictions.

    fivethirtyeight_ncaa_forecasts.csv contains power ratings for each team and the chance of each team reaching every round of the tournament. It includes men's and women's forecasts, with one forecast for each day of the tournament.

    Source: https://github.com/fivethirtyeight/data/tree/master/march-madness-predictions-2018

    This dataset was created by FiveThirtyEight and contains around 600 samples with columns such as Rd1 Win and Rd7 Win, along with technical information and other features, including:
    - Team Id
    - Playin Flag
    - and more.

    How to use this dataset

    • Analyze Team Region in relation to Team Name
    • Study the influence of Gender on Rd5 Win

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight.


    --- Original source retains full ownership of the source dataset ---

  7. Supplemental March Madness Data

    • kaggle.com
    Updated Mar 18, 2018
    Sam Pochyly (2018). Supplemental March Madness Data [Dataset]. https://www.kaggle.com/sampocs/supplemental-march-madness-data/notebooks
    Dataset updated
    Mar 18, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sam Pochyly
    License

    CC0: Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Sam Pochyly

    Released under CC0: Public Domain

    Contents

  8. 2016 March ML Mania Predictions

    • kaggle.com
    Updated Nov 15, 2017
    Will Cukierski (2017). 2016 March ML Mania Predictions [Dataset]. https://www.kaggle.com/datasets/wcukierski/2016-march-ml-mania
    Available download formats: zip (28,950,066 bytes)
    Dataset updated
    Nov 15, 2017
    Authors
    Will Cukierski
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Kaggle’s March Machine Learning Mania competition challenged data scientists to predict winners and losers of the men's 2016 NCAA basketball tournament. This dataset contains the 1070 selected predictions of all Kaggle participants. These predictions were collected and locked in prior to the start of the tournament.

    How can this data be used? You can pivot it to look at both Kaggle and NCAA teams alike. You can look at who will win games, which games will be close, which games are hardest to forecast, or which Kaggle teams are gambling vs. sticking to the data.

    First round predictions

    The NCAA tournament is a single-elimination tournament that begins with 68 teams. There are four games, usually called the “play-in round,” before the traditional bracket action starts. Due to competition timing, these games are included in the prediction files but should not be used in analysis, as it’s possible that the prediction was submitted after the play-in round games were over.
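    One way to score a prediction file while honoring the play-in caveat above is sketched below. It assumes log loss as the metric (the scoring rule generally used in Kaggle's March Machine Learning Mania competitions) and clips probabilities away from 0 and 1, so a confident miss on a prediction of exactly 0 (which the format example in the Data Description includes) yields a large but finite penalty. The `outcomes` and `play_in_ids` inputs, and the game Id `2016_1113_1114`, are hypothetical structures you would build from tournament results; they are not files in this dataset.

```python
import math


def log_loss(preds, outcomes, play_in_ids=frozenset(), eps=1e-15):
    """Mean binary log loss over games that were actually played.

    preds: {game_id: probability that the lower-Id team wins}
    outcomes: {game_id: 1 if the lower-Id team won, else 0}
    play_in_ids: game Ids to skip, e.g. the four play-in games
    Assumes log loss is the evaluation metric; clipping avoids log(0).
    """
    total, n = 0.0, 0
    for game_id, y in outcomes.items():
        if game_id in play_in_ids:
            continue
        p = min(max(preds[game_id], eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        n += 1
    return total / n if n else float("nan")


preds = {"2016_1112_1114": 0.6, "2016_1112_1122": 0.0, "2016_1113_1114": 0.5}
outcomes = {"2016_1112_1114": 1, "2016_1112_1122": 0, "2016_1113_1114": 1}
print(log_loss(preds, outcomes, play_in_ids={"2016_1113_1114"}))
```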

    Data Description

    Each Kaggle team could submit up to two prediction files. The prediction files in the dataset are in the 'predictions' folder and named according to:

    TeamName_TeamId_SubmissionId.csv

    The file format contains a probability prediction for every possible game between the 68 teams. This is necessary to cover every possible tournament outcome. Each team has a unique numerical Id (given in Teams.csv). Each game has a unique Id column created by concatenating the year and the two team Ids. The format is the following:

    Id,Pred
    2016_1112_1114,0.6
    2016_1112_1122,0
    ...

    The team with the lower numerical Id is always listed first. “Pred” represents the probability that the team with the lower Id beats the team with the higher Id. For example, "2016_1112_1114,0.6" indicates team 1112 has a 0.6 probability of beating team 1114.
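    The Id convention above can be handled with two helpers: one that always builds the Id with the lower team Id first, and one that flips `Pred` when you ask about the higher-Id team. The function names are illustrative, not part of the competition tooling.

```python
import csv


def game_id(season: int, team1: int, team2: int) -> str:
    """Build a game Id with the lower team Id listed first, e.g. 2016_1112_1114."""
    low, high = sorted((team1, team2))
    return f"{season}_{low}_{high}"


def win_prob(preds: dict, season: int, team: int, opponent: int) -> float:
    """P(team beats opponent), flipping Pred when `team` has the higher Id."""
    p = preds[game_id(season, team, opponent)]
    return p if team < opponent else 1.0 - p


def read_predictions(lines):
    """Parse 'Id,Pred' CSV lines into {Id: probability}."""
    return {row["Id"]: float(row["Pred"]) for row in csv.DictReader(lines)}


preds = read_predictions(["Id,Pred", "2016_1112_1114,0.6", "2016_1112_1122,0"])
print(win_prob(preds, 2016, 1112, 1114))  # 0.6
print(win_prob(preds, 2016, 1114, 1112))
```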

    For convenience, we have included the data files from the 2016 March Mania competition dataset in the Scripts environment (you may find TourneySlots.csv and TourneySeeds.csv useful for determining matchups, see the documentation). However, the focus of this dataset is on Kagglers' predictions.

  9. ‘College Basketball Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 19, 2019
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘College Basketball Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-college-basketball-dataset-ad1b/defeb915/?iid=015-917&v=presentation
    Dataset updated
    Nov 19, 2019
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘College Basketball Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/andrewsundberg/college-basketball-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Content

    Data from the 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, and 2021 Division I college basketball seasons.

    cbb.csv has seasons 2013-2019 combined

    The 2020 season's data set is kept separate from the other seasons, because there was no postseason due to the Coronavirus.

    The 2021 data is from 3/15/2021 and will be updated and added to cbb.csv after the tournament

    Variables

    RK (Only in cbb20): The ranking of the team at the end of the regular season according to barttorvik

    TEAM: The Division I college basketball school

    CONF: The athletic conference in which the school participates (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = Southeastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)

    G: Number of games played

    W: Number of games won

    ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)

    ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)

    BARTHAG: Power Rating (Chance of beating an average Division I team)

    EFG_O: Effective Field Goal Percentage Shot

    EFG_D: Effective Field Goal Percentage Allowed

    TOR: Turnover Percentage Committed (Turnover Rate)

    TORD: Turnover Percentage Forced (Steal Rate)

    ORB: Offensive Rebound Rate

    DRB: Offensive Rebound Rate Allowed

    FTR: Free Throw Rate (how often the given team shoots free throws)

    FTRD: Free Throw Rate Allowed

    2P_O: Two-Point Shooting Percentage

    2P_D: Two-Point Shooting Percentage Allowed

    3P_O: Three-Point Shooting Percentage

    3P_D: Three-Point Shooting Percentage Allowed

    ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)

    WAB: Wins Above Bubble (the bubble refers to the cutoff between making the NCAA March Madness Tournament and not making it)

    POSTSEASON: Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)

    SEED: Seed in the NCAA March Madness Tournament

    YEAR: Season
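    For modeling, the POSTSEASON codes above are often easier to use as an ordinal. The sketch below maps each code to the number of main-bracket wins it implies under the definitions listed (for example, a team eliminated in the Sweet Sixteen, S16, won two games). Two assumptions: First Four wins are not counted, and any value outside the listed codes (such as a placeholder for teams that missed the tournament) maps to `None`.

```python
# Main-bracket wins implied by each POSTSEASON code; First Four games are not counted.
POSTSEASON_WINS = {
    "R68": 0, "R64": 0, "R32": 1, "S16": 2,
    "E8": 3, "F4": 4, "2ND": 5, "Champion": 6,
}


def tournament_wins(postseason):
    """Return main-bracket wins, or None for teams without a tournament result."""
    if postseason is None:
        return None
    return POSTSEASON_WINS.get(postseason.strip())


rows = [{"TEAM": "Team A", "POSTSEASON": "E8"}, {"TEAM": "Team B", "POSTSEASON": "NA"}]
print([(r["TEAM"], tournament_wins(r["POSTSEASON"])) for r in rows])
# [('Team A', 3), ('Team B', None)]
```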

    Acknowledgements

    This data was scraped from http://barttorvik.com/trank.php#. I cleaned the data set and added the POSTSEASON, SEED, and YEAR columns.

    --- Original source retains full ownership of the source dataset ---

  10. KenPom2022

    • kaggle.com
    Updated Feb 16, 2023
    Wilmer E. Henao (2023). KenPom2022 [Dataset]. https://www.kaggle.com/verracodeguacas/kenpom2022/code
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wilmer E. Henao
    License

    GNU General Public License v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)

    Description

    Update

    Updated the unbiased data up to Selection Sunday 2022.

    Context

    This dataset contains two CSV files. One is guaranteed to be free of leakage, but its data only starts after 2010. The other goes back to 2001 but contains some leakage.

    Content

    The data was acquired from KenPom's official website (the leaky data) and from web-archive ("time machine") services (the leakage-free version).
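    The leakage described above arises when ratings computed after a game are used to predict that game. A minimal sketch of the point-in-time alternative, assuming you have dated rating snapshots such as archived copies of the site: for each game, use only the latest snapshot taken strictly before the game date. The snapshot structure and values are hypothetical illustrations, not the layout or contents of this dataset's CSV files.

```python
import bisect
from datetime import date


def rating_as_of(snapshots, game_day):
    """Return the latest rating snapshot taken strictly before game_day.

    snapshots: list of (snapshot_date, ratings) pairs sorted by date.
    Returns None if no snapshot predates the game.
    """
    dates = [d for d, _ in snapshots]
    i = bisect.bisect_left(dates, game_day)  # first snapshot on or after game_day
    return snapshots[i - 1][1] if i > 0 else None


# Illustrative values only, not real ratings.
snapshots = [
    (date(2022, 3, 1), {"Team A": 20.5}),
    (date(2022, 3, 13), {"Team A": 22.0}),  # e.g. a Selection Sunday snapshot
    (date(2022, 4, 5), {"Team A": 24.3}),   # post-tournament: would leak
]
print(rating_as_of(snapshots, date(2022, 3, 17)))  # {'Team A': 22.0}
```

    Using `bisect_left` means a snapshot dated on the game day itself is skipped, which is the conservative choice when the snapshot time of day is unknown.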

  11. nq_gar-t5_expansions

    • huggingface.co
    Castorini, nq_gar-t5_expansions [Dataset]. https://huggingface.co/datasets/castorini/nq_gar-t5_expansions
    Dataset authored and provided by
    Castorini
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Summary

    The repo provides answer, title and sentence expansions for the Natural Questions corpus with gar-T5.

    Dataset Structure

    There are dev and test folds. An example data entry of the dev split looks as follows: { "id": "1", "predicted_answers": ["312"], "predicted_titles": ["Invisible Man"], "predicted_sentences": ["The Invisible Man First edition Author Ralph Ellison Cover artist M."] }

    An example data entry of the test split looks as follows: {… See the full description on the dataset page: https://huggingface.co/datasets/castorini/nq_gar-t5_expansions.
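    The dev-split entry above can be turned into an expanded retrieval query. Generation-augmented retrieval (GAR) appends generated contexts to the original question before running a sparse retriever such as BM25; the sketch below assumes simple space-joined concatenation and lets you pick which expansion fields to use. `QUESTION_TEXT` is a hypothetical placeholder, since the entry carries only an `id`, and `expand_query` is an illustrative helper, not part of this dataset.

```python
import json

entry = json.loads(
    '{"id": "1", "predicted_answers": ["312"], "predicted_titles": ["Invisible Man"], '
    '"predicted_sentences": ["The Invisible Man First edition Author Ralph Ellison Cover artist M."]}'
)

ALL_FIELDS = ("predicted_answers", "predicted_titles", "predicted_sentences")


def expand_query(question, entry, fields=ALL_FIELDS):
    """Append the generated expansions from the chosen fields to the question text."""
    parts = [question]
    for field in fields:
        parts.extend(entry.get(field, []))
    return " ".join(parts)


print(expand_query("QUESTION_TEXT", entry, fields=("predicted_titles",)))
# QUESTION_TEXT Invisible Man
```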

  12. March Madness Augmented Statistics

    • kaggle.com
    Updated Apr 4, 2021
    Colin Siles (2021). March Madness Augmented Statistics [Dataset]. https://www.kaggle.com/colinsiles/march-madness-augmented-statistics
    Dataset updated
    Apr 4, 2021
    Dataset provided by
    Kaggle
    Authors
    Colin Siles
    Description

    Context

    A team's mean season statistics can be used as predictors of its performance in future games. However, these statistics gain additional meaning when placed in the context of its opponents' (and opponents' opponents') performance. This dataset provides that context for each team. Furthermore, predicting games from end-of-season stats causes data leakage, which, from experience, can be significant in this context (a 15-20% loss in accuracy). This dataset therefore provides each statistic as it stood before each game of the regular season, preventing this source of data leakage.

    Content

    All data is derived from the March Madness competition data. Each original column was renamed to use "A" and "B" instead of "W" and "L", and the data was then mirrored to represent both orderings of opponents. Each team's mean stats are computed (both its own stats and the mean stats "allowed" or "forced" by its opponents). To compute the mean opponents' stats, we analyze the games played by each opponent (excluding games played against the team in question) and compute the mean statistics for those games. We then take the mean of these per-opponent means, weighted by the number of times the team in question played each opponent. The opponents' opponents' stats are computed as a weighted average of the opponents' averages. This yields statistics similar to those used to compute strength of schedule or RPI, except that they go beyond win percentages (see: https://en.wikipedia.org/wiki/Rating_percentage_index).

    The per game statistics are computed by pretending we don't have any of the data on or after the day in question.
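    A minimal sketch of these definitions, assuming games are available as `(day, team_a, team_b, score_a, score_b)` tuples. It tracks only points scored, uses toy games rather than real results, and is a simplified stand-in for the dataset's actual code: each game is mirrored to both teams' perspectives, opponents' means exclude games against the team in question, and every average uses only games strictly before `before_day`.

```python
from collections import Counter
from statistics import mean


def perspectives(games):
    """Mirror each game so it appears once from each side: (day, team, opp, stat)."""
    for day, a, b, sa, sb in games:
        yield day, a, b, sa
        yield day, b, a, sb


def team_mean(games, team, before_day, exclude_opp=None):
    """Team's mean stat over games before before_day, optionally skipping one opponent."""
    vals = [s for d, t, o, s in perspectives(games)
            if t == team and d < before_day and o != exclude_opp]
    return mean(vals) if vals else None


def opponent_counts(games, team, before_day):
    """How many times the team played each opponent before before_day."""
    return Counter(o for d, t, o, _ in perspectives(games) if t == team and d < before_day)


def weighted(pairs):
    """Weighted mean of (weight, value) pairs, ignoring missing values."""
    pairs = [(w, v) for w, v in pairs if v is not None]
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total if total else None


def opponents_mean(games, team, before_day):
    """Mean of each opponent's mean (excluding games vs team), weighted by meetings."""
    counts = opponent_counts(games, team, before_day)
    return weighted((n, team_mean(games, opp, before_day, exclude_opp=team))
                    for opp, n in counts.items())


def opponents_opponents_mean(games, team, before_day):
    """Weighted average of each opponent's own opponents_mean value."""
    counts = opponent_counts(games, team, before_day)
    return weighted((n, opponents_mean(games, opp, before_day)) for opp, n in counts.items())


games = [(1, "X", "Y", 70, 60), (2, "Y", "Z", 80, 50), (3, "X", "Y", 75, 65), (4, "Z", "X", 55, 90)]
print(opponents_mean(games, "X", before_day=5))  # 70.0
print(opponents_mean(games, "X", before_day=3))  # 80.0
```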

    Next Steps

    Currently, the data isn't computed particularly efficiently. Computing the per game averages for every day of the season is necessary to compute fully accurate opponents' opponents' average, but takes about 90 minutes to obtain. It is probably possible to parallelize this, and the per-game averages involve a lot of repeated computation (basically computing the final averages over and over again for each day). Speeding this up will make it more convenient to make changes to the dataset.
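    One way to avoid recomputing the final averages for every day is a single pass over the games in day order with running totals, recording a snapshot before each day's games are added. This sketch assumes games as `(day, team_a, team_b, score_a, score_b)` tuples and covers only each team's own running mean of points scored; the opponent-adjusted averages would still need extra work on top of it.

```python
from collections import defaultdict


def means_before_each_day(games):
    """Return {day: {team: mean points over games strictly before that day}} in one pass."""
    by_day = defaultdict(list)
    for day, a, b, sa, sb in games:
        by_day[day].append((a, sa))
        by_day[day].append((b, sb))
    sums, counts, snapshots = defaultdict(float), defaultdict(int), {}
    for day in sorted(by_day):
        # Snapshot first, so games played on `day` are not visible on `day` itself.
        snapshots[day] = {t: sums[t] / counts[t] for t in counts}
        for team, score in by_day[day]:
            sums[team] += score
            counts[team] += 1
    return snapshots


games = [(1, "X", "Y", 70, 60), (2, "Y", "Z", 80, 50), (3, "X", "Y", 75, 65), (4, "Z", "X", 55, 90)]
snaps = means_before_each_day(games)
print(snaps[4]["X"])  # 72.5
```

    The cost is one pass over the games plus one snapshot per game day, rather than a full recomputation for every day of the season.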

    I would like to transform these statistics to be per-possession, add shooting percentages, pace, and number of games played (to give an idea of the amount of uncertainty in the per-game averages). Some of these can be approximated with the given data (but the results won't be exact), while others will need to be computed from scratch.
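    For the per-possession transformation, a common box-score approximation is possessions ≈ FGA − OR + TO + c × FTA. The coefficient c = 0.475 is the value commonly attributed to KenPom, while other sources use 0.44, so treat it as a tunable assumption. The column names in the usage example (`AScore`, `AFGA`, `AOR`, `ATO`, `AFTA`) assume the competition's detailed-results columns after the "W"/"L" to "A"/"B" renaming, and the numbers are a toy game.

```python
def estimate_possessions(fga: float, orb: float, tov: float, fta: float,
                         ft_coef: float = 0.475) -> float:
    """Box-score possession estimate: FGA - OR + TO + ft_coef * FTA.

    ft_coef = 0.475 is an assumption (a KenPom-style value); 0.44 is also common.
    """
    return fga - orb + tov + ft_coef * fta


def per_100(stat: float, possessions: float) -> float:
    """Scale a per-game stat to a per-100-possessions rate."""
    return 100.0 * stat / possessions


game = {"AScore": 72, "AFGA": 58, "AOR": 10, "ATO": 12, "AFTA": 20}
poss = estimate_possessions(game["AFGA"], game["AOR"], game["ATO"], game["AFTA"])
print(round(poss, 2), round(per_100(game["AScore"], poss), 1))
```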

  13. NCAA March Madness 2020 Mens

    • kaggle.com
    Updated Feb 29, 2020
    corochann (2020). NCAA March Madness 2020 Mens [Dataset]. https://www.kaggle.com/corochann/ncaa-march-madness-2020-mens/discussion
    Dataset updated
    Feb 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    corochann
    Description

    This is the data of the competition Google Cloud & NCAA® ML Competition 2020-NCAAM in feather format. Please refer to the kernel "2020 NCAAM: Fast data loading with feather" for usage.

    Cover photo from Pexels.

  14. 2017 March ML Mania Predictions

    • kaggle.com
    Updated Mar 16, 2017
    Will Cukierski (2017). 2017 March ML Mania Predictions [Dataset]. https://www.kaggle.com/datasets/wcukierski/2017-march-ml-mania-predictions/discussion
    Dataset updated
    Mar 16, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Will Cukierski
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Kaggle’s March Machine Learning Mania competition challenged data scientists to predict winners and losers of the men's 2017 NCAA basketball tournament. This dataset contains the selected predictions of all Kaggle participants. These predictions were collected and locked in prior to the start of the tournament.

    The NCAA tournament is a single-elimination tournament that begins with 68 teams. There are four games, usually called the “play-in round,” before the traditional bracket action starts. Due to competition timing, these games are included in the prediction files but should not be used in analysis, as it’s possible that the prediction was submitted after the play-in round games were over.

    Data Description

    Each Kaggle team could submit up to two prediction files. The prediction files in the dataset are in the 'predictions' folder. You can map the files to the teams using team_submission_key.csv.

    The submission format contains a probability prediction for every possible game between the 68 teams. Refer to the competition documentation for data details. For convenience, we have also included the data files from the competition dataset in this dataset (you may find TourneySlots.csv and TourneySeeds.csv useful for determining matchups). However, the focus of this dataset is on Kagglers' predictions.

  15. World Soccer live data feed

    • kaggle.com
    Updated Jan 28, 2019
    Mohammad Ghahramani (2019). World Soccer live data feed [Dataset]. https://www.kaggle.com/datasets/analystmasters/world-soccer-live-data-feed/discussion
    Dataset updated
    Jan 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohammad Ghahramani
    Description

    Context

    This is the first live data stream on Kaggle providing a simple yet rich source of all soccer matches around the world 24/7 in real-time.

    What makes it unique compared to other datasets?

    • It is the first live data feed on Kaggle, and it is completely free.
    • Unlike churn-rate datasets, you do not have to wait months to evaluate your predictions; simply check the match's outcome a couple of hours later.
    • You can use your predictions and analysis for your own benefit instead of spending your time and resources helping a company maximize its profit.
    • A five-year-old laptop can handle the calculations; you do not need high-end GPUs.
    • Couldn’t make it into the top 3 submissions? Never mind; you still have the chance to earn your prize on your own.
    • Can’t get accurate results on all samples? Do not worry: just filter out the hard ones (e.g. ignore international friendlies) and choose only the ones you are sure of.
    • Need input from human experts for each sample? Every sample comes with at least two expert opinions.
    • Wish you could add your own complementary data? Just contact us and we will try to facilitate it.
    • Couldn’t win “Warren Buffett's 2018 March Madness Bracket Contest”? Here is your chance to build a cumulative profit.
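    The advice to filter out hard samples and keep only the ones you are sure of amounts to selective prediction: act only on confident predictions and measure accuracy on that subset. A minimal sketch, assuming your predictions are dicts with hypothetical keys id, competition, and probs (outcome to probability); the feed's actual column for the competition type is not described here, and the 0.7 threshold is arbitrary.

```python
def select_confident(predictions, threshold=0.7, exclude=("International Friendly",)):
    """Keep predictions whose top outcome probability is at least `threshold`,
    skipping competitions listed in `exclude` (hypothetical labels)."""
    kept = []
    for p in predictions:
        if p["competition"] in exclude:
            continue
        pick, confidence = max(p["probs"].items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            kept.append({**p, "pick": pick, "confidence": confidence})
    return kept


def accuracy(kept, results):
    """Share of kept predictions whose pick matches the actual outcome."""
    if not kept:
        return None
    hits = sum(1 for p in kept if results[p["id"]] == p["pick"])
    return hits / len(kept)


preds = [
    {"id": 1, "competition": "League", "probs": {"Home": 0.80, "Draw": 0.15, "Away": 0.05}},
    {"id": 2, "competition": "League", "probs": {"Home": 0.40, "Draw": 0.30, "Away": 0.30}},
    {"id": 3, "competition": "International Friendly", "probs": {"Home": 0.90, "Draw": 0.05, "Away": 0.05}},
    {"id": 4, "competition": "League", "probs": {"Home": 0.10, "Draw": 0.15, "Away": 0.75}},
]
results = {1: "Home", 2: "Draw", 3: "Away", 4: "Home"}  # made-up outcomes
kept = select_confident(preds)
print([p["id"] for p in kept])  # [1, 4]
print(accuracy(kept, results))  # 0.5
```

    Raising the threshold usually trades coverage (how many matches you act on) for accuracy on the matches you keep.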

    Simply train your algorithm on the first version of the training dataset (approximately 11.5k matches) and predict the matches provided in the data feed below.

    Fetch the data stream

    The CSV file is updated every 30 minutes, at minutes 20 and 50 of every hour. Please do not download it more than twice per hour, as each download incurs additional cost.

    You can download the CSV data file from the Amazon S3 server at the following link, replacing FOLDER_NAME as described below:

    https://s3.amazonaws.com/FOLDER_NAME/amasters.csv

    *. Substitute FOLDER_NAME with "analyst-masters".
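    A minimal sketch of fetching the feed politely: it fills in the documented URL template and computes the next scheduled refresh (:20 or :50), so a client can download once per refresh and stay within two downloads per hour. The function names are hypothetical, the UTF-8 encoding is an assumption, and the 2019 bucket may no longer be online.

```python
from datetime import datetime, timedelta
from urllib.request import urlopen

URL_TEMPLATE = "https://s3.amazonaws.com/{folder}/amasters.csv"
UPDATE_MINUTES = (20, 50)  # the file is refreshed at :20 and :50 every hour


def feed_url(folder="analyst-masters"):
    """Fill FOLDER_NAME in the documented URL template."""
    return URL_TEMPLATE.format(folder=folder)


def next_refresh(now):
    """Return the first scheduled update time strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in UPDATE_MINUTES:
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Both updates this hour have passed; the next one is :20 of the next hour.
    return (base + timedelta(hours=1)).replace(minute=UPDATE_MINUTES[0])


def fetch_csv(url):
    """Download the CSV text once; call at most twice per hour.

    UTF-8 is an assumed encoding.
    """
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


print(feed_url())  # https://s3.amazonaws.com/analyst-masters/amasters.csv
print(next_refresh(datetime(2019, 1, 6, 10, 5)))  # 2019-01-06 10:20:00
```

    Sleeping until next_refresh() before each fetch_csv() call gives at most two downloads per hour.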

    Content

    Our goal is to predict the outcome of a match as Home, Draw, or Away. The variety of sources and the nature of the information in this data stream make it a unique database. Currently, five servers collect data from soccer matches around the world, communicate with each other, and aggregate the data based on the dominant features learned from 400,000 matches over 7 years. I describe every column and the data collection below in two categories: Category I (Current situation) and Category II (Head-to-Head History). Depending on which categories are available, the data we have for each team fall into four modes:

    • Mode 1: we have both Category I and Category II available
    • Mode 2: we only have Category I available
    • Mode 3: we only have Category II available
    • Mode 4: neither Category I nor Category II is available
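    The four modes follow directly from two yes/no questions. A minimal sketch; which columns you treat as evidence for Category I or II is your choice. The head-to-head column name below is hypothetical because that part of the column list is not shown here.

```python
def has_values(row, columns):
    """True if at least one of `columns` is present and non-empty in `row`."""
    for col in columns:
        value = row.get(col)
        if value is not None and str(value).strip() != "":
            return True
    return False


def data_mode(has_current, has_head_to_head):
    """Map Category I / Category II availability to Modes 1-4."""
    if has_current and has_head_to_head:
        return 1
    if has_current:
        return 2
    if has_head_to_head:
        return 3
    return 4


category_1 = ["League_Rank_1", "League_Rank_2"]
category_2 = ["H2H_Wins_1"]  # hypothetical Category II column name
row = {"League_Rank_1": "3", "League_Rank_2": "12", "H2H_Wins_1": ""}
print(data_mode(has_values(row, category_1), has_values(row, category_2)))  # 2
```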

    Below you can find a full description of each category.

    I. Current situation

    Col 1 to 3:

    Votes_for_Home Votes_for_Draw Votes_for_Away
    

    These 3 columns are the most distinctive part of the database. They contain the opinions of over 100 professional soccer analysts predicting the outcome of each match. Their votes reflect every piece of information they receive on players, team line-ups, injuries, and a team's urgency to win a match to stay in the league. The analysts are spread across various time zones and are experts on soccer teams from various regions. Our servers aggregate their opinions and keep updating the CSV file until kickoff. A clear majority should still not be taken at face value: even if 40 users predict that Real Madrid will beat Real Sociedad at the Santiago Bernabeu on January 6th, 2019, the 5 users predicting that Real Sociedad (the away team) will win should make you doubt the home win. Here, the “majority of votes” works in conjunction with other features.
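    A small sketch of turning the three vote columns into features that can be combined with the others: vote shares, the majority outcome, and a “dissent” score (the share of analysts who did not back the majority). These feature definitions are an illustrative assumption, not part of the feed. The example uses the 40-versus-5 split from the text.

```python
def vote_summary(home, draw, away):
    """Turn raw Votes_for_* counts into shares, the majority pick, and a dissent score.

    Dissent is the share of analysts who did not vote for the majority outcome.
    Ties are resolved in Home, Draw, Away order.
    """
    total = home + draw + away
    if total == 0:
        return None  # no analyst votes yet
    shares = {"Home": home / total, "Draw": draw / total, "Away": away / total}
    majority = max(shares, key=shares.get)
    dissent = 1.0 - shares[majority]
    return majority, shares, dissent


majority, shares, dissent = vote_summary(40, 0, 5)
print(majority, round(dissent, 3))  # Home 0.111
```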

    Col 4 to 9:

    Weekday Day Month  Year  Hour  Minute
    

    There are over 60,000 matches per year, and approximately 400 are usually played per day on weekends. The more critical and exciting matches, which are usually less predictable, are held toward the evening in Europe. We currently provide times in Central European Time (CET), equivalent to GMT+01:00.

    *. Please note that the 2nd row of the CSV file records the time at which data values from all servers were saved to the file.
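    Because times are given in CET (GMT+01:00), the date and time columns can be combined into a timezone-aware timestamp. A minimal sketch, assuming Day, Month, Year, Hour, and Minute are read as integers. It applies a fixed +01:00 offset exactly as stated above, so it does not model summer time (CEST).

```python
from datetime import datetime, timedelta, timezone

# Fixed GMT+01:00 offset, as documented for the feed.
CET = timezone(timedelta(hours=1), "CET")


def kickoff_utc(day, month, year, hour, minute):
    """Build the kickoff time from Col 5 to 9 and convert it to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=CET)
    return local.astimezone(timezone.utc)


print(kickoff_utc(6, 1, 2019, 20, 45))  # 2019-01-06 19:45:00+00:00
```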

    Col 10 to 13:

    Total_Bettors   Bet_Perc_on_Home    Bet_Perc_on_Draw   Bet_Perc_on_Away
    

    These data are recorded a few hours before the match, because people place bets more emotionally as kickoff approaches. Each percentage column gives the share of the overall number of bettors, “Total_Bettors,” who bet on the “Home,” “Draw,” or “Away” outcome, respectively.
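    Since each percentage column is a share of Total_Bettors, approximate counts per outcome can be recovered by multiplication. A minimal sketch, assuming the percentages are stored on a 0-100 scale (the text does not say whether they are 0-100 or 0-1). The tolerance check is an illustrative safeguard.

```python
def bettor_counts(total_bettors, pct_home, pct_draw, pct_away, tolerance=1.0):
    """Convert Bet_Perc_on_* columns into approximate head counts.

    Assumes the percentages are on a 0-100 scale; raises if they do not
    roughly sum to 100, which would suggest a different scale or bad data.
    """
    pcts = {"Home": pct_home, "Draw": pct_draw, "Away": pct_away}
    total_pct = sum(pcts.values())
    if abs(total_pct - 100) > tolerance:
        raise ValueError(f"percentages sum to {total_pct}, expected ~100")
    return {outcome: round(total_bettors * p / 100) for outcome, p in pcts.items()}


print(bettor_counts(200, 55, 25, 20))  # {'Home': 110, 'Draw': 50, 'Away': 40}
```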

    Col 14 to 15:

    Team_1 Team_2   
    

    The team playing “Home” is “Team_1” and the opponent playing “Away” is “Team_2”.

    Col 16 to 36:

    League_Rank_1  League_Rank_2  Total_teams     Points_1  Points_2  Max_points Min_points Won_1  Draw_1 Lost_1 Won_2  Draw_2 Lost_2 Goals_Scored_1 Goals_Scored_2 Goals_Rec_1 Goal_Rec_2 Goals_Diff_1  Goals_Diff_2
    

    If the match is betw...

  16. womens-leaderboard-analyzer-app-2021

    • kaggle.com
    Updated Mar 3, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Matt Motoki (2021). womens-leaderboard-analyzer-app-2021 [Dataset]. https://www.kaggle.com/mmotoki/womens-leaderboard-analyzer-app-2021
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 3, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Matt Motoki
    Description

    Women's March Madness Leaderboard Analyzer App

    Summary

    The app allows you to upload a submission and analyze how well it would have done in previous years’ competitions.

    • The Public leaderboard is usually full of leaky submissions, which makes it hard to judge the quality of a submission. It is included here for comparison and is updated every time the app is run.
    • The Average leaderboard shows the average score of the nth-place teams. For example, if your submission places 10th on the Average leaderboard, your score is slightly better than the average of the 10th-place teams in previous competitions and slightly worse than the average of the 9th-place teams.
    • The 2018-2019 leaderboards are exact copies from previous competitions. You can use them to see where your submission would have placed in those competitions.
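    The Average leaderboard and its placement rule can be sketched as follows. This is not the app's code. It assumes each past leaderboard is a list of scores where lower is better (for example, a log-loss metric) and trims every year to the length of the shortest leaderboard. The scores in the example are made up.

```python
from bisect import bisect_left


def average_leaderboard(leaderboards):
    """Average the nth-best score across past leaderboards (lower is better)."""
    n = min(len(lb) for lb in leaderboards)
    ranked = [sorted(lb)[:n] for lb in leaderboards]
    return [sum(lb[i] for lb in ranked) / len(ranked) for i in range(n)]


def placement(score, avg_board):
    """1-based place your score would take on the averaged leaderboard.

    Placing k means scoring at least as well as the average kth-place score
    while trailing the average (k-1)th-place score.
    """
    return bisect_left(avg_board, score) + 1


past = [[0.40, 0.44, 0.50], [0.42, 0.46, 0.48]]  # made-up scores
avg_board = average_leaderboard(past)
print(placement(0.43, avg_board))  # 2
```

    Because every past leaderboard is sorted before averaging, the averaged board is itself sorted, which is what lets bisect find the placement.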

    Run the App on Kaggle

    Fork and edit the Women's March Madness 2021 Leaderboard Analyzer on Kaggle. Run all cells of the notebook, then view the app in a separate tab using the URL generated by ngrok.

    Important: The app needs a backend to run. You must fork and edit the notebook. You won't be able to view the app from a static Kaggle notebook.

    Local Installation

    1. Run the Wave Server

    Follow the instructions here to download and run the latest Wave Server, which is required to run apps. Note: If you have Wave version 0.12.0 or older, you will need to reinstall it with a newer version.

    2. Download the App

    Download the app code from Kaggle. Open a terminal in the downloaded womens_leaderboard directory and create a tmp folder for uploaded files:

    $ mkdir tmp

    3. Set Up Your Python Environment

    $ make setup
    $ source venv/bin/activate
    

    4. Run the App

    $ wave run leaderboard.app
    

    5. View the App

    Point your favorite web browser to localhost:10101


Arnav Samal (2024). March Madness | Historical Data | 2012-2023 [Dataset]. https://www.kaggle.com/datasets/arnavs19/march-madness-historical-data-2012-2023

March Madness | Historical Data | 2012-2023

March Madness Mayhem: A Decade of Dunking, Upsets, and Championship Glory

Dataset updated: Mar 27, 2024
Dataset provided by: Kaggle (http://kaggle.com/)
Authors: Arnav Samal
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

