16 datasets found

March Madness | Historical Data | 2012-2023
kaggle.com
Updated Mar 27, 2024
Cite
Arnav Samal (2024). March Madness | Historical Data | 2012-2023 [Dataset]. https://www.kaggle.com/datasets/arnavs19/march-madness-historical-data-2012-2023
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 27, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arnav Samal
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This comprehensive CSV dataset compiles historical features of NCAA basketball teams participating in March Madness tournaments from 2012 to 2023. The dataset includes a rich array of performance metrics aimed at analyzing team dynamics and competitiveness. Key features encompass win-loss percentage, advanced metrics like Simple Rating System (SRS), Strength of Schedule (SOS), field goal percentage (FG%), three-point percentage (3P%), free throw percentage (FT%), home and away win rates, conference win rates, and point differential percentage.

Additionally, advanced statistical insights are provided, such as adjusted efficiency margin (AdjEM), adjusted offensive efficiency (AdjO), adjusted defensive efficiency (AdjD), adjusted tempo (AdjT), luck factor, adjusted strength of schedule (SOS AdjEM), average adjusted offensive efficiency of opposing teams (OppO), average adjusted defensive efficiency of opposing teams (OppD), and non-conference adjusted strength of schedule (NCSOS AdjEM). This dataset serves as a valuable resource for researchers, analysts, and enthusiasts seeking to delve into the intricate performance dynamics of collegiate basketball teams during the March Madness era.
March Madness Historical DataSet (2002 to 2025)
kaggle.com
Updated Apr 22, 2025
Cite
Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/discussion?sort=undefined
Dataset updated
Apr 22, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jonathan Pilafas
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.
- Click here to view this dashboard: Dashboard Link
- Click here to view this dashboard's features in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

This dataset offers one of the most robust resources you will find to discover key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for March Madness analysis.

Key features of the dataset include:
- Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
- Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
- 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional cleanup takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed under one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add each conference's full name and the acronym it is known by, as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences.

I then joined a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also joined another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season.

After some additional data cleanup, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
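The join on team name and season described above can be sketched in plain Python. This is only an illustration under stated assumptions: the author's pipeline runs in Domo's Magic ETL, not in Python, and the column names (`Season`, `TeamName`, `AdjEM`, `AvgHeight`), team names, and values below are made-up placeholders rather than fields confirmed by the dataset.

```python
# Made-up placeholder rows standing in for two intermediate (INT) tables.
# Column names and values are hypothetical, not taken from the dataset.
efficiency = [
    {"Season": 2024, "TeamName": "Team A", "AdjEM": 20.0},
    {"Season": 2024, "TeamName": "Team B", "AdjEM": 10.0},
    {"Season": 2006, "TeamName": "Team A", "AdjEM": 15.0},
]
height = [
    {"Season": 2024, "TeamName": "Team A", "AvgHeight": 78.0},
    {"Season": 2024, "TeamName": "Team B", "AvgHeight": 77.0},
]


def left_join(left, right, keys=("Season", "TeamName")):
    """Left-join two lists of dicts on the given key columns."""
    # Index the right-hand table by its (season, team) key for O(1) lookups.
    index = {tuple(row[k] for k in keys): row for row in right}
    extra_cols = [c for c in (right[0] if right else {}) if c not in keys]
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        merged = dict(row)
        # Unmatched rows keep their place; the added columns become None.
        for col in extra_cols:
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


combined = left_join(efficiency, height)
```

A left join is used here so that a team-season with no row in the second table (for example, a season before the Height/Experience data begins in 2007) is kept with an empty value instead of being dropped.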

This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
March Madness Predictions
datahub.io
Updated Sep 25, 2024
Cite
(2024). March Madness Predictions [Dataset]. https://datahub.io/core/five-thirty-eight-datasets/datasets/march-madness-predictions
Dataset updated
Sep 25, 2024
Description
This folder contains data behind the 2014 NCAA Tournament Predictions.

This dataset was scraped from FiveThirtyEight - march-madness-predictions ...
college basketball march madness data
kaggle.com
Updated May 24, 2022
Cite
Alec Bensman (2022). college basketball march madness data [Dataset]. https://www.kaggle.com/alecbensman/college-basketball-march-madness-data/discussion
Dataset updated
May 24, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alec Bensman
Description
Data taken from https://www.kaggle.com/datasets/andrewsundberg/college-basketball-dataset and updated with data from https://barttorvik.com/

TEAM: The Division I college basketball school

CONF: The athletic conference in which the school participates (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = Southeastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)

G: Number of games played

W: Number of games won

ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)

ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)

BARTHAG: Power Rating (Chance of beating an average Division I team)

EFG_O: Effective Field Goal Percentage Shot

EFG_D: Effective Field Goal Percentage Allowed

TOR: Turnover Percentage Allowed (Turnover Rate)

TORD: Turnover Percentage Committed (Steal Rate)

ORB: Offensive Rebound Rate

DRB: Offensive Rebound Rate Allowed

FTR: Free Throw Rate (How often the given team shoots Free Throws)

FTRD: Free Throw Rate Allowed

2P_O: Two-Point Shooting Percentage

2P_D: Two-Point Shooting Percentage Allowed

3P_O: Three-Point Shooting Percentage

3P_D: Three-Point Shooting Percentage Allowed

ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)

WAB: Wins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)

POSTSEASON: Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)

SEED: Seed in the NCAA March Madness Tournament

YEAR: Season
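
As a rough illustration of how the columns above might be combined, the sketch below derives a win percentage from G and W, an efficiency margin as ADJOE minus ADJDE, and an ordinal round number from the POSTSEASON codes. The derived quantities are not columns of the dataset; they, the helper name `summarize`, and the sample row (made-up values) are assumptions for illustration only.

```python
# POSTSEASON codes in the order listed in the column description.
ROUND_ORDER = ["R68", "R64", "R32", "S16", "E8", "F4", "2ND", "Champion"]

# A made-up row using the documented column names (values are placeholders).
sample_row = {
    "TEAM": "Team A", "G": 30, "W": 24,
    "ADJOE": 115.0, "ADJDE": 95.0,
    "POSTSEASON": "S16", "YEAR": 2019,
}


def summarize(row):
    """Derive a few convenience metrics from one team-season row."""
    code = row.get("POSTSEASON")
    return {
        "team": row["TEAM"],
        # Share of games won.
        "win_pct": row["W"] / row["G"],
        # Points per 100 possessions scored minus allowed (higher is better).
        "eff_margin": row["ADJOE"] - row["ADJDE"],
        # 0 = First Four ... 7 = Champion; None if no tournament code is set.
        "round_index": ROUND_ORDER.index(code) if code in ROUND_ORDER else None,
    }


summary = summarize(sample_row)
```

Mapping the round codes to integers makes POSTSEASON usable as an ordered target; teams that missed the tournament simply get `None` here.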
NCAA Men's March Madness average TV viewership in the U.S. 2025
statista.com
Updated Jun 25, 2025
Cite
Statista (2025). NCAA Men's March Madness average TV viewership in the U.S. 2025 [Dataset]. https://www.statista.com/statistics/251560/ncaa-basketball-march-madness-average-tv-viewership-per-game/
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
During the 2025 edition of the NCAA Division I Men's Basketball Championship, the average TV viewership in the United States stood at **** million viewers. This represented an increase of ***** percent from the previous year.
‘March Madness 2018’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 15, 2018
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘March Madness 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-march-madness-2018-6118/e0be253b/?iid=013-045&v=presentation
Dataset updated
Mar 15, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘March Madness 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/march-madness-2018e on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

This file contains links to the data behind our 2018 March Madness Predictions.

fivethirtyeight_ncaa_forecasts.csv contains power ratings for each team and the chance of each team reaching every round of the tournament. It includes men's and women's forecasts, with one forecast for each day of the tournament.

Source: https://github.com/fivethirtyeight/data/tree/master/march-madness-predictions-2018

This dataset was created by FiveThirtyEight and contains around 600 samples along with Rd1 Win, Rd7 Win, technical information and other features such as:
- Team Id
- Playin Flag
- and more.

How to use this dataset

Analyze Team Region in relation to Team Name

Study the influence of Gender on Rd5 Win

More datasets

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

--- Original source retains full ownership of the source dataset ---
Supplemental March Madness Data
kaggle.com
Updated Mar 18, 2018
Cite
Sam Pochyly (2018). Supplemental March Madness Data [Dataset]. https://www.kaggle.com/sampocs/supplemental-march-madness-data/notebooks
Dataset updated
Mar 18, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sam Pochyly
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Sam Pochyly

Released under CC0: Public Domain

Contents
2016 March ML Mania Predictions
kaggle.com
zip
Updated Nov 15, 2017
Cite
Will Cukierski (2017). 2016 March ML Mania Predictions [Dataset]. https://www.kaggle.com/datasets/wcukierski/2016-march-ml-mania
Available download formats: zip (28950066 bytes)
Dataset updated
Nov 15, 2017
Authors
Will Cukierski
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Kaggle’s March Machine Learning Mania competition challenged data scientists to predict winners and losers of the men's 2016 NCAA basketball tournament. This dataset contains the 1070 selected predictions of all Kaggle participants. These predictions were collected and locked in prior to the start of the tournament.

How can this data be used? You can pivot it to look at both Kaggle and NCAA teams alike. You can look at who will win games, which games will be close, which games are hardest to forecast, or which Kaggle teams are gambling vs. sticking to the data.

The NCAA tournament is a single-elimination tournament that begins with 68 teams. There are four games, usually called the “play-in round,” before the traditional bracket action starts. Due to competition timing, these games are included in the prediction files but should not be used in analysis, as it’s possible that the prediction was submitted after the play-in round games were over.

Data Description

Each Kaggle team could submit up to two prediction files. The prediction files in the dataset are in the 'predictions' folder and named according to:

TeamName_TeamId_SubmissionId.csv

The file format contains a probability prediction for every possible game between the 68 teams. This is necessary to cover every possible tournament outcome. Each team has a unique numerical Id (given in Teams.csv). Each game has a unique Id column created by concatenating the year and the two team Ids. The format is the following:

Id,Pred
2016_1112_1114,0.6
2016_1112_1122,0
...

The team with the lower numerical Id is always listed first. “Pred” represents the probability that the team with the lower Id beats the team with the higher Id. For example, "2016_1112_1114,0.6" indicates team 1112 has a 0.6 probability of beating team 1114.
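
The Id convention above can be handled with a small parser. The following is a minimal sketch using only the two sample rows shown in the description; the function names `load_predictions` and `win_probability` are hypothetical helpers, not part of the competition tooling.

```python
import csv
import io

# The two sample rows from the format description above.
SAMPLE = "Id,Pred\n2016_1112_1114,0.6\n2016_1112_1122,0\n"


def load_predictions(text):
    """Parse 'Id,Pred' rows into {(year, low_id, high_id): probability}."""
    preds = {}
    for row in csv.DictReader(io.StringIO(text)):
        year, low_id, high_id = (int(part) for part in row["Id"].split("_"))
        preds[(year, low_id, high_id)] = float(row["Pred"])
    return preds


def win_probability(preds, year, team, opponent):
    """Probability that `team` beats `opponent`, in either argument order."""
    low, high = sorted((team, opponent))
    p_low = preds[(year, low, high)]  # stored from the lower Id's view
    return p_low if team == low else 1.0 - p_low


preds = load_predictions(SAMPLE)
p_1112 = win_probability(preds, 2016, 1112, 1114)
p_1114 = win_probability(preds, 2016, 1114, 1112)
```

Because each file stores only the lower Id's probability, the higher Id's chance is taken as the complement, which is valid here since tournament games cannot end in a draw.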

For convenience, we have included the data files from the 2016 March Mania competition dataset in the Scripts environment (you may find TourneySlots.csv and TourneySeeds.csv useful for determining matchups, see the documentation). However, the focus of this dataset is on Kagglers' predictions.
‘College Basketball Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Nov 19, 2019
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘College Basketball Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-college-basketball-dataset-ad1b/defeb915/?iid=015-917&v=presentation
Dataset updated
Nov 19, 2019
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘College Basketball Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/andrewsundberg/college-basketball-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Content

Data from the 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, and 2021 Division I college basketball seasons.

cbb.csv has seasons 2013-2019 combined

The 2020 season's data set is kept separate from the other seasons, because there was no postseason due to the Coronavirus.

The 2021 data is from 3/15/2021 and will be updated and added to cbb.csv after the tournament

Variables

RK (Only in cbb20): The ranking of the team at the end of the regular season according to barttorvik

TEAM: The Division I college basketball school

CONF: The athletic conference in which the school participates (A10 = Atlantic 10, ACC = Atlantic Coast Conference, AE = America East, Amer = American, ASun = ASUN, B10 = Big Ten, B12 = Big 12, BE = Big East, BSky = Big Sky, BSth = Big South, BW = Big West, CAA = Colonial Athletic Association, CUSA = Conference USA, Horz = Horizon League, Ivy = Ivy League, MAAC = Metro Atlantic Athletic Conference, MAC = Mid-American Conference, MEAC = Mid-Eastern Athletic Conference, MVC = Missouri Valley Conference, MWC = Mountain West, NEC = Northeast Conference, OVC = Ohio Valley Conference, P12 = Pac-12, Pat = Patriot League, SB = Sun Belt, SC = Southern Conference, SEC = Southeastern Conference, Slnd = Southland Conference, Sum = Summit League, SWAC = Southwestern Athletic Conference, WAC = Western Athletic Conference, WCC = West Coast Conference)

G: Number of games played

W: Number of games won

ADJOE: Adjusted Offensive Efficiency (An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average Division I defense)

ADJDE: Adjusted Defensive Efficiency (An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average Division I offense)

BARTHAG: Power Rating (Chance of beating an average Division I team)

EFG_O: Effective Field Goal Percentage Shot

EFG_D: Effective Field Goal Percentage Allowed

TOR: Turnover Percentage Allowed (Turnover Rate)

TORD: Turnover Percentage Committed (Steal Rate)

ORB: Offensive Rebound Rate

DRB: Offensive Rebound Rate Allowed

FTR: Free Throw Rate (How often the given team shoots Free Throws)

FTRD: Free Throw Rate Allowed

2P_O: Two-Point Shooting Percentage

2P_D: Two-Point Shooting Percentage Allowed

3P_O: Three-Point Shooting Percentage

3P_D: Three-Point Shooting Percentage Allowed

ADJ_T: Adjusted Tempo (An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average Division I tempo)

WAB: Wins Above Bubble (The bubble refers to the cut off between making the NCAA March Madness Tournament and not making it)

POSTSEASON: Round where the given team was eliminated or where their season ended (R68 = First Four, R64 = Round of 64, R32 = Round of 32, S16 = Sweet Sixteen, E8 = Elite Eight, F4 = Final Four, 2ND = Runner-up, Champion = Winner of the NCAA March Madness Tournament for that given year)

SEED: Seed in the NCAA March Madness Tournament

YEAR: Season

Acknowledgements

This data was scraped from http://barttorvik.com/trank.php#. I cleaned the data set and added the POSTSEASON, SEED, and YEAR columns.

--- Original source retains full ownership of the source dataset ---
KenPom2022
kaggle.com
Updated Feb 16, 2023
Cite
Wilmer E. Henao (2023). KenPom2022 [Dataset]. https://www.kaggle.com/verracodeguacas/kenpom2022/code
Dataset updated
Feb 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Wilmer E. Henao
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Update

Updated the unbiased data up to Selection Sunday 2022.

Context

This data contains two CSV files. One of them is guaranteed to have no leakage, but its data only starts after 2010. The other file goes back to 2001 but contains some leakage.

Content

The data was acquired from KenPom's official website (leaky data) and from time machine services for the unleaky version.
nq_gar-t5_expansions
huggingface.co
Cite
Castorini, nq_gar-t5_expansions [Dataset]. https://huggingface.co/datasets/castorini/nq_gar-t5_expansions
Dataset authored and provided by
Castorini
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Summary

The repo provides answer, title and sentence expansions for the Natural Questions corpus with gar-T5.

Dataset Structure

There are dev and test folds. An example data entry of the dev split looks as follows:
{ "id": "1", "predicted_answers": ["312"], "predicted_titles": ["Invisible Man"], "predicted_sentences": ["The Invisible Man First edition Author Ralph Ellison Cover artist M."] }

An example data entry of the test split looks as follows: {… See the full description on the dataset page: https://huggingface.co/datasets/castorini/nq_gar-t5_expansions.
March Madness Augmented Statistics
kaggle.com
Updated Apr 4, 2021
Cite
Colin Siles (2021). March Madness Augmented Statistics [Dataset]. https://www.kaggle.com/colinsiles/march-madness-augmented-statistics
Dataset updated
Apr 4, 2021
Dataset provided by
Kaggle
Authors
Colin Siles
Description
Context

A team's mean season statistics can be used as predictors for its performance in future games. However, these statistics gain additional meaning when placed in the context of their opponents' (and opponents' opponents') performance. This dataset provides this context for each team. Furthermore, predicting games based on post-season stats causes data leakage, which from experience can be significant in this context (15-20% loss in accuracy). Thus, this dataset provides each of these statistics prior to each game of the regular season, preventing any source of data leakage.

Content

All data is derived from the March Madness competition data. Each original column was renamed to "A" and "B" instead of "W" and "L," and then mirrored to represent both orderings of opponents. Each team's mean stats are computed (both their own stats, and the mean "allowed" or "forced" statistics by their opponents). To compute the mean opponents' stats, we analyze the games played by each opponent (excluding games played against the team in question), and compute the mean statistics for those games. We then compute the mean of these mean statistics, weighted by the number of times the team in question played each opponent. The opponents' opponents' stats are computed as a weighted average of the opponents' averages. This results in statistics similar to those used to compute strength of schedule or RPI, except that they go beyond win percentages (see: https://en.wikipedia.org/wiki/Rating_percentage_index).

The per game statistics are computed by pretending we don't have any of the data on or after the day in question.
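
The mirroring and opponent-averaging steps described above can be sketched as follows. This is a simplified illustration, not the author's code: it tracks a single statistic (points scored), uses made-up games, ignores the per-day cutoff, and the helper names are hypothetical.

```python
# Made-up games: (winner, loser, winner_score, loser_score).
games = [
    ("X", "Y", 70, 60),
    ("Y", "Z", 80, 75),
    ("Z", "X", 65, 64),
    ("X", "Y", 90, 80),
]


def mirror(games):
    """Emit each game twice so every team appears once in the 'A' role."""
    rows = []
    for winner, loser, w_score, l_score in games:
        rows.append({"A": winner, "B": loser, "A_score": w_score, "B_score": l_score})
        rows.append({"A": loser, "B": winner, "A_score": l_score, "B_score": w_score})
    return rows


def mean_score(rows, team, exclude=None):
    """Mean points scored by `team`, skipping games against `exclude`."""
    scores = [r["A_score"] for r in rows if r["A"] == team and r["B"] != exclude]
    return sum(scores) / len(scores) if scores else None


def opponents_mean_score(rows, team):
    """Mean of each opponent's mean score, weighted by number of meetings."""
    total, weight = 0.0, 0
    for r in rows:
        if r["A"] != team:
            continue
        # Visiting one row per game weights each opponent by games played.
        opp_mean = mean_score(rows, r["B"], exclude=team)
        if opp_mean is not None:
            total += opp_mean
            weight += 1
    return total / weight if weight else None


rows = mirror(games)
own_mean = mean_score(rows, "X")
opp_mean = opponents_mean_score(rows, "X")
```

To reproduce the leakage-free per-game values, the same functions would be applied only to the games played before the day in question.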

Next Steps

Currently, the data isn't computed particularly efficiently. Computing the per game averages for every day of the season is necessary to compute fully accurate opponents' opponents' average, but takes about 90 minutes to obtain. It is probably possible to parallelize this, and the per-game averages involve a lot of repeated computation (basically computing the final averages over and over again for each day). Speeding this up will make it more convenient to make changes to the dataset.

I would like to transform these statistics to be per-possession, and add shooting percentages, pace, and number of games played (to give an idea of the amount of uncertainty that exists in the per-game averages). Some of these can be approximated with the given data (but the results won't be exact), while others will need to be computed from scratch.
NCAA March Madness 2020 Mens
kaggle.com
Updated Feb 29, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
corochann (2020). NCAA March Madness 2020 Mens [Dataset]. https://www.kaggle.com/corochann/ncaa-march-madness-2020-mens/discussion
Dataset updated
Feb 29, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
corochann
Description
This is feather-format data for the competition Google Cloud & NCAA® ML Competition 2020-NCAAM. Please refer to the kernel "2020 NCAAM: Fast data loading with feather" for usage.

Cover photo from pexels.
2017 March ML Mania Predictions
kaggle.com
Updated Mar 16, 2017
Cite
Will Cukierski (2017). 2017 March ML Mania Predictions [Dataset]. https://www.kaggle.com/datasets/wcukierski/2017-march-ml-mania-predictions/discussion
Dataset updated
Mar 16, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Will Cukierski
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Kaggle’s March Machine Learning Mania competition challenged data scientists to predict winners and losers of the men's 2017 NCAA basketball tournament. This dataset contains the selected predictions of all Kaggle participants. These predictions were collected and locked in prior to the start of the tournament.

The NCAA tournament is a single-elimination tournament that begins with 68 teams. There are four games, usually called the “play-in round,” before the traditional bracket action starts. Due to competition timing, these games are included in the prediction files but should not be used in analysis, as it’s possible that the prediction was submitted after the play-in round games were over.

Data Description

Each Kaggle team could submit up to two prediction files. The prediction files in the dataset are in the 'predictions' folder. You can map the files to the teams by team_submission_key.csv.

The submission format contains a probability prediction for every possible game between the 68 teams. Refer to the competition documentation for data details. For convenience, we have included the data files from the competition dataset in the dataset (you may find TourneySlots.csv and TourneySeeds.csv useful for determining matchups). However, the focus of this dataset is on Kagglers' predictions.
World Soccer live data feed
kaggle.com
Updated Jan 28, 2019
Cite
Mohammad Ghahramani (2019). World Soccer live data feed [Dataset]. https://www.kaggle.com/datasets/analystmasters/world-soccer-live-data-feed/discussion
Dataset updated
Jan 28, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mohammad Ghahramani
Description
Context

This is the first live data stream on Kaggle providing a simple yet rich source of all soccer matches around the world 24/7 in real-time.

What makes it unique compared to other datasets?

It is the first live data feed on Kaggle and it is totally free.

Unlike “Churn rate” datasets, you do not have to wait months to evaluate your predictions; simply check the match’s outcome in a couple of hours.

You can use your predictions/analysis for your own benefit instead of spending your time and resources on helping a company maximize its profit.

A five-year-old laptop can do the calculations, and you do not need high-end GPUs.

Couldn’t make it to the top 3 submissions? Never mind, you still have the chance to get your prize on your own.

You can’t get accurate results on all samples? Do not worry, just filter out the hard ones (e.g. ignore international friendlies) and simply choose the ones you are sure of.

Need help from human experts for each sample? Every sample comes with at least two opinions from experts

You wish you could add your complementary data? Just contact us and we will try to facilitate it.

Couldn’t win “Warren Buffett's 2018 March Madness Bracket Contest”? Here is your chance to make your accumulative profit.

Simply train your algorithm on the first version of training dataset of approximately 11.5k matches and predict the data provided in the following data feed.

Fetch the data stream

The CSV file is updated every 30 minutes, at minutes 20 and 50 of every hour. I kindly ask that you not download it more than twice per hour, as it incurs additional cost.

You may download the CSV data file from the Amazon S3 server at the following link, substituting FOLDER_NAME as described below:

https://s3.amazonaws.com/FOLDER_NAME/amasters.csv

* Substitute FOLDER_NAME with "analyst-masters".
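
As a small sketch of the schedule above, the snippet below builds the feed URL and computes the next refresh time, so a client can wait for it rather than polling. The function name `next_update` is a hypothetical helper, and the code deliberately performs no download; whether the URL is still live is not something this sketch checks.

```python
from datetime import datetime, timedelta

FOLDER_NAME = "analyst-masters"
FEED_URL = f"https://s3.amazonaws.com/{FOLDER_NAME}/amasters.csv"


def next_update(now):
    """Return the next :20 or :50 mark strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in (20, 50):
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Past :50, so the next refresh is :20 of the following hour.
    return (base.replace(minute=0) + timedelta(hours=1)).replace(minute=20)


upcoming = next_update(datetime(2019, 1, 28, 10, 5))
```

Scheduling one fetch shortly after each returned time keeps a client within the requested limit of two downloads per hour.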

Content

Our goal is to identify the outcome of a match as Home, Draw or Away. The variety of sources and nature of information provided in this data stream makes it a unique database. Currently, five servers are collecting data from soccer matches around the world, communicating with each other and finally aggregating the data based on the dominant features learned from 400,000 matches over 7 years. I describe every column and the data collection below in two categories, Category I – Current situation and Category II – Head-to-Head History. Hence, we divide the type of data we have from each team into 4 modes:

Mode 1: we have both Category I and Category II available

Mode 2: we only have Category I available

Mode 3: we only have Category II available

Mode 4: neither Category I nor Category II is available

Below you can find a full illustration of each category.

I. Current situation

Col 1 to 3:

Votes_for_Home Votes_for_Draw Votes_for_Away

The most distinctive parts of the database are these 3 columns. We are releasing opinions of over 100 professional soccer analysts predicting the outcome of a match. Their votes are the result of every piece of information they receive on players, team line-up, injuries and the urge of a team to win a match to stay in the league. They are spread around the world in various time zones and are experts on soccer teams from various regions. Our servers aggregate their opinions to update the CSV file until kickoff. Therefore, even if 40 users predict Real-Madrid wins against Real-Sociedad in Santiago Bernabeu on January 6th, 2019 but 5 users predict Real-Sociedad (the away team) will be the winner, you should doubt the home win. Here, the “majority of votes” works in conjunction with other features.
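
One simple way to use these three columns is to turn the raw counts into shares, so the size of the dissenting minority is visible rather than only the majority pick. The sketch below uses the 40-versus-5 example from the paragraph above; the helper name `vote_shares` is hypothetical, and how the shares are combined with other features is left open.

```python
def vote_shares(votes_home, votes_draw, votes_away):
    """Convert raw analyst vote counts into fractions of all votes cast."""
    total = votes_home + votes_draw + votes_away
    if total == 0:
        return None  # no analyst opinions recorded for this match
    return {
        "Home": votes_home / total,
        "Draw": votes_draw / total,
        "Away": votes_away / total,
    }


# The example from the text: 40 votes for the home side, 5 for the away side.
shares = vote_shares(40, 0, 5)
```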

Col 4 to 9:

Weekday Day Month Year Hour Minute

Over 60,000 matches are played each year, and approximately 400 are usually held per day on weekends. The more critical and exciting matches, which are usually less predictable, are held toward the evening in Europe. Times are currently given in Central European Time (CET), equivalent to GMT+01:00.

*. Please note that the 2nd row of the CSV file gives the time at which the data values from all servers were saved to the file.
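
A minimal sketch of combining the Day, Month, Year, Hour and Minute columns into a timezone-aware kickoff time. It assumes the columns hold integers (the description does not say how Month is encoded) and uses a fixed GMT+01:00 offset as stated above, without daylight saving adjustments. The name `kickoff_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset, following the GMT+01:00 statement in the text.
CET = timezone(timedelta(hours=1), "CET")


def kickoff_utc(year, month, day, hour, minute):
    """Interpret the column values as CET and convert them to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=CET)
    return local.astimezone(timezone.utc)
```

Converting to UTC early makes it easier to compare kickoff times with timestamps from other sources.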

Col 10 to 13:

Total_Bettors Bet_Perc_on_Home Bet_Perc_on_Draw Bet_Perc_on_Away

This data is recorded a few hours before the match, because people tend to place bets emotionally as kickoff approaches. “Total_Bettors” is the overall number of bettors, and the “Home,” “Draw” and “Away” columns give the percentage of those bettors backing each outcome.
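
If absolute counts are needed, the percentages can be scaled by Total_Bettors. This sketch assumes the percentage columns are on a 0 to 100 scale, which the description does not confirm; the function name `bettors_per_outcome` is hypothetical.

```python
def bettors_per_outcome(total_bettors, perc_home, perc_draw, perc_away):
    """Approximate (home, draw, away) bettor counts from percentages.

    Assumes percentages are on a 0-100 scale; counts are rounded.
    """
    return tuple(
        round(total_bettors * perc / 100)
        for perc in (perc_home, perc_draw, perc_away)
    )
```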

Col 14 to 15:

Team_1 Team_2

The team playing “Home” is “Team_1” and the opponent playing “Away” is “Team_2”.

Col 16 to 36:

League_Rank_1 League_Rank_2 Total_teams Points_1 Points_2 Max_points Min_points Won_1 Draw_1 Lost_1 Won_2 Draw_2 Lost_2 Goals_Scored_1 Goals_Scored_2 Goals_Rec_1 Goal_Rec_2 Goals_Diff_1 Goals_Diff_2

If the match is betw...
womens-leaderboard-analyzer-app-2021
kaggle.com
Updated Mar 3, 2021
Cite
Matt Motoki (2021). womens-leaderboard-analyzer-app-2021 [Dataset]. https://www.kaggle.com/mmotoki/womens-leaderboard-analyzer-app-2021
Dataset updated
Mar 3, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Matt Motoki
Description
Women's March Madness Leaderboard Analyzer App

Summary

The app allows you to upload a submission and analyze how well you would have done in previous years’ competitions.

* The Public leaderboard is usually full of leaky submissions, making it hard to determine the quality of a submission. The Public leaderboard is included here for comparison. It is updated every time the app is run.
* The Average leaderboard shows the average score of the nth place teams. For example, if your submission places 10th on the Average leaderboard, then your score is slightly better than the average of the 10th place teams in the previous competitions and slightly worse than the average of the 9th place teams in the previous competitions.
* The 2018 - 2019 leaderboards are exact copies from previous competitions. You can use them to view where your submission would have placed in those competitions.
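
The Average leaderboard placement described above can be sketched as a sorted lookup. This is not the app's code: the function name `average_leaderboard_place` is hypothetical, and the sketch assumes that lower scores are better (as with a log-loss metric) and that the list of average scores is sorted from 1st place onward.

```python
from bisect import bisect_left


def average_leaderboard_place(score, average_scores):
    """Return the 1-based place of `score` on the Average leaderboard.

    average_scores[i] is the mean score of the (i + 1)-th place teams,
    sorted ascending (assumption: lower is better).
    """
    # Every average strictly better than `score` ranks ahead of it.
    return bisect_left(average_scores, score) + 1
```

A score that falls between the 9th and 10th place averages is counted as beating the 10th place average and therefore places 10th, matching the description above.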

Run the App on Kaggle

Fork and edit the Women's March Madness 2021 Leaderboard Analyzer on Kaggle. Run all cells of the notebook and view the app in a separate tab using the url generated by ngrok.

Important: The app needs a backend to run. You must fork and edit the notebook. You won't be able to view the app from a static Kaggle notebook.

Local Installation

1. Run the Wave Server

Follow the instructions here to download and run the latest Wave Server, a requirement for apps. Note: If your version of Wave is 0.12.0 or older, you will need to install a newer version.

2. Download the App

Download the app code from Kaggle. Open a terminal in the downloaded womens_leaderboard directory and create a tmp folder for uploaded files.

$ mkdir tmp

3. Setup Your Python Environment

$ make setup
$ source venv/bin/activate

4. Run the App

$ wave run leaderboard.app

5. View the App

Point your favorite web browser to localhost:10101
