Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are some great UFC datasets available on Kaggle. I want to bring together all of those sets into one set to allow for deeper analysis.
Version 4 has data updated through June 22nd, 2020
Version 4 of this dataset includes:
Rajeev Warrier's excellent dataset. This dataset was the basis for my work. It contains data for every UFC bout. The 'red fighter' and 'blue fighter' are improperly recorded prior to around 2010, so that data has been excluded. Additionally, I have removed features that I could not easily scrape for future fights.
My odds dataset. My big contribution was the gambling odds for each fight.
Mart Jürisoo's Rankings dataset. Includes a history of UFC fighter rankings. A wonderful resource that could have a lot of implications for machine learning models.
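The exclusion of the older, improperly recorded fights could look something like the sketch below. This is only an illustration: the cutoff date, the ISO date format and the helper name `keep_reliable` are my assumptions, not something taken from the actual processing code.

```python
from datetime import date

# Assumed cutoff: the text only says "prior to around 2010".
CUTOFF = date(2010, 3, 21)

def keep_reliable(fights, cutoff=CUTOFF):
    """Keep only fights on or after the cutoff date.

    Each fight is a dict with an ISO-formatted 'date' string (an assumption).
    """
    return [f for f in fights if date.fromisoformat(f["date"]) >= cutoff]

sample = [
    {"date": "2009-11-21", "Winner": "Red"},
    {"date": "2010-03-21", "Winner": "Blue"},
    {"date": "2020-06-20", "Winner": "Red"},
]
kept = keep_reliable(sample)
print([f["date"] for f in kept])
```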
There are 108 columns of data. I have included a detailed description of the columns with the data file.
I have created some new features for this dataset. Highlights include a set of differential features (age_dif, avg_td_dif, reach_dif, ...), each of which is the blue fighter's value minus the red fighter's value. The feature 'empty_arena' denotes whether the fight took place in an empty arena.
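As a rough sketch, a differential feature is just the blue value minus the red value for the same stat. The input column names used here (`B_age`, `R_age`, `B_reach`, `R_reach`) and the helper `add_differentials` are hypothetical; the real file may name its source columns differently.

```python
def add_differentials(fight, features):
    """Return a copy of `fight` with '<feature>_dif' = blue value - red value."""
    out = dict(fight)
    for name in features:
        out[f"{name}_dif"] = fight[f"B_{name}"] - fight[f"R_{name}"]
    return out

# Made-up numbers, purely for illustration.
fight = {"R_age": 31, "B_age": 27, "R_reach": 180.0, "B_reach": 185.5}
row = add_differentials(fight, ["age", "reach"])
print(row["age_dif"], row["reach_dif"])
```

With this sign convention, a negative age_dif means the blue fighter is younger than the red fighter.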
I plan on uploading a file of upcoming fights before every event and updating the main csv after every event.
Poke around my GitHub for this project. Sorry for the lack of documentation. I'll get around to it!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset got a lot of love from the community and I saw many people asking for an updated version, so I have uploaded the latest scraped and processed data (as of 21/03/2021). It is now easy for anyone to get the latest dataset with a single command, so if you need bleeding-edge data or want to see the code, you can look here. Hope this solves all problems! If there are any issues with the data, please forgive me and write about them in the comments or raise an issue on GitHub. I will pick it up 👍 Thank you everyone for the emails and messages. As usual, have fun! ❤️ 😁
This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner. The data was scraped from the ufcstats website, which came into the picture after fightmetric ceased to exist. I saw that there was a lot of information on the website about every fight and every event, and there was no existing way of capturing all of it. I used beautifulsoup to scrape the data and pandas to process it. It was a long and arduous process, so please forgive any mistakes. I have provided the raw files in case anybody wants to process them differently. This is my first time creating a dataset, so any suggestions and corrections are welcome! In case anyone wants to check out the work, I have uploaded all the code files, including the scraping module, here
Have fun!
Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for red and blue corner). So, for instance, the red fighter's columns hold the compiled average stats of all of that fighter's fights except the current one. The stats include damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the columns) in all the fights this particular red fighter has had, except this one, as it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened. Here are some column definitions:
R_ and B_ prefixes signify red and blue corner fighter stats respectively
_opp_ containing columns are the average of damage done by the opponent on the fighter
KD is the number of knockdowns
SIG_STR is the no. of significant strikes 'landed of attempted'
SIG_STR_pct is the significant strikes percentage
TOTAL_STR is the total strikes 'landed of attempted'
TD is the no. of takedowns
TD_pct is the takedown percentage
SUB_ATT is the no. of submission attempts
PASS is the no. of times the guard was passed?
REV is the no. of reversals landed
HEAD is the no. of significant strikes to the head 'landed of attempted'
BODY is the no. of significant strikes to the body 'landed of attempted'
CLINCH is the no. of significant strikes in the clinch 'landed of attempted'
GROUND is the no. of significant strikes on the ground 'landed of attempted'
win_by is the method of win
last_round is the last round of the fight (e.g. if it was a KO in the 1st, then this will be 1)
last_round_time is when the fight ended in the last round
Format is the format of the fight (3 rounds, 5 rounds, etc.)
Referee is the name of the referee
date is the date of the fight
location is the location in which the event took place
Fight_type is the weight class and whether it's a title bout or not
Winner is the winner of the fight
Stance is the stance of the fighter (orthodox, southpaw, etc.)
Height_cms is the height in centimeters
Reach_cms is the reach of the fighter (arm span) in centimeters
Weight_lbs is the weight of the fighter in pounds (lbs)
age is the age of the fighter
title_bout is a Boolean value of whether it is a title fight or not
weight_class is the weight class the fight is in (Bantamweight, Heavyweight, Women's Flyweight, etc.)
no_of_rounds is the number of rounds the fight was scheduled for
current_lose_streak is the count of the fighter's current consecutive losses
current_win_streak is the count of the fighter's current consecutive wins
draw is the number of draws in the fighter's UFC career
wins is the number of wins in the fighter's UFC career
losses is the number of losses in the fighter's UFC career
total_rounds_fought is the average of total rounds fought by the fighter
total_time_fought(seconds) is the total time spent fighting, in seconds
total_title_bouts is the total number of title bouts taken part in by the fighter
win_by_Decision_Majority is the number of wins by majority judges' decision in the fighter's UFC career
win_by_Decision_Split is the number of wins by split judges' decision in the fighter's UFC career
win_by_Decision_Unanimous is the number of wins by unanimous judges' decision in the fighter's UFC career
win_by_KO/TKO is the number of wins by knockout in the fighter's UFC career
win_by_Submission is the number of wins by submission in the fighter's UFC career
win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter's UFC career
Inspiration: https://github.com/Hitkul/UFC_Fight_Prediction Provided ideas on how to store per-fight data. Unfortunately, the entire UFC website and the fightmetric website changed, so I couldn't reuse any of the code.
Print Progress Bar: https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a Used to display download progress in the terminal.
You can check out who I am and what I do here
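To make the "average of all fights except the current one" idea from the column description concrete, here is a minimal sketch in plain Python. It is not the actual processing code (that was done with pandas); the function name `prior_averages`, the output names like `R_avg_KD` and the toy fights are assumptions for illustration only.

```python
from collections import defaultdict

def prior_averages(fights, stat):
    """For each fight (in chronological order), average `stat` over each
    fighter's EARLIER fights only. None means the fighter has no prior fights.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    rows = []
    for fight in fights:
        row = {}
        # Read the running averages BEFORE adding the current fight,
        # so the current fight never leaks into its own features.
        for corner in ("R", "B"):
            name = fight[f"{corner}_fighter"]
            n = counts[name]
            row[f"{corner}_avg_{stat}"] = totals[name] / n if n else None
        rows.append(row)
        # Only now fold the current fight into the running totals.
        for corner in ("R", "B"):
            name = fight[f"{corner}_fighter"]
            totals[name] += fight[f"{corner}_{stat}"]
            counts[name] += 1
    return rows

# Toy data with made-up fighters and knockdown counts.
fights = [
    {"R_fighter": "A", "B_fighter": "B", "R_KD": 2, "B_KD": 0},
    {"R_fighter": "A", "B_fighter": "C", "R_KD": 0, "B_KD": 1},
    {"R_fighter": "C", "B_fighter": "A", "R_KD": 1, "B_KD": 1},
]
rows = prior_averages(fights, "KD")
for r in rows:
    print(r)
```

Keeping the update step after the read step is what makes 'Winner' the only column that tells you what happened in the current fight.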
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are some great UFC datasets out there, but I could not find one that included gambling odds... So I went and made one myself. This dataset focuses very generally on the fights and hopes to be able to draw very broad conclusions. For a more in-depth statistical fight analysis I would recommend Rajeev Warrier's excellent dataset, which was the inspiration for my work.
This dataset consists of 11 columns of data with basic information about every match that took place between March 21, 2010 and March 14, 2020.
R_fighter and B_fighter: The names of the fighter in the red corner and the fighter in the blue corner
R_odds and B_odds: The American odds of the fighter winning.
date: The date of the fight
location: The location of the fight
country: The country the fight occurred in
Winner: The winner of the fight ('Red' or 'Blue')
title_bout: Was this fight a title bout? ('True' or 'False')
weight_class: What weight class did this fight occur at?
gender: Male or Female
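If you have not worked with American odds before, R_odds and B_odds can be turned into implied win probabilities with the standard conversion. The sketch below is just an illustration; the function names and the example odds are mine, not part of the dataset.

```python
def implied_probability(american_odds):
    """Convert American odds to the bookmaker's implied win probability."""
    if american_odds < 0:
        # Favorite: e.g. -150 means risk 150 to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: e.g. +130 means risk 100 to win 130.
    return 100 / (american_odds + 100)

def fair_probabilities(r_odds, b_odds):
    """Rescale both implied probabilities so they sum to 1 (removes the bookmaker margin)."""
    r = implied_probability(r_odds)
    b = implied_probability(b_odds)
    total = r + b
    return r / total, b / total

# Made-up odds for illustration: red is the favorite.
r_prob = implied_probability(-150)
b_prob = implied_probability(130)
fair_r, fair_b = fair_probabilities(-150, 130)
print(round(r_prob, 3), round(b_prob, 3))
print(round(fair_r, 3), round(fair_b, 3))
```

The raw implied probabilities usually add up to slightly more than 1 because of the bookmaker's margin, which is why the second helper rescales them.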
I was inspired by the work of Rajeev Warrier
My work, including a scraper to help gather data for upcoming events, can be found on my GitHub. I promise I'll add more documentation soon.
This dataset was created by mdabbert
The project goal was to collect data on approximately 100 Unified Family Court (UFC) cases at each of the three selected jurisdictions -- Maricopa County, Arizona; Deschutes County, Oregon; and Jackson County, Oregon -- that have developed systems to address the special needs of families with multiple court cases. The purpose of the study was to examine research questions related to: (1) dependency case processing and outcomes, (2) delinquency case processing and outcomes, (3) domestic relations/probate case processing and outcomes, and (4) criminal case processing and outcomes. The data used in this study were generated from a review of the court records of 602 families, including 406 families served by the UFC as well as comparison groups of 196 non-UFC multi-case families. During the study's planning phase, an instrument was drafted for use in extracting this information. Data collectors were recruited from former UFC staff and current and former non-UFC court staff. All data collectors were trained by the principal investigator in the use of the data collection form. The vast majority of the data extraction required a manual review of paper files. Variables in this dataset are organized into the following categories: background variables, items from dependency/abuse and neglect filings, delinquency filings, domestic relations/probate filings, civil domestic violence/protection order filings, criminal domestic violence filings, criminal child abuse filings, other criminal filings, and variables from a summary across cases.
This dataset was created by mdabbert
https://financialreports.eu/
Comprehensive collection of financial reports and documents for Unified Factory Spolka Akcyjna (UFC)
This dataset was created by Jeremy Sun zhaocheng