100+ datasets found

mlcourse.ai - Dota 2 - winner prediction Dataset
kaggle.com
zip
Updated Sep 8, 2019
Cite
Sushma Biswas (2019). mlcourse.ai - Dota 2 - winner prediction Dataset [Dataset]. https://www.kaggle.com/datasets/sushmabiswas/mlcourseai-dota-2-winner-prediction-dataset
Explore at:
Available download formats: zip (759868828 bytes)
Dataset updated
Sep 8, 2019
Authors
Sushma Biswas
Description
Context

Hello! I am currently taking the mlcourse.ai course, and this dataset was required for one of its in-class Kaggle competitions. The data is originally hosted on git, but I like to have my data right here on Kaggle. That's why this dataset exists.

If you find this dataset useful, do upvote. Thank you and happy learning!

Content

This dataset contains 6 files in total:

1. Sample_submission.csv

2. Train_features.csv

3. Test_features.csv

4. Train_targets.csv

5. Train_matches.jsonl

6. Test_matches.jsonl
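As a rough sketch of how the two `.jsonl` files might be read, assuming they follow the usual JSON Lines layout of one JSON object per line (the description does not confirm this). The helper name `iter_jsonl` and the sample field names are hypothetical and only there for illustration.

```python
import io
import json


def iter_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample standing in for Train_matches.jsonl.
# The field names are invented; the real files may look different.
sample = io.StringIO(
    '{"match_id": "a1", "duration": 120}\n'
    '{"match_id": "b2", "duration": 300}\n'
)
records = list(iter_jsonl(sample))
```

With a real file, the same generator could be fed an open file handle, which avoids loading the whole file into memory at once.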

Acknowledgements

All of the data in this dataset is originally hosted on git and the same can also be found on the in-class competition's 'data' page here.

Inspiration

to be updated.
A Visual History of Nobel Prize Winners Dataset
kaggle.com
zip
Updated Aug 5, 2020
Cite
Amit Hasan Shuvo (2020). A Visual History of Nobel Prize Winners Dataset [Dataset]. https://www.kaggle.com/amithasanshuvo/a-visual-history-of-nobel-prize-winners-dataset
Explore at:
Available download formats: zip (66754 bytes)
Dataset updated
Aug 5, 2020
Authors
Amit Hasan Shuvo
Description
Content

The Nobel Prize is perhaps the world's best-known scientific award. Besides the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?

Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.

Reference:

This is a project from the "Data Scientist with Python" track on DataCamp.
Data from: Machine learning driven self-discovery of the robot body...
search.dataone.org
data.niaid.nih.gov
+2 more
Updated Aug 15, 2024
Cite
Fernando Diaz Ledezma; Sami Haddadin (2024). Machine learning driven self-discovery of the robot body morphology [Dataset]. http://doi.org/10.5061/dryad.h44j0zpsf
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.h44j0zpsf
Dataset updated
Aug 15, 2024
Dataset provided by
Dryad Digital Repository
Authors
Fernando Diaz Ledezma; Sami Haddadin
Time period covered
Jan 1, 2023
Description
Conventionally, the kinematic structure of a robot is assumed to be known and data from external measuring devices are used mainly for calibration. We take an agent-centric perspective to explore whether a robot could learn its body structure by relying on scarce knowledge and depending only on unorganized proprioceptive signals. To achieve this, we analyze a mutual-information-based representation of the relationships between the proprioceptive signals, which we call proprioceptive information graphs (pi-graph), and use it to look for connections that reflect the underlying mechanical topology of the robot. We then use the inferred topology to guide the search for the morphology of the robot; i.e. the location and orientation of its joints. Results from different robots show that the correct topology and morphology can be effectively inferred from their pi-graph, regardless of the number of links and body configuration.

The datasets contain the proprioceptive signals for a robot arm, a hexapod, and a humanoid, including joint position, velocity, torque, body angular and linear velocities, and body angular and linear accelerations. The robot manipulator experiment used simulated robot joint trajectories to generate the proprioceptive signals. These signals were computed using the robot's Denavit-Hartenberg parameters and the Newton-Euler method with artificially added noise. In the physical experiment, joint trajectories were optimized for joint velocity signal entropy, and measurements were obtained directly from encoders, torque sensors, and inertial measurement units (IMU). In the hexapod and humanoid robot experiments, sensor data was collected from a physics simulator (Gazebo 11) using virtual IMU sensors. Filters were applied to handle measurement noise, including low-pass filters for offline estimation and moving average filters for online estimation, emphasizing noise reduction for angular veloc...

# Machine Learning Driven Self-Discovery of the Robot Body Morphology

The repository contains:

Data sets

Links to MATLAB source code

Requirements

MATLAB's Robotics System Toolbox

MATLAB's Optimization Toolbox

Toolboxes for optimization on manifolds and matrices MANOPT

Java Information Dynamics Toolkit JIDT

NOTE: MATLAB 2021b was used.

Sharing/Access information

All datasets are also publicly available at Kaggle; these are the corresponding links:

Simulated robot manipulator with fixed and moving base here

Physical manipulator experiment (fixed base) here

Simulated hexapod robot here

Simulated humanoid robot [h...
Nobel Prize Winners
kaggle.com
Updated Aug 11, 2023
Cite
Joakim Arvidsson (2023). Nobel Prize Winners [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/nobel-prize
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Joakim Arvidsson
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset lists all Nobel laureates (persons and organizations) from 1902. One Nobel laureate may be awarded more than one Nobel Prize.

The dataset is published by Nobel Media AB: https://www.nobelprize.org/about/
Kaggle EyePACS Dataset
paperswithcode.com
Updated Feb 17, 2015
Cite
(2020). Kaggle EyePACS Dataset [Dataset]. https://paperswithcode.com/dataset/kaggle-eyepacs
Explore at:
Dataset updated
Feb 17, 2015
Description
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.


The US Center for Disease Control and Prevention estimates that 29.1 million people in the US have diabetes, and the World Health Organization estimates that 347 million people have the disease worldwide. Diabetic Retinopathy (DR) is an eye disease associated with long-standing diabetes. Around 40% to 45% of Americans with diabetes have some stage of the disease. Progression to vision impairment can be slowed or averted if DR is detected in time; however, this can be difficult, as the disease often shows few symptoms until it is too late to provide effective treatment.

Currently, detecting DR is a time-consuming and manual process that requires a trained clinician to examine and evaluate digital color fundus photographs of the retina. By the time human readers submit their reviews, often a day or two later, the delayed results lead to lost follow up, miscommunication, and delayed treatment.

Clinicians can identify DR by the presence of lesions associated with the vascular abnormalities caused by the disease. While this approach is effective, its resource demands are high. The expertise and equipment required are often lacking in areas where the rate of diabetes in local populations is high and DR detection is most needed. As the number of individuals with diabetes continues to grow, the infrastructure needed to prevent blindness due to DR will become even more insufficient.

The need for a comprehensive and automated method of DR screening has long been recognized, and previous efforts have made good progress using image classification, pattern recognition, and machine learning. With color fundus photography as input, the goal of this competition is to push an automated detection system to the limit of what is possible – ideally resulting in models with realistic clinical potential. The winning models will be open sourced to maximize the impact such a model can have on improving DR detection.

Acknowledgements

This competition is sponsored by the California Healthcare Foundation.

Retinal images were provided by EyePACS, a free platform for retinopathy screening.
‘Premium Bonds - High Value Winners’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Premium Bonds - High Value Winners’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-premium-bonds-high-value-winners-bdbc/85ef4531/?iid=006-369&v=presentation
Explore at:
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Premium Bonds - High Value Winners’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/samuelcortinhas/premium-bond-winners-december-2021 on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset from https://www.nsandi.com/prize-checker/winners contains all the high-value prize winners from this month's premium bond draw. Premium bonds are a way to save money in the UK, where you invest an amount up to £50,000 and every month you are entered into a draw where you can win tax-free prizes. Every pound invested counts as one raffle entry, so the more money you have invested, the more likely you are to win. Prizes can range from £25 to £1,000,000.

Content

This dataset contains all the prizes worth £1,000 or more that were awarded each month from Dec 2021 to present. It includes the prize values, the bond numbers, total value holdings, locations and dates of purchases for each winner. (Note: the data format for Dec 2021 is slightly different from the rest; the format for 2022 onwards will be uniform.)

Acknowledgements

Data was collected from https://www.nsandi.com/prize-checker/winners by downloading an Excel file and converting it to a CSV.

--- Original source retains full ownership of the source dataset ---
Data from: Assessing predictive performance of supervised machine learning...
data.niaid.nih.gov
datadryad.org
+1 more
zip
Updated May 23, 2023
Cite
Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.wh70rxwrh
Dataset updated
May 23, 2023
Dataset provided by
Strathmore University
Authors
Evans Omondi
License
https://spdx.org/licenses/CC0-1.0.html
Description
The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the optimal regressor and classifier for forecasting the price of a diamond specimen.

Methods

Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.
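The two metrics quoted above can be illustrated with their textbook definitions. This is a minimal sketch, not the study's own code: R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares, and accuracy as the share of labels predicted correctly. The function names and toy inputs are illustrative assumptions.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Toy inputs only; these are not values from the diamond study.
perfect_fit = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only_fit = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
share_correct = accuracy(["a", "b", "c", "d"], ["a", "b", "c", "x"])
```

A model that always predicts the mean of the targets scores an R² of zero under this definition, which is why R² is often read as improvement over that baseline.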
Replication Data for: Nobel Prize Winners: 1910 - 2023
dataverse.harvard.edu
Updated Feb 7, 2024
Cite
Lance Drouet (2024). Replication Data for: Nobel Prize Winners: 1910 - 2023 [Dataset]. http://doi.org/10.7910/DVN/6JBXYL
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/6JBXYL
Dataset updated
Feb 7, 2024
Dataset provided by
Harvard Dataverse
Authors
Lance Drouet
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data Source: https://www.kaggle.com/datasets/sazidthe1/nobel-prize-data/data

Data Description:

year - The year in which the Nobel Prize was awarded

category - The category or field for which the Nobel Prize was awarded

motivation - A brief description of the laureate's contributions for which they received the prize

prizeShare - Indicates whether the laureate shared the prize with others

laureateID - A unique identifier for the laureate

fullName - The full name of the laureate

gender - The gender of the laureate

born - The birthdate of the laureate

bornCountry - The country in which the laureate was born

bornCity - The city in which the laureate was born

died - The date of death of the laureate

diedCountry - The country in which the laureate died

diedCity - The city in which the laureate died

organizationName - The name of the organization which the laureate is associated with

organizationCountry - The country in which the associated organization is located

organizationCity - The city in which the associated organization is located
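To make the column descriptions concrete, here is a small sketch that counts laureates per category and per gender. It assumes the data is a comma-separated file whose header uses the column names listed above; the sample rows are made-up placeholders, not real laureates, and `count_by` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration; only the column names come from the description.
SAMPLE = """year,category,fullName,gender,bornCountry
2001,physics,Laureate A,male,Country X
2001,physics,Laureate B,female,Country Y
2002,chemistry,Laureate C,female,Country X
"""


def count_by(rows, column):
    """Count how many rows share each distinct value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
per_category = count_by(rows, "category")
per_gender = count_by(rows, "gender")
```

The same helper could be pointed at `bornCountry` or `organizationCountry` to tabulate laureates by country.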
‘Women's International Football Results’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 20, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Women's International Football Results’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-women-s-international-football-results-bda3/531389dd/?iid=005-699&v=presentation
Explore at:
Dataset updated
Aug 20, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Women's International Football Results’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/martj42/womens-international-football-results on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This is a work-in-progress sister data set to the men's international football results dataset. If you're interested in helping out, submit a pull request here.

Content

Currently, the dataset includes 4,169 women's international football results. All major tournament results should be complete. Some international friendlies, particularly tournaments, are included. A LOT of results are not yet in the dataset.

results.csv includes the following columns:

date - date of the match

home_team - the name of the home team

away_team - the name of the away team

home_score - full-time home team score including extra time, not including penalty-shootouts

away_score - full-time away team score including extra time, not including penalty-shootouts

tournament - the name of the tournament

city - the name of the city/town/administrative unit where the match was played

country - the name of the country where the match was played

neutral - TRUE/FALSE column indicating whether the match was played at a neutral venue
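As a hedged illustration of how these columns fit together, the sketch below derives a match outcome from the two score columns and computes the share of home wins among matches not played at a neutral venue. It assumes `results.csv` is comma-separated with exactly the headers listed above and that `neutral` holds the strings TRUE or FALSE; the rows are invented placeholders and the function names are hypothetical.

```python
import csv
import io

# Placeholder rows for illustration; only the column names come from the description.
SAMPLE = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2000-01-01,Team A,Team B,2,1,Friendly,City X,Country A,FALSE
2000-01-02,Team C,Team A,0,0,Friendly,City Y,Country C,FALSE
2000-01-03,Team B,Team C,1,3,Friendly,City Z,Country D,TRUE
"""


def outcome(row):
    """Return 'home', 'away', or 'draw' based on the two score columns."""
    home, away = int(row["home_score"]), int(row["away_score"])
    if home > away:
        return "home"
    if home < away:
        return "away"
    return "draw"


def home_win_share(rows):
    """Share of home wins among matches that were not played at a neutral venue."""
    non_neutral = [r for r in rows if r["neutral"].strip().upper() == "FALSE"]
    if not non_neutral:
        return 0.0
    wins = sum(1 for r in non_neutral if outcome(r) == "home")
    return wins / len(non_neutral)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
share = home_win_share(rows)
```

Excluding neutral venues matters for any home-advantage question, because the "home" team in those rows is not actually playing at home.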

Acknowledgements

The data is gathered from several sources including but not limited to Wikipedia, fifa.com, rsssf.com and individual football associations' websites.

Inspiration

Some directions to take when exploring the data:

Who is the best team of all time

Which teams dominated different eras of football

What trends have there been in international football throughout the ages - home advantage, total goals scored, distribution of teams' strength etc

Can we say anything about geopolitics from football fixtures - how has the number of countries changed, which teams like to play each other

Which countries host the most matches in which they themselves are not participating

How much, if at all, does hosting a major tournament help a country's chances in the tournament

Which teams are the most active in playing friendlies and friendly tournaments - does it help or hurt them

The world's your oyster, my friend.

Contribute

If you notice a mistake or the results are not being updated fast enough for your liking, you can fix that by submitting a pull request on GitHub.

✌🏼✌🏼✌🏼

--- Original source retains full ownership of the source dataset ---
‘Gender Classification Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Oct 8, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Gender Classification Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-gender-classification-dataset-837a/77df5245/?iid=004-594&v=presentation
Explore at:
Dataset updated
Oct 8, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Gender Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/elakiricoder/gender-classification-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

While I was practicing machine learning, I wanted to create a simple dataset that is closely aligned to the real world scenario and gives better results to whet my appetite on this domain. If you are a beginner who wants to try solving classification problems in machine learning and if you prefer achieving better results, try using this dataset in your projects which will be a great place to start.

Content

This dataset contains 7 features and a label column.

long_hair - Contains 0s and 1s, where 1 is "long hair" and 0 is "not long hair".

forehead_width_cm - The width of the forehead, in centimeters.

forehead_height_cm - The height of the forehead, in centimeters.

nose_wide - Contains 0s and 1s, where 1 is "wide nose" and 0 is "not wide nose".

nose_long - Contains 0s and 1s, where 1 is "long nose" and 0 is "not long nose".

lips_thin - Contains 0s and 1s, where 1 is "thin lips" and 0 is "not thin lips".

distance_nose_to_lip_long - Contains 0s and 1s, where 1 is "long distance between nose and lips" and 0 is "short distance between nose and lips".

gender - This is either "Male" or "Female".

Acknowledgements

Nothing to acknowledge, as this is just made-up data.

Inspiration

It's painful to see bad results at the beginning. Don't begin with complicated datasets if you are a beginner. I'm sure that this dataset will encourage you to proceed further in the domain. Good luck.

--- Original source retains full ownership of the source dataset ---
Eurovision Contest Winners
kaggle.com
Updated Nov 17, 2022
Cite
The Devastator (2022). Eurovision Contest Winners [Dataset]. https://www.kaggle.com/datasets/thedevastator/eurovision-contest-winners-a-look-at-the-data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 17, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Eurovision Contest Winners: A Look at the Data

World's Longest-Running TV Programme

About this dataset

The Eurovision Song Contest is an annual music competition that began in 1956. It is one of the longest-running television programmes in the world and is watched by millions of people every year. The contest's winner is determined using numerous voting techniques, including points awarded by juries or televoters.

Since 2004, the contest has included a televised semi-final:

In 2004, held on the Wednesday before the final

Between 2005 and 2007, held on the Thursday of Eurovision Week

Since 2008, the contest has included two semi-finals, held on the Tuesday and Thursday before the final

The Eurovision Song Contest is a truly global event, with countries from all over Europe (and beyond) competing for the coveted prize. Over the years, some truly amazing performers have taken to the stage, entertaining audiences with their catchy songs and stunning stage performances.

So who will be crowned this year's winner? Tune in to find out!

How to use the dataset

This dataset contains information on all of the winners of the Eurovision Song Contest from 1956 to the present day. The data includes the year that the contest was held, the city that hosted it, the winning song and performer, the margin of points between the winning song and runner-up, and the runner-up country.

This dataset can be used to study patterns in Eurovision voting over time, or to compare different winning songs and performers. It could also be used to study how hosting the contest affects a country's chances of winning.

Research Ideas

To study Eurovision Song Contest winners, one could use this dataset to train a machine learning model to predict the winner of the contest given a set of features about the song and the performers.

This dataset could be used to study how different voting methods (e.g. jury vs televoters) impact the outcome of the Eurovision Song Contest.

This dataset could be used to study trends in music over time by looking at how the style of winning songs has changed since the contest began in 1956.

Acknowledgements

Data from eurovision_winners.csv was scraped from Wikipedia on April 4, 2020.

The dataset eurovision_winners.csv contains a list of all the winners of the Eurovision Song Contest from 1956 to the present day.

License

License: Dataset copyright by authors

You are free to:

Share - copy and redistribute the material in any medium or format for any purpose, even commercially.

Adapt - remix, transform, and build upon the material for any purpose, even commercially.

You must:

Give appropriate credit - Provide a link to the license, and indicate if changes were made.

ShareAlike - You must distribute your contributions under the same license as the original.

Keep intact - all notices that refer to this license, including copyright notices.

Columns

File: eurovision_winners.csv

| Column name | Description |
|:------------|:------------|
| Year | The year in which the contest was held. (Integer) |
| Date | The date on which the contest was held. (String) |
| Host City | The city in which the contest was held. (String) |
| Winner | The country that won the contest. (String) |
| Song | The song that won the contest. (String) |
| Performer | The performer of the winning song. (String) |
| Points | The number of points that the winning song received. (Integer) |
| Margin | The margin of victory (in points) between the winning song and the runner-up song. (Integer) |
| Runner-up | The country that placed second in the contest. (String) |
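Because Margin is described as the winner's lead over the runner-up, the runner-up's score can be recovered as Points minus Margin. The sketch below shows that arithmetic and picks out the closest contest. It assumes the CSV header uses the column names exactly as listed above, which may not match the real file; the rows are invented placeholders and `runner_up_points` is a hypothetical helper.

```python
import csv
import io

# Placeholder rows for illustration; only the column names come from the listing above.
SAMPLE = """Year,Date,Host City,Winner,Song,Performer,Points,Margin,Runner-up
2001,1 May,City X,Country A,Song A,Performer A,100,20,Country B
2002,2 May,City Y,Country B,Song B,Performer B,150,5,Country C
"""


def runner_up_points(row):
    """Points minus Margin, since Margin is the winner's lead over the runner-up."""
    return int(row["Points"]) - int(row["Margin"])


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
second_place_scores = [runner_up_points(r) for r in rows]

# The contest with the smallest margin of victory.
closest = min(rows, key=lambda r: int(r["Margin"]))
```

Raw point totals are hard to compare across years if the voting system changed, so a margin relative to the winner's points may be a fairer measure than the absolute margin.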
‘Campaign Finance versus Election Results’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Campaign Finance versus Election Results’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-campaign-finance-versus-election-results-0951/latest
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Campaign Finance versus Election Results’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danerbland/electionfinance on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset was assembled to investigate the possibility of predicting congressional election results by campaign finance reports from the period leading up to the election.

Content

Each row represents a candidate, with information on their campaign including the state, district, office, total contributions, total expenditures, etc. The content is specific to the year leading up to the 2016 election: (1/1/2015 through 10/19/2016).

Acknowledgements

Campaign finance information came directly from FEC.gov. Election results and vote totals for house races were taken from CNN's election results page.

Inspiration

How much of an impact do campaign spending and fundraising have on an election? Is the impact greater in certain areas? Given this dataset, to what degree of accuracy could we have predicted the election results?

--- Original source retains full ownership of the source dataset ---
‘FIFA - Football World Cup Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘FIFA - Football World Cup Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-fifa-football-world-cup-dataset-2599/66e21fbf/?iid=018-912&v=presentation
Explore at:
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Analysis of ‘FIFA - Football World Cup Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamsouravbanerjee/fifa-football-world-cup-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

The FIFA World Cup, often simply called the World Cup, is an international association football competition contested by the senior men's national teams of the members of the Fédération Internationale de Football Association (FIFA), the sport's global governing body. The championship has been awarded every four years since the inaugural tournament in 1930, except in 1942 and 1946 when it was not held because of the Second World War. The current champion is France, which won its second title at the 2018 tournament in Russia.

The current format involves a qualification phase, which takes place over the preceding three years, to determine which teams qualify for the tournament phase. In the tournament phase, 32 teams, including the automatically qualifying host nation(s), compete for the title at venues within the host nation(s) over about a month.

The 21 World Cup tournaments have been won by eight national teams. Brazil have won five times, and they are the only team to have played in every tournament. The other World Cup winners are Germany and Italy, with four titles each; Argentina, France, and inaugural winner Uruguay, with two titles each; and England and Spain, with one title each.

The World Cup is the most prestigious association football tournament in the world, as well as the most widely viewed and followed single sporting event in the world. The cumulative viewership of all matches of the 2006 World Cup was estimated to be 26.29 billion with an estimated 715.1 million people watching the final match, a ninth of the entire population of the planet.

17 countries have hosted the World Cup. Brazil, France, Italy, Germany, and Mexico have each hosted twice, while Uruguay, Switzerland, Sweden, Chile, England, Argentina, Spain, the United States, Japan, and South Korea (jointly), South Africa, and Russia have each hosted once. Qatar will host the 2022 tournament, and 2026 will be jointly hosted by Canada, the United States, and Mexico, which will give Mexico the distinction of being the first country to host games in three World Cups.

Content

This dataset consists of records from all the previous Football World Cups (1930 to 2018).

Acknowledgements

For more, please visit - https://www.fifa.com/

--- Original source retains full ownership of the source dataset ---
The Oscar Award, 1927 - 2025
kaggle.com
Updated Mar 9, 2025
Cite
Raphael Fontes (2025). The Oscar Award, 1927 - 2025 [Dataset]. https://www.kaggle.com/unanimad/the-oscar-award/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 9, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Raphael Fontes
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Please, if you enjoyed this dataset, don't forget to upvote it.

Context

The Academy Awards, also officially and popularly known as the Oscars, are awards for artistic and technical merit in the film industry. Given annually by the Academy of Motion Picture Arts and Sciences (AMPAS), the awards are an international recognition of excellence in cinematic achievements as assessed by the Academy's voting membership. The various category winners are awarded a copy of a golden statuette, officially called the "Academy Award of Merit", although more commonly referred to by its nickname "Oscar". The statuette depicts a knight rendered in Art Deco style.

Content

This file contains a scrape of The Academy Awards Database, a record of past Academy Award winners and nominees between 1927 and 2025.

the_oscar_award.csv contains a view of the data consistent with past views of this Kaggle dataset.

full_data.csv contains the full data, with additional columns and parsing, imported from GitHub.

Acknowledgements

The awards data was scraped from the Official Academy Awards search site; nominees were listed with their name first and film following in some categories, such as Best Actor/Actress, and in the reverse for others.

Inspiration

Do the Academy Awards reflect the diversity of American films, or are they #OscarsSoWhite?

Which actor/actress has received the most awards overall or in a single year?

Which film has received the most awards in a ceremony?

Which country received the most awards at a ceremony and overall?

Can you predict who will receive the awards next year?

Thank you @lopcio for the amazing help in fixing this dataset's missing values through the updates.
National Soils Database - Dataset - data.gov.ie
data.gov.ie
Updated Jul 23, 2021
Cite
data.gov.ie (2021). National Soils Database - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/national-soils-database
Explore at:
Dataset updated
Jul 23, 2021
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Soil Database has produced a national database of soil geochemistry including point and spatial distribution maps of major nutrients, major elements, essential trace elements, trace elements of special interest and minor elements. In addition, this study has generated a National Soil Archive, comprising bulk soil samples and a nucleic acids archive, each of which represents a valuable resource for future soils research in Ireland. The geographical coherence of the geochemical results was considered to be predominantly underpinned by underlying parent material and glacial geology. Other factors such as soil type, land use, anthropogenic effects and climatic effects were also evident. The coherence between elements, as displayed by multivariate analyses, was evident in this study. Examples included strong relationships between Co, Fe, As, Mn and Cu. This study applied large-scale microbiological analysis of soils for the first time in Ireland and in doing so also investigated microbial community structure in a range of soil types in order to determine the relationship between soil microbiology and chemistry. The results of the microbiological analyses were consistent with geochemical analyses and demonstrated that bacterial community populations appeared to be predominantly determined by soil parent material and soil type.
‘Mobile Games A/B Testing - Cookie Cats’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Mobile Games A/B Testing - Cookie Cats’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-mobile-games-a-b-testing-cookie-cats-c35e/latest
Explore at:
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Mobile Games A/B Testing - Cookie Cats’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mursideyarkin/mobile-games-ab-testing-cookie-cats on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset includes A/B test results of Cookie Cats to examine what happens when the first gate in the game was moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40.

Content

The data we have is from 90,189 players who installed the game while the A/B test was running. The variables are:

userid: A unique number that identifies each player.

version: Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).

sum_gamerounds: The number of game rounds played by the player during the first 14 days after install.

retention_1: Did the player come back and play 1 day after installing?

retention_7: Did the player come back and play 7 days after installing?
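
One way to read these variables is to compare retention between the two versions. The sketch below is only an illustration, not the analysis used by the original source: it counts retained players per version and runs a pooled two-proportion z-test. The helper names and the toy rows are hypothetical, and the code assumes the retention columns have already been parsed into booleans (not the strings "True"/"False").

```python
from collections import defaultdict
from math import erf, sqrt


def retention_by_version(rows, column):
    """Return {version: (retained_count, total_count)} for one retention column."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        counts[row["version"]][0] += int(bool(row[column]))
        counts[row["version"]][1] += 1
    return {version: tuple(pair) for version, pair in counts.items()}


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference of proportions, using pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via erf; the p-value is the two-sided tail area.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Made-up toy rows for illustration only; NOT taken from the real dataset.
toy_rows = [
    {"userid": 1, "version": "gate_30", "sum_gamerounds": 30, "retention_1": True, "retention_7": True},
    {"userid": 2, "version": "gate_30", "sum_gamerounds": 12, "retention_1": True, "retention_7": True},
    {"userid": 3, "version": "gate_30", "sum_gamerounds": 4, "retention_1": False, "retention_7": False},
    {"userid": 4, "version": "gate_30", "sum_gamerounds": 1, "retention_1": False, "retention_7": False},
    {"userid": 5, "version": "gate_40", "sum_gamerounds": 45, "retention_1": True, "retention_7": True},
    {"userid": 6, "version": "gate_40", "sum_gamerounds": 9, "retention_1": True, "retention_7": False},
    {"userid": 7, "version": "gate_40", "sum_gamerounds": 2, "retention_1": False, "retention_7": False},
    {"userid": 8, "version": "gate_40", "sum_gamerounds": 0, "retention_1": False, "retention_7": False},
]

counts = retention_by_version(toy_rows, "retention_7")
(x30, n30), (x40, n40) = counts["gate_30"], counts["gate_40"]
z, p_value = two_proportion_z_test(x30, n30, x40, n40)
print(counts, round(z, 3), round(p_value, 3))
```

With only eight toy rows the test has no real power; on the full 90,189-player file the same functions could be applied to either retention column.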

Acknowledgements

This dataset is taken from DataCamp. Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment.

Thanks to them for this dataset! 😻

--- Original source retains full ownership of the source dataset ---
YouTube Content Classification Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). YouTube Content Classification Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/fef9b558-dda7-42c6-83e3-048d99e5135b
Explore at:
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
YouTube, Social Media and Networking
Description
This dataset provides YouTube video metadata, suitable for practising text classification using Natural Language Processing (NLP) techniques. It includes video IDs, titles, descriptions, and categories, making it a valuable resource for those looking to apply and refine their NLP skills. The dataset was generated by scraping YouTube, offering a real-world scenario for data cleaning and analysis, including challenges such as missing values and class imbalance.

Columns

Video ID: A unique identifier for each YouTube video. Note that this column contains some missing data.

title: The title of the YouTube video.

description: The textual description associated with the YouTube video.

category: The category under which the video was classified when scraped.

link: A direct URL to the YouTube video.

Distribution

The dataset is typically provided in a CSV file format. It contains approximately 3,400 video records, derived from an initial scrape of 3,600 videos. The dataset is known to be untidy, featuring missing values and imbalanced classes across its categories, presenting an opportunity for data cleaning and preprocessing exercises.

Usage

This dataset is ideally suited for:

* Practising basic text classification using various NLP techniques.
* Learning how to handle common data issues such as missing values and imbalanced classes.
* Developing and applying data cleaning and preprocessing methods.
* Experimenting with different machine learning algorithms for text analysis.
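
As a rough illustration of the cleaning and class-imbalance steps mentioned above, the sketch below drops rows with missing text fields and derives per-class weights with the common "balanced" heuristic, n_samples / (n_classes * class_count). The helper names and the toy rows are hypothetical; only the column names (Video ID, title, description, category, link) come from the dataset description.

```python
from collections import Counter


def drop_incomplete(rows, required=("title", "description", "category")):
    """Keep only rows where every required field is present and non-blank."""
    kept = []
    for row in rows:
        values = [row.get(column) for column in required]
        if all(value is not None and str(value).strip() for value in values):
            kept.append(row)
    return kept


def balanced_class_weights(rows, label="category"):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(row[label] for row in rows)
    n_samples, n_classes = sum(counts.values()), len(counts)
    return {name: n_samples / (n_classes * count) for name, count in counts.items()}


# Made-up toy rows for illustration only; NOT taken from the real dataset.
toy_rows = [
    {"Video ID": "a1", "title": "Street food tour", "description": "Noodles and more", "category": "Food", "link": "..."},
    {"Video ID": None, "title": "Baking bread", "description": "Sourdough basics", "category": "Food", "link": "..."},
    {"Video ID": "a3", "title": "Best tacos", "description": "Taco crawl", "category": "Food", "link": "..."},
    {"Video ID": "a4", "title": "Roman roads", "description": "How they were built", "category": "History", "link": "..."},
    {"Video ID": "a5", "title": "Medieval castles", "description": None, "category": "History", "link": "..."},
    {"Video ID": "a6", "title": "  ", "description": "Untitled clip", "category": "Food", "link": "..."},
]

clean_rows = drop_incomplete(toy_rows)
weights = balanced_class_weights(clean_rows)
print(len(clean_rows), weights)
```

A missing Video ID is deliberately tolerated here, since the description notes that column is incomplete and it is not needed for text classification. The resulting weights could be passed to a classifier that accepts per-class weights.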

Coverage

The dataset has a global reach, as it comprises YouTube videos accessible worldwide. It was listed on 08/06/2025. The video categories included in the dataset were specifically queried across four main areas: Travel Vlogs, Food, Art and Music, and History. Users should be aware that the data includes missing values and exhibits class imbalance across these categories.

License

CC0

Who Can Use It

This dataset is intended for individuals and researchers, particularly those at an intermediate skill level, who wish to practise and improve their text classification and NLP capabilities. It is also highly beneficial for anyone looking to gain practical experience in data cleaning, handling missing data, and addressing class imbalance in real-world datasets.

Dataset Name Suggestions

YouTube Video Classification Data

NLP YouTube Metadata Dataset

YouTube Content Classification Dataset

Video Description Text Analysis Dataset

Attributes

Original Data Source: Youtube Videos Dataset (~3400 videos)
Boston Marathon Qualifiers Dataset
kaggle.com
Updated Apr 10, 2025
Cite
Brian Rock (2025). Boston Marathon Qualifiers Dataset [Dataset]. https://www.kaggle.com/datasets/runningwithrock/boston-marathon-qualifiers-dataset
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Brian Rock
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
Boston
Description
This dataset includes a collection of individual race results that can be used to predict the cut-off time for the Boston Marathon.

The Boston Marathon specifies qualifying times for athletes depending on their age and gender. The organizers also set a maximum field size for the event and only accept about 20,000 runners based on these qualifying times.

If more runners qualify and apply than the field can hold, the organizers apply a cut-off time to reduce the qualified applicant pool until they reach the desired field size. In 2024, the cut-off time was 5:29. There has been speculation that this year's race will see a similarly large cut-off.
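
The cut-off mechanism described above can be sketched as a simple ranking problem. This is a simplified model under stated assumptions, not the organizers' actual procedure: each applicant is reduced to a hypothetical "buffer" (seconds faster than their own qualifying standard), and the cut-off is the buffer of the last applicant who fits in the field. The function names and the toy numbers are made up for illustration.

```python
def cutoff_seconds(buffers, field_size):
    """Return the smallest buffer (in seconds) that is still accepted.

    buffers: seconds by which each applicant beat their qualifying time.
    field_size: number of runners that can be accepted (must be positive).
    Ties at the boundary are not resolved in this simplified model.
    """
    if field_size <= 0:
        raise ValueError("field_size must be positive")
    if len(buffers) <= field_size:
        return 0  # everyone who qualified and applied gets in
    ranked = sorted(buffers, reverse=True)
    return ranked[field_size - 1]


def format_mmss(seconds):
    """Format a number of seconds as M:SS."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# Made-up toy buffers for illustration only; NOT real applicant data.
toy_buffers = [600, 400, 329, 200, 100, 30]
cutoff = cutoff_seconds(toy_buffers, field_size=3)
print(format_mmss(cutoff))
```

Predicting a future cut-off would then amount to estimating the distribution of buffers among likely applicants from the race results in this dataset.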
Results of European Parliament elections
kaggle.com
Updated Apr 3, 2019
Cite
European Parliament (2019). Results of European Parliament elections [Dataset]. https://www.kaggle.com/datasets/eu-parliament/results-of-european-parliament-elections
Explore at:
Dataset updated
Apr 3, 2019
Dataset provided by
Kaggle
Authors
European Parliament
Description
Content

More details about each file are in the individual file descriptions.

Context

This is a dataset from the European Parliament hosted by the EU Open Data Portal. The Open Data Portal is found here and they update their information according to the amount of data that is brought in. Explore European Parliament data using Kaggle and all of the data sources available through the European Parliament organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using the EU ODP API and Kaggle's API.

This dataset is distributed under the following licenses: Dataset License

Cover photo by Tamara Menzi on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Tennis winning shot classification
kaggle.com
zip
Updated Sep 18, 2020
Cite
greg (2020). Tennis winning shot classification [Dataset]. https://www.kaggle.com/datasets/greg1982/tennis-winning-shot-classification
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Sep 18, 2020
Authors
greg
Description
We considered the final of the 2017 Wimbledon Grand Slam tournament between Roger Federer and Marin Čilić. The game was broadcast by the BBC and the video is available online on YouTube.

For each class, “winners” and “no-winners”, a set of 100 example sequences is available.

G. Tsagkatakis, M. Jaber, and P. Tsakalides, "Convolutional neural networks for the analysis of broadcasted tennis games," in Proc. Visual Information Processing and Communication Conference, International Symposium on Electronic Imaging, Burlingame, CA, USA, January 28-February 1, 2018.
