100+ datasets found
  1. mlcourse.ai - Dota 2 - winner prediction Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2019
    Cite
    Sushma Biswas (2019). mlcourse.ai - Dota 2 - winner prediction Dataset [Dataset]. https://www.kaggle.com/datasets/sushmabiswas/mlcourseai-dota-2-winner-prediction-dataset
    Explore at:
    Available download formats: zip (759868828 bytes)
    Dataset updated
    Sep 8, 2019
    Authors
    Sushma Biswas
    Description

    Context

    Hello! I am currently taking the mlcourse.ai course, and this dataset was required for one of its in-class Kaggle competitions. The data is originally hosted on git, but I like to have my data right here on Kaggle. That's why this dataset exists.

    If you find this dataset useful, do upvote. Thank you and happy learning!

    Content

    This dataset contains 6 files in total:

    • Sample_submission.csv
    • Train_features.csv
    • Test_features.csv
    • Train_targets.csv
    • Train_matches.jsonl
    • Test_matches.jsonl

    Acknowledgements

    All of the data in this dataset is originally hosted on git and the same can also be found on the in-class competition's 'data' page here.

    Inspiration

    • to be updated.
  2. A Visual History of Nobel Prize Winners Dataset

    • kaggle.com
    zip
    Updated Aug 5, 2020
    Cite
    Amit Hasan Shuvo (2020). A Visual History of Nobel Prize Winners Dataset [Dataset]. https://www.kaggle.com/amithasanshuvo/a-visual-history-of-nobel-prize-winners-dataset
    Explore at:
    Available download formats: zip (66754 bytes)
    Dataset updated
    Aug 5, 2020
    Authors
    Amit Hasan Shuvo
    Description

    Content

    The Nobel Prize is perhaps the world's best-known scientific award. Apart from the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?

    Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.

    Reference:

    This is a project of "Data Scientist with Python Track" of DataCamp

  3. Data from: Machine learning driven self-discovery of the robot body...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2 more
    Updated Aug 15, 2024
    Cite
    Fernando Diaz Ledezma; Sami Haddadin (2024). Machine learning driven self-discovery of the robot body morphology [Dataset]. http://doi.org/10.5061/dryad.h44j0zpsf
    Explore at:
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Fernando Diaz Ledezma; Sami Haddadin
    Time period covered
    Jan 1, 2023
    Description

    Conventionally, the kinematic structure of a robot is assumed to be known and data from external measuring devices are used mainly for calibration. We take an agent-centric perspective to explore whether a robot could learn its body structure by relying on scarce knowledge and depending only on unorganized proprioceptive signals. To achieve this, we analyze a mutual-information-based representation of the relationships between the proprioceptive signals, which we call proprioceptive information graphs (pi-graph), and use it to look for connections that reflect the underlying mechanical topology of the robot. We then use the inferred topology to guide the search for the morphology of the robot; i.e. the location and orientation of its joints. Results from different robots show that the correct topology and morphology can be effectively inferred from their pi-graph, regardless of the number of links and body configuration.

    The datasets contain the proprioceptive signals for a robot arm, a hexapod, and a humanoid, including joint position, velocity, torque, body angular and linear velocities, and body angular and linear accelerations. The robot manipulator experiment used simulated robot joint trajectories to generate the proprioceptive signals. These signals were computed using the robot's Denavit-Hartenberg parameters and the Newton-Euler method with artificially added noise. In the physical experiment, joint trajectories were optimized for joint velocity signal entropy, and measurements were obtained directly from encoders, torque sensors, and inertial measurement units (IMU). In the hexapod and humanoid robot experiments, sensor data was collected from a physics simulator (Gazebo 11) using virtual IMU sensors. Filters were applied to handle measurement noise, including low-pass filters for offline estimation and moving average filters for online estimation, emphasizing noise reduction for angular veloc...

    Machine Learning Driven Self-Discovery of the Robot Body Morphology

    The repository contains:

    • Data sets
    • Links to MATLAB source code

    Requirements

    • MATLAB's Robotics System Toolbox
    • MATLAB's Optimization Toolbox
    • Toolboxes for optimization on manifolds and matrices (MANOPT)
    • Java Information Dynamics Toolkit (JIDT)

    NOTE: MATLAB 2021b was used.

    Sharing/Access information

    All datasets are also publicly available at Kaggle; these are the corresponding links:

    • Simulated robot manipulator with fixed and moving base here
    • Physical manipulator experiment (fixed base) here
    • Simulated hexapod robot here
    • Simulated humanoid robot [h...
  4. Nobel Prize Winners

    • kaggle.com
    Updated Aug 11, 2023
    Cite
    Joakim Arvidsson (2023). Nobel Prize Winners [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/nobel-prize
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Joakim Arvidsson
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset lists all Nobel laureates (persons and organizations) from 1902. One Nobel Laureate may be awarded more than one Nobel Prize.

    The dataset is published by Nobel Media AB: https://www.nobelprize.org/about/

  5. Kaggle EyePACS Dataset

    • paperswithcode.com
    Updated Feb 17, 2015
    Cite
    (2020). Kaggle EyePACS Dataset [Dataset]. https://paperswithcode.com/dataset/kaggle-eyepacs
    Explore at:
    Dataset updated
    Feb 17, 2015
    Description

    Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.

    The US Centers for Disease Control and Prevention estimates that 29.1 million people in the US have diabetes and the World Health Organization estimates that 347 million people have the disease worldwide. Diabetic Retinopathy (DR) is an eye disease associated with long-standing diabetes. Around 40% to 45% of Americans with diabetes have some stage of the disease. Progression to vision impairment can be slowed or averted if DR is detected in time; however, this can be difficult, as the disease often shows few symptoms until it is too late to provide effective treatment.

    Currently, detecting DR is a time-consuming and manual process that requires a trained clinician to examine and evaluate digital color fundus photographs of the retina. By the time human readers submit their reviews, often a day or two later, the delayed results lead to lost follow up, miscommunication, and delayed treatment.

    Clinicians can identify DR by the presence of lesions associated with the vascular abnormalities caused by the disease. While this approach is effective, its resource demands are high. The expertise and equipment required are often lacking in areas where the rate of diabetes in local populations is high and DR detection is most needed. As the number of individuals with diabetes continues to grow, the infrastructure needed to prevent blindness due to DR will become even more insufficient.

    The need for a comprehensive and automated method of DR screening has long been recognized, and previous efforts have made good progress using image classification, pattern recognition, and machine learning. With color fundus photography as input, the goal of this competition is to push an automated detection system to the limit of what is possible – ideally resulting in models with realistic clinical potential. The winning models will be open sourced to maximize the impact such a model can have on improving DR detection.

    Acknowledgements

    This competition is sponsored by the California Healthcare Foundation.

    Retinal images were provided by EyePACS, a free platform for retinopathy screening.

  6. ‘Premium Bonds - High Value Winners’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Premium Bonds - High Value Winners’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-premium-bonds-high-value-winners-bdbc/85ef4531/?iid=006-369&v=presentation
    Explore at:
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Premium Bonds - High Value Winners’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/samuelcortinhas/premium-bond-winners-december-2021 on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset from https://www.nsandi.com/prize-checker/winners contains all the high-value prize winners from this month's premium bond draw. Premium bonds are a way to save money in the UK: you invest an amount up to £50,000, and every month you are entered into a draw in which you can win tax-free prizes. Every pound invested counts as one raffle entry, so the more money you have invested, the more likely you are to win. Prizes can range from £25 to £1,000,000.

    Content

    This dataset contains all the prizes worth £1,000 or more that were awarded each month from Dec 2021 to present. It includes the prize values, the bond numbers, total value holdings, locations and dates of purchases for each winner. (Note: the data format for Dec 2021 is slightly different from the rest; the format for 2022 onwards will be uniform.)

    Acknowledgements

    Data was collected from https://www.nsandi.com/prize-checker/winners by downloading an excel file and converting it to a csv.

    --- Original source retains full ownership of the source dataset ---

  7. Data from: Assessing predictive performance of supervised machine learning...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated May 23, 2023
    Cite
    Evans Omondi (2023). Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model [Dataset]. http://doi.org/10.5061/dryad.wh70rxwrh
    Explore at:
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Strathmore University
    Authors
    Evans Omondi
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The diamond is 58 times harder than any other mineral in the world, and its elegance as a jewel has long been appreciated. Forecasting diamond prices is challenging due to nonlinearity in important features such as carat, cut, clarity, table, and depth. Against this backdrop, the study conducted a comparative analysis of the performance of multiple supervised machine learning models (regressors and classifiers) in predicting diamond prices. Eight supervised machine learning algorithms were evaluated in this work: Multiple Linear Regression, Linear Discriminant Analysis, eXtreme Gradient Boosting, Random Forest, k-Nearest Neighbors, Support Vector Machines, Boosted Regression and Classification Trees, and Multi-Layer Perceptron. The analysis is based on data preprocessing, exploratory data analysis (EDA), training the aforementioned models, assessing their accuracy, and interpreting their results. Based on the performance metrics and analysis, eXtreme Gradient Boosting was found to be the best-performing algorithm in both classification and regression, with an R² score of 97.45% and an accuracy of 74.28%. As a result, eXtreme Gradient Boosting was recommended as the regressor and classifier of choice for forecasting the price of a diamond specimen.

    Methods

    Kaggle, a data repository with thousands of datasets, was used in the investigation. It is an online community for machine learning practitioners and data scientists, as well as a robust, well-researched, and sufficient resource for analyzing various data sources. On Kaggle, users can search for and publish various datasets. In a web-based data-science environment, they can study datasets and construct models.

  8. Replication Data for: Nobel Prize Winners: 1910 - 2023

    • dataverse.harvard.edu
    Updated Feb 7, 2024
    Cite
    Lance Drouet (2024). Replication Data for: Nobel Prize Winners: 1910 - 2023 [Dataset]. http://doi.org/10.7910/DVN/6JBXYL
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Lance Drouet
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data Source: https://www.kaggle.com/datasets/sazidthe1/nobel-prize-data/data

    Data Description:

    • year - The year in which the Nobel Prize was awarded
    • category - The category or field for which the Nobel Prize was awarded
    • motivation - A brief description of the laureate's contributions for which they received the prize
    • prizeShare - Indicates whether the laureate shared the prize with others
    • laureateID - A unique identifier for the laureate
    • fullName - The full name of the laureate
    • gender - The gender of the laureate
    • born - The birthdate of the laureate
    • bornCountry - The country in which the laureate was born
    • bornCity - The city in which the laureate was born
    • died - The date of death of the laureate
    • diedCountry - The country in which the laureate died
    • diedCity - The city in which the laureate died
    • organizationName - The name of the organization which the laureate is associated with
    • organizationCountry - The country in which the associated organization is located
    • organizationCity - The city in which the associated organization is located
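
    As a rough illustration of how the columns described above might be used, the Python sketch below counts laureates per decade and gender. The sample rows are invented placeholders, the gender values "male"/"female" are an assumption about the encoding, and `laureates_per_decade` is a hypothetical helper name, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows using a subset of the described columns.
# The "male"/"female" encoding is an assumption, not confirmed by the source.
SAMPLE = """year,category,fullName,gender,bornCountry
1950,Physics,Person A,male,Country X
1955,Chemistry,Person B,female,Country Y
1962,Physics,Person C,female,Country X
"""


def laureates_per_decade(text):
    """Count rows per (decade, gender) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        decade = int(row["year"]) // 10 * 10  # e.g. 1955 -> 1950
        counts[(decade, row["gender"])] += 1
    return counts


counts = laureates_per_decade(SAMPLE)
```

    With the real file, the same function could be applied to the file's text; a laureate who appears in several rows would be counted once per row.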

  9. ‘Women's International Football Results’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 20, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Women's International Football Results’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-women-s-international-football-results-bda3/531389dd/?iid=005-699&v=presentation
    Explore at:
    Dataset updated
    Aug 20, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Women's International Football Results’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/martj42/womens-international-football-results on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This is a work-in-progress sister data set to the men's international football results dataset. If you're interested in helping out, submit a pull request here.

    Content

    Currently, the dataset includes 4,169 women's international football results. All major tournament results should be complete. Some international friendlies, particularly tournaments, are included. A LOT of results are not yet in the dataset.

    results.csv includes the following columns:

    • date - date of the match
    • home_team - the name of the home team
    • away_team - the name of the away team
    • home_score - full-time home team score including extra time, not including penalty-shootouts
    • away_score - full-time away team score including extra time, not including penalty-shootouts
    • tournament - the name of the tournament
    • city - the name of the city/town/administrative unit where the match was played
    • country - the name of the country where the match was played
    • neutral - TRUE/FALSE column indicating whether the match was played at a neutral venue
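
    A minimal Python sketch of how the columns listed above might be read. The two sample rows are invented placeholders rather than real results, and `match_outcome` and `load_results` are hypothetical helper names.

```python
import csv
import io

# Invented placeholder rows using the column names listed above; not real results.
SAMPLE = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2000-01-01,Team A,Team B,2,1,Friendly,City X,Country X,FALSE
2000-01-02,Team C,Team D,0,0,Friendly,City Y,Country Z,TRUE
"""


def match_outcome(row):
    """Return 'home', 'away', or 'draw' from the full-time scores."""
    home, away = int(row["home_score"]), int(row["away_score"])
    if home > away:
        return "home"
    if away > home:
        return "away"
    return "draw"


def load_results(text):
    """Parse CSV text into dicts, converting the TRUE/FALSE neutral flag to bool."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["neutral"] = row["neutral"].strip().upper() == "TRUE"
        row["outcome"] = match_outcome(row)
        rows.append(row)
    return rows


results = load_results(SAMPLE)
```

    Because the scores exclude penalty shootouts, a match labeled "draw" here may still have had a shootout winner that this file does not record.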

    Acknowledgements

    The data is gathered from several sources including but not limited to Wikipedia, fifa.com, rsssf.com and individual football associations' websites.

    Inspiration

    Some directions to take when exploring the data:

    • Who is the best team of all time
    • Which teams dominated different eras of football
    • What trends have there been in international football throughout the ages - home advantage, total goals scored, distribution of teams' strength etc
    • Can we say anything about geopolitics from football fixtures - how has the number of countries changed, which teams like to play each other
    • Which countries host the most matches in which they themselves are not participating
    • How much, if at all, does hosting a major tournament help a country's chances in the tournament
    • Which teams are the most active in playing friendlies and friendly tournaments - does it help or hurt them

    The world's your oyster, my friend.

    Contribute

    If you notice a mistake or the results are not being updated fast enough for your liking, you can fix that by submitting a pull request on GitHub.

    ✌🏼✌🏼✌🏼

    --- Original source retains full ownership of the source dataset ---

  10. ‘Gender Classification Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Oct 8, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Gender Classification Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-gender-classification-dataset-837a/77df5245/?iid=004-594&v=presentation
    Explore at:
    Dataset updated
    Oct 8, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Gender Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/elakiricoder/gender-classification-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    While I was practicing machine learning, I wanted to create a simple dataset that is closely aligned to the real world scenario and gives better results to whet my appetite on this domain. If you are a beginner who wants to try solving classification problems in machine learning and if you prefer achieving better results, try using this dataset in your projects which will be a great place to start.

    Content

    This dataset contains 7 features and a label column.

    • long_hair - This column contains 0's and 1's where 1 is "long hair" and 0 is "not long hair".
    • forehead_width_cm - This is the width of the forehead, in cm.
    • forehead_height_cm - This is the height of the forehead, in cm.
    • nose_wide - This column contains 0's and 1's where 1 is "wide nose" and 0 is "not wide nose".
    • nose_long - This column contains 0's and 1's where 1 is "long nose" and 0 is "not long nose".
    • lips_thin - This column contains 0's and 1's where 1 represents "thin lips" while 0 is "not thin lips".
    • distance_nose_to_lip_long - This column contains 0's and 1's where 1 represents "long distance between nose and lips" while 0 is "short distance between nose and lips".
    • gender - This is either "Male" or "Female".
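
    One way the seven features and the label described above could be turned into numeric inputs is sketched below in Python. The mapping of "Male" to 0 and "Female" to 1 is an arbitrary assumption, the example row is invented, and `parse_row` is a hypothetical helper name.

```python
# Feature columns in the order they are described above.
FEATURES = [
    "long_hair",
    "forehead_width_cm",
    "forehead_height_cm",
    "nose_wide",
    "nose_long",
    "lips_thin",
    "distance_nose_to_lip_long",
]
# Columns documented as containing only 0's and 1's.
BINARY = {"long_hair", "nose_wide", "nose_long", "lips_thin", "distance_nose_to_lip_long"}
# Arbitrary label encoding chosen for this sketch.
LABELS = {"Male": 0, "Female": 1}


def parse_row(row):
    """Convert one row (a dict of strings) into (feature list, integer label)."""
    features = []
    for name in FEATURES:
        value = float(row[name])
        if name in BINARY and value not in (0.0, 1.0):
            raise ValueError(f"{name} must be 0 or 1, got {row[name]!r}")
        features.append(value)
    return features, LABELS[row["gender"]]


# Invented example row, not taken from the dataset.
example = {
    "long_hair": "1",
    "forehead_width_cm": "12.5",
    "forehead_height_cm": "6.0",
    "nose_wide": "0",
    "nose_long": "1",
    "lips_thin": "0",
    "distance_nose_to_lip_long": "1",
    "gender": "Male",
}
features, label = parse_row(example)
```

    Validating the binary columns up front makes a malformed row fail loudly instead of silently feeding an unexpected value to a classifier.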

    Acknowledgements

    Nothing to acknowledge, as this is just made-up data.

    Inspiration

    It's painful to see bad results at the beginning. Don't begin with complicated datasets if you are a beginner. I'm sure that this dataset will encourage you to proceed further in the domain. Good luck.

    --- Original source retains full ownership of the source dataset ---

  11. Eurovision Contest Winners

    • kaggle.com
    Updated Nov 17, 2022
    Cite
    The Devastator (2022). Eurovision Contest Winners [Dataset]. https://www.kaggle.com/datasets/thedevastator/eurovision-contest-winners-a-look-at-the-data
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 17, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Eurovision Contest Winners: A Look at the Data

    World's Longest Running TV Programme

    About this dataset

    The Eurovision Song Contest is an annual music competition that began in 1956. It is one of the longest-running television programmes in the world and is watched by millions of people every year. The contest's winner is determined using numerous voting techniques, including points awarded by juries or televoters.

    Since 2004, the contest has included a televised semi-final:

    • In 2004, held on the Wednesday before the final
    • Between 2005 and 2007, held on the Thursday of Eurovision Week
    • Since 2008, the contest has included two semi-finals, held on the Tuesday and Thursday before the final

    The Eurovision Song Contest is a truly global event, with countries from all over Europe (and beyond) competing for the coveted prize. Over the years, some truly amazing performers have taken to the stage, entertaining audiences with their catchy songs and stunning stage performances.

    So who will be crowned this year's winner? Tune in to find out!

    How to use the dataset

    This dataset contains information on all of the winners of the Eurovision Song Contest from 1956 to the present day. The data includes the year that the contest was held, the city that hosted it, the winning song and performer, the margin of points between the winning song and runner-up, and the runner-up country.

    This dataset can be used to study patterns in Eurovision voting over time, or to compare different winning songs and performers. It could also be used to study how hosting the contest affects a country's chances of winning.

    Research Ideas

    • To study Eurovision Song Contest winners, one could use this dataset to train a machine learning model to predict the winner of the contest given a set of features about the song and the performers.
    • This dataset could be used to study how different voting methods (e.g. jury vs televoters) impact the outcome of the Eurovision Song Contest.
    • This dataset could be used to study trends in music over time by looking at how the style of winning songs has changed since the contest began in 1956.

    Acknowledgements

    Data from eurovision_winners.csv was scraped from Wikipedia on April 4, 2020.

    The dataset eurovision_winners.csv contains a list of all the winners of the Eurovision Song Contest from 1956 to the present day.

    License

    License: Dataset copyright by authors

    You are free to:

    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:

    • Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    • ShareAlike - You must distribute your contributions under the same license as the original.
    • Keep intact - all notices that refer to this license, including copyright notices.

    Columns

    File: eurovision_winners.csv

    | Column name | Description                                                                                  |
    |:------------|:---------------------------------------------------------------------------------------------|
    | Year        | The year in which the contest was held. (Integer)                                            |
    | Date        | The date on which the contest was held. (String)                                             |
    | Host City   | The city in which the contest was held. (String)                                             |
    | Winner      | The country that won the contest. (String)                                                   |
    | Song        | The song that won the contest. (String)                                                      |
    | Performer   | The performer of the winning song. (String)                                                  |
    | Points      | The number of points that the winning song received. (Integer)                               |
    | Margin      | The margin of victory (in points) between the winning song and the runner-up song. (Integer) |
    | Runner-up   | The country that placed second in the contest. (String)                                      |
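
    The column descriptions above suggest simple queries such as finding the widest margin of victory. The Python sketch below assumes the CSV header uses exactly the column names given in the description; the sample rows are invented placeholders, and `widest_margin` is a hypothetical helper name.

```python
import csv
import io

# Invented placeholder rows; the header is assumed to match the described column names.
SAMPLE = """Year,Date,Host City,Winner,Song,Performer,Points,Margin,Runner-up
1990,1 May,City X,Country A,Song A,Performer A,100,10,Country B
1991,2 May,City Y,Country C,Song C,Performer C,120,35,Country A
"""


def widest_margin(text):
    """Return the row whose integer Margin column is largest."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return max(rows, key=lambda row: int(row["Margin"]))


best = widest_margin(SAMPLE)
```

    Converting Margin with `int` matters here: comparing the raw strings would sort "9" above "35".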

  12. ‘Campaign Finance versus Election Results’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Campaign Finance versus Election Results’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-campaign-finance-versus-election-results-0951/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Campaign Finance versus Election Results’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/danerbland/electionfinance on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset was assembled to investigate the possibility of predicting congressional election results by campaign finance reports from the period leading up to the election.

    Content

    Each row represents a candidate, with information on their campaign including the state, district, office, total contributions, total expenditures, etc. The content is specific to the year leading up to the 2016 election: (1/1/2015 through 10/19/2016).

    Acknowledgements

    Campaign finance information came directly from FEC.gov. Election results and vote totals for house races were taken from CNN's election results page.

    Inspiration

    How much of an impact do campaign spending and fundraising have on an election? Is the impact greater in certain areas? Given this dataset, to what degree of accuracy could we have predicted the election results?

    --- Original source retains full ownership of the source dataset ---

  13. ‘FIFA - Football World Cup Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘FIFA - Football World Cup Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-fifa-football-world-cup-dataset-2599/66e21fbf/?iid=018-912&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Analysis of ‘FIFA - Football World Cup Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamsouravbanerjee/fifa-football-world-cup-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The FIFA World Cup, often simply called the World Cup, is an international association football competition contested by the senior men's national teams of the members of the Fédération Internationale de Football Association (FIFA), the sport's global governing body. The championship has been awarded every four years since the inaugural tournament in 1930, except in 1942 and 1946 when it was not held because of the Second World War. The current champion is France, which won its second title at the 2018 tournament in Russia.

    The current format involves a qualification phase, which takes place over the preceding three years, to determine which teams qualify for the tournament phase. In the tournament phase, 32 teams, including the automatically qualifying host nation(s), compete for the title at venues within the host nation(s) over about a month.

    The 21 World Cup tournaments have been won by eight national teams. Brazil have won five times, and they are the only team to have played in every tournament. The other World Cup winners are Germany and Italy, with four titles each; Argentina, France, and inaugural winner Uruguay, with two titles each; and England and Spain, with one title each.

    The World Cup is the most prestigious association football tournament in the world, as well as the most widely viewed and followed single sporting event in the world. The cumulative viewership of all matches of the 2006 World Cup was estimated to be 26.29 billion with an estimated 715.1 million people watching the final match, a ninth of the entire population of the planet.

    17 countries have hosted the World Cup. Brazil, France, Italy, Germany, and Mexico have each hosted twice, while Uruguay, Switzerland, Sweden, Chile, England, Argentina, Spain, the United States, Japan, and South Korea (jointly), South Africa, and Russia have each hosted once. Qatar will host the 2022 tournament, and 2026 will be jointly hosted by Canada, the United States, and Mexico, which will give Mexico the distinction of being the first country to host games in three World Cups.

    Content

    This Dataset consists of Records from all the previous Football World Cups (1930 to 2018)

    Acknowledgements

    For more, please visit - https://www.fifa.com/

    --- Original source retains full ownership of the source dataset ---

  14. The Oscar Award, 1927 - 2025

    • kaggle.com
    Updated Mar 9, 2025
    Cite
    Raphael Fontes (2025). The Oscar Award, 1927 - 2025 [Dataset]. https://www.kaggle.com/unanimad/the-oscar-award/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Raphael Fontes
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If you enjoyed this dataset, please don't forget to upvote it.

    Context

    The Academy Awards, also officially and popularly known as the Oscars, are awards for artistic and technical merit in the film industry. Given annually by the Academy of Motion Picture Arts and Sciences (AMPAS), the awards are an international recognition of excellence in cinematic achievements as assessed by the Academy's voting membership. The various category winners are awarded a copy of a golden statuette, officially called the "Academy Award of Merit", although more commonly referred to by its nickname "Oscar". The statuette depicts a knight rendered in Art Deco style.

    Content

    This file contains a scrape of The Academy Awards Database, a record of past Academy Award winners and nominees between 1927 and 2025.

    the_oscar_award.csv contains a view of the data consistent with past views of this Kaggle dataset.

    full_data.csv contains the full data, with additional columns and parsing, imported from GitHub.

    Acknowledgements

    The awards data was scraped from the Official Academy Awards search site; nominees were listed with their name first and film following in some categories, such as Best Actor/Actress, and in the reverse for others.

    Inspiration

    1. Do the Academy Awards reflect the diversity of American films or are the #OscarsSoWhite?
    2. Which actor/actress has received the most awards overall or in a single year?
    3. Which film has received the most awards in a ceremony?
    4. Which country received the most awards at a ceremony and overall?
    5. Can you predict who will receive the awards next year?

    Thank you @lopcio for the amazing help in fixing this dataset's missing values through the updates.

  15. National Soils Database - Dataset - data.gov.ie

    • data.gov.ie
    Updated Jul 23, 2021
    + more versions
    Cite
    data.gov.ie (2021). National Soils Database - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/national-soils-database
    Explore at:
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    data.gov.ie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Soil Database has produced a national database of soil geochemistry including point and spatial distribution maps of major nutrients, major elements, essential trace elements, trace elements of special interest and minor elements. In addition, this study has generated a National Soil Archive, comprising bulk soil samples and a nucleic acids archive each of which represent a valuable resource for future soils research in Ireland. The geographical coherence of the geochemical results was considered to be predominantly underpinned by underlying parent material and glacial geology. Other factors such as soil type, land use, anthropogenic effects and climatic effects were also evident. The coherence between elements, as displayed by multivariate analyses, was evident in this study. Examples included strong relationships between Co, Fe, As, Mn and Cu. This study applied large-scale microbiological analysis of soils for the first time in Ireland and in doing so also investigated microbial community structure in a range of soil types in order to determine the relationship between soil microbiology and chemistry. The results of the microbiological analyses were consistent with geochemical analyses and demonstrated that bacterial community populations appeared to be predominantly determined by soil parent material and soil type.

  16. ‘Mobile Games A/B Testing - Cookie Cats’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Mobile Games A/B Testing - Cookie Cats’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-mobile-games-a-b-testing-cookie-cats-c35e/latest
    Explore at:
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Mobile Games A/B Testing - Cookie Cats’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mursideyarkin/mobile-games-ab-testing-cookie-cats on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset includes A/B test results of Cookie Cats to examine what happens when the first gate in the game was moved from level 30 to level 40. When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40.

    Content

    The data we have is from 90,189 players that installed the game while the AB-test was running. The variables are:

    • userid: A unique number that identifies each player.
    • version: Whether the player was put in the control group (gate_30 - a gate at level 30) or the group with the moved gate (gate_40 - a gate at level 40).
    • sum_gamerounds: The number of game rounds played by the player during the first 14 days after install.
    • retention_1: Did the player come back and play 1 day after installing?
    • retention_7: Did the player come back and play 7 days after installing?

    When a player installed the game, he or she was randomly assigned to either gate_30 or gate_40.
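    As a rough illustration of how the variables above might be used, the sketch below compares retention rates between the two groups. The rows are invented for the example and are not taken from the dataset, the function name retention_rates is hypothetical, and the retention columns are assumed to have been parsed into booleans already (raw CSV text such as "False" would need converting first).

```python
from collections import defaultdict

# Hypothetical rows in the documented column layout; the values are made up.
rows = [
    {"userid": 1, "version": "gate_30", "sum_gamerounds": 3, "retention_1": True, "retention_7": False},
    {"userid": 2, "version": "gate_30", "sum_gamerounds": 0, "retention_1": False, "retention_7": False},
    {"userid": 3, "version": "gate_40", "sum_gamerounds": 41, "retention_1": True, "retention_7": True},
    {"userid": 4, "version": "gate_40", "sum_gamerounds": 12, "retention_1": True, "retention_7": False},
]


def retention_rates(rows, column):
    """Return {version: share of players for whom `column` is true}."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for row in rows:
        totals[row["version"]] += 1
        # Assumes booleans; bool("False") would wrongly count as retained.
        retained[row["version"]] += bool(row[column])
    return {version: retained[version] / totals[version] for version in totals}


day1 = retention_rates(rows, "retention_1")
day7 = retention_rates(rows, "retention_7")
print(day1)
print(day7)
```

    A difference in these rates is only a starting point; deciding whether it is meaningful would need a significance test or bootstrap on the full 90,189 players.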

    Acknowledgements

    This dataset is taken from DataCamp. Cookie Cats is a hugely popular mobile puzzle game developed by Tactile Entertainment.

    Thanks to them for this dataset! 😻

    --- Original source retains full ownership of the source dataset ---

  17. YouTube Content Classification Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    + more versions
    Cite
    Datasimple (2025). YouTube Content Classification Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/fef9b558-dda7-42c6-83e3-048d99e5135b
    Explore at:
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    YouTube, Social Media and Networking
    Description

    This dataset provides YouTube video metadata, suitable for practising text classification using Natural Language Processing (NLP) techniques. It includes video IDs, titles, descriptions, and categories, making it a valuable resource for those looking to apply and refine their NLP skills. The dataset was generated by scraping YouTube, offering a real-world scenario for data cleaning and analysis, including challenges such as missing values and class imbalance.

    Columns

    • Video ID: A unique identifier for each YouTube video. Note that this column contains some missing data.
    • title: The title of the YouTube video.
    • description: The textual description associated with the YouTube video.
    • category: The category under which the video was classified when scraped.
    • link: A direct URL to the YouTube video.

    Distribution

    The dataset is typically provided in a CSV file format. It contains approximately 3,400 video records, derived from an initial scrape of 3,600 videos. The dataset is known to be untidy, featuring missing values and imbalanced classes across its categories, presenting an opportunity for data cleaning and preprocessing exercises.
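    A minimal sketch of the kind of cleaning this description suggests: dropping records with missing text fields and checking how unevenly the categories are represented. The sample rows and example.com links are invented, the helper names are hypothetical, and the column headers are assumed to match the list above exactly.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the documented column layout; the rows are invented.
SAMPLE_CSV = """Video ID,title,description,category,link
abc123,Street food tour,Trying local snacks,Food,https://example.com/1
,Museum walk,,History,https://example.com/2
def456,Pasta at home,Simple recipe,Food,https://example.com/3
ghi789,Road trip,Driving the coast,Travel Vlogs,https://example.com/4
"""


def load_clean_rows(text, required=("title", "description", "category")):
    """Parse CSV text and keep only rows whose required fields are non-empty."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row for row in reader
        if all((row[col] or "").strip() for col in required)
    ]


def class_counts(rows):
    """Count records per category to expose class imbalance."""
    return Counter(row["category"] for row in rows)


clean_rows = load_clean_rows(SAMPLE_CSV)
counts = class_counts(clean_rows)
print(len(clean_rows), counts.most_common())
```

    Video ID is deliberately left out of the required fields here, since the column list notes that it contains missing data; whether to drop or keep such rows depends on the task.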

    Usage

    This dataset is ideally suited for:

    • Practising basic text classification using various NLP techniques.
    • Learning how to handle common data issues such as missing values and imbalanced classes.
    • Developing and applying data cleaning and preprocessing methods.
    • Experimenting with different machine learning algorithms for text analysis.

    Coverage

    The dataset has a global reach, as it comprises YouTube videos accessible worldwide. It was listed on 08/06/2025. The video categories included in the dataset were specifically queried across four main areas: Travel Vlogs, Food, Art and Music, and History. Users should be aware that the data includes missing values and exhibits class imbalance across these categories.

    License

    CC0

    Who Can Use It

    This dataset is intended for individuals and researchers, particularly those at an intermediate skill level, who wish to practise and improve their text classification and NLP capabilities. It is also highly beneficial for anyone looking to gain practical experience in data cleaning, handling missing data, and addressing class imbalance in real-world datasets.

    Dataset Name Suggestions

    • YouTube Video Classification Data
    • NLP YouTube Metadata Dataset
    • YouTube Content Classification Dataset
    • Video Description Text Analysis Dataset

    Attributes

    Original Data Source: Youtube Videos Dataset (~3400 videos)

  18. Boston Marathon Qualifiers Dataset

    • kaggle.com
    Updated Apr 10, 2025
    Cite
    Brian Rock (2025). Boston Marathon Qualifiers Dataset [Dataset]. https://www.kaggle.com/datasets/runningwithrock/boston-marathon-qualifiers-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Brian Rock
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Boston
    Description

    This dataset includes a collection of individual race results that can be used to predict the cut-off time for the Boston Marathon.

    The Boston Marathon specifies qualifying times for athletes depending on their age and gender. The organizers also set a maximum field size for the event and only accept about 20,000 runners based on these qualifying times.

    If more runners qualify and apply, the organizers apply a cut-off time to reduce the qualified applicant pool until it reaches the desired field size. In 2024, the cut-off time was 5:29. There has been speculation that this year's race will see a similarly large cut-off.

  19. Results of European Parliament elections

    • kaggle.com
    Updated Apr 3, 2019
    Cite
    European Parliament (2019). Results of European Parliament elections [Dataset]. https://www.kaggle.com/datasets/eu-parliament/results-of-european-parliament-elections
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    Kaggle
    Authors
    European Parliament
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset from the European Parliament hosted by the EU Open Data Portal. The Open Data Portal is found here and they update their information according to the amount of data that is brought in. Explore European Parliament data using Kaggle and all of the data sources available through the European Parliament organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using the EU ODP API and Kaggle's API.

    This dataset is distributed under the following licenses: Dataset License

    Cover photo by Tamara Menzi on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  20. Tennis winning shot classification

    • kaggle.com
    zip
    Updated Sep 18, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    greg (2020). Tennis winning shot classification [Dataset]. https://www.kaggle.com/datasets/greg1982/tennis-winning-shot-classification
    Explore at:
    zip (0 bytes)
    Dataset updated
    Sep 18, 2020
    Authors
    greg
    Description

    We considered the final of the 2017 Wimbledon Grand Slam tournament between Roger Federer and Marin Čilić. The match was broadcast by the BBC and the video is available online on YouTube.

    For each class, “winners” and “no-winners”, a set of 100 example sequences is available.

    G. Tsagkatakis, M. Jaber, and P. Tsakalides, "Convolutional neural networks for the analysis of broadcasted tennis games," in Proc. Visual Information Processing and Communication Conference, International Symposium on Electronic Imaging, Burlingame, CA, USA, January 28-February 1, 2018.
