100+ datasets found
  1. home data for ml course

    • kaggle.com
    zip
    Updated Aug 27, 2019
    Cite
    Julián Pérez Pesce (2019). home data for ml course [Dataset]. https://www.kaggle.com/datasets/estrotococo/home-data-for-ml-course
    Explore at:
    Available download formats: zip (199207 bytes)
    Dataset updated
    Aug 27, 2019
    Authors
    Julián Pérez Pesce
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Exercise: Machine Learning Competitions

    When you click Run All, the notebook gives you an error: "Files doesn't exist". This dataset fixes that. It is the same data as DanB's. Please UPVOTE!

    Enjoy!

  2. ‘Kaggle Competitions Top 100’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Kaggle Competitions Top 100’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-top-100-961d/latest
    Explore at:
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Top 100’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-top-100 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains top 100 of Kaggle competitions ranking. The dataset will be updated every month.

    Content

    100 rows and 13 columns. Column descriptions are listed below.

    • User : Name of the user
    • Tier : Grandmaster, Master or Expert
    • Company/School : Company/School info of the user if mentioned
    • Country : Country info of the user if mentioned
    • Competitions_Num : Number of competitions joined
    • Competitions_Gold : Number of competitions gold medals won
    • Competitions_Silver : Number of competitions silver medals won
    • Competitions_Bronze : Number of competitions bronze medals won
    • Datasets_Num : Number of public datasets
    • Notebooks_Num : Number of public notebooks
    • Discussions_Num : Number of topics/comments posted
    • Points : Total points
    • Profile : Link of Kaggle profile
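
    As a minimal sketch of how the columns listed above might be used, the snippet below tallies medals per tier with Python's standard csv module. The sample rows are hypothetical placeholders, not values from the dataset, and medals_by_tier is an illustrative helper name; only the column names (Tier, Competitions_Gold, Competitions_Silver, Competitions_Bronze) come from the description.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the description above.
SAMPLE_CSV = """User,Tier,Competitions_Gold,Competitions_Silver,Competitions_Bronze,Points
alice,Grandmaster,10,5,2,200000
bob,Master,1,4,6,50000
carol,Grandmaster,7,3,1,150000
"""


def medals_by_tier(csv_text):
    """Sum gold, silver, and bronze medal counts for each tier."""
    totals = defaultdict(lambda: {"gold": 0, "silver": 0, "bronze": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = totals[row["Tier"]]
        counts["gold"] += int(row["Competitions_Gold"])
        counts["silver"] += int(row["Competitions_Silver"])
        counts["bronze"] += int(row["Competitions_Bronze"])
    # Convert to a plain dict so callers do not create tiers by accident.
    return {tier: dict(counts) for tier, counts in totals.items()}


if __name__ == "__main__":
    for tier, counts in medals_by_tier(SAMPLE_CSV).items():
        print(tier, counts)
```

    To run this against the real file, the in-memory string would be replaced by an open file handle, assuming the CSV header matches the column names above.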

    Acknowledgements

    Data from Kaggle. Image from Smartcat.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  3. Solution #4 for Predicting Molecular Properties Kaggle Competition

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Stojanovic, Luka (2020). Solution #4 for Predicting Molecular Properties Kaggle Competition [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3406153
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Rakocevic, Goran
    Stojanovic, Luka
    Tijanic, Nebojsa
    Popovic, Milos
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Code and additional data for solution #4 in Predicting Molecular Properties competition, described in #4 Solution [Hyperspatial Engineers].

  4. ‘Kaggle Competitions Ranking’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Kaggle Competitions Ranking’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-kaggle-competitions-ranking-f15f/7682e95e/?iid=003-055&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Kaggle Competitions Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vivovinco/kaggle-competitions-ranking on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains Kaggle ranking of competitions.

    Content

    5000 rows and 8 columns. Column descriptions are listed below.

    • Rank : Rank of the user
    • Tier : Grandmaster, Master or Expert
    • Username : Name of the user
    • Join Date : Year of join
    • Gold Medals : Number of gold medals
    • Silver Medals : Number of silver medals
    • Bronze Medals : Number of bronze medals
    • Points : Total points

    Acknowledgements

    Data from Kaggle. Image from Olympics.

    If you're reading this, please upvote.

    --- Original source retains full ownership of the source dataset ---

  5. playground-series-s5e3-test-final

    • kaggle.com
    Updated Mar 6, 2025
    Cite
    dan (2025). playground-series-s5e3-test-final [Dataset]. https://www.kaggle.com/datasets/dantheshark/playground-series-s5e3-test-final/suggestions
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    dan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by dan

    Released under CC0: Public Domain

    Contents

  6. Kaggle competitions: Essay AI Report

    • kaggle.com
    Updated Jul 5, 2023
    Cite
    Chimaroke Opara (2023). Kaggle competitions: Essay AI Report [Dataset]. https://www.kaggle.com/datasets/chimarokeopara/kaggle-competitions-essay-ai-report
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Chimaroke Opara
    Description

    Dataset

    This dataset was created by Chimaroke Opara

    Contents

  7. Eedi-competition-kaggle-prompt-formats-Phi

    • huggingface.co
    Updated Sep 29, 2024
    + more versions
    Cite
    EVANGELOS PAPAMITSOS (2024). Eedi-competition-kaggle-prompt-formats-Phi [Dataset]. https://huggingface.co/datasets/VaggP/Eedi-competition-kaggle-prompt-formats-Phi
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 29, 2024
    Authors
    EVANGELOS PAPAMITSOS
    Description

    VaggP/Eedi-competition-kaggle-prompt-formats-Phi dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Kaggle-LLM-Science-Exam

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Sangeetha Venkatesan (2023). Kaggle-LLM-Science-Exam [Dataset]. https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 8, 2023
    Authors
    Sangeetha Venkatesan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for [LLM Science Exam Kaggle Competition]

    Dataset Summary

    https://www.kaggle.com/competitions/kaggle-llm-science-exam/data

    Languages

    [en, de, tl, it, es, fr, pt, id, pl, ro, so, ca, da, sw, hu, no, nl, et, af, hr, lv, sl]

    Dataset Structure

    Columns

    • prompt - the text of the question being asked
    • A - option A; if this option is correct, then answer will be A
    • B - option B; if this option is correct, then answer will be B
    • C - option C; if this…

    See the full description on the dataset page: https://huggingface.co/datasets/Sangeetha/Kaggle-LLM-Science-Exam.

  9. mlcourse.ai - Dota 2 - winner prediction Dataset

    • kaggle.com
    zip
    Updated Sep 8, 2019
    Cite
    Sushma Biswas (2019). mlcourse.ai - Dota 2 - winner prediction Dataset [Dataset]. https://www.kaggle.com/datasets/sushmabiswas/mlcourseai-dota-2-winner-prediction-dataset
    Explore at:
    Available download formats: zip (759868828 bytes)
    Dataset updated
    Sep 8, 2019
    Authors
    Sushma Biswas
    Description

    Context

    Hello! I am currently taking the mlcourse.ai course, and this dataset was required for one of its in-class Kaggle competitions. The data is originally hosted on git, but I like to have my data right here on Kaggle. Hence this dataset.

    If you find this dataset useful, do upvote. Thank you and happy learning!

    Content

    This dataset contains 6 files in total:

    1. Sample_submission.csv
    2. Train_features.csv
    3. Test_features.csv
    4. Train_targets.csv
    5. Train_matches.jsonl
    6. Test_matches.jsonl

    Acknowledgements

    All of the data in this dataset is originally hosted on git and the same can also be found on the in-class competition's 'data' page here.

    Inspiration

    • to be updated.
  10. BirdCLEF-Challenge2023-Kaggle

    • huggingface.co
    Cite
    Bernardo Cecchetto, BirdCLEF-Challenge2023-Kaggle [Dataset]. https://huggingface.co/datasets/bernardocecchetto/BirdCLEF-Challenge2023-Kaggle
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Bernardo Cecchetto
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains processed audio recordings of 264 species of birds singing. The processing steps were:

    • Stereo to mono
    • Resampled to 16 kHz
    • High-pass filter (1500 Hz, filter order 16)
    • Normalized

    The raw dataset was provided by the BirdCLEF 2023 challenge on Kaggle. You can access it at https://www.kaggle.com/competitions/birdclef-2023/data

  11. roberta-fine-tuned

    • kaggle.com
    Updated Aug 3, 2023
    Cite
    Thibaut Juill (2023). roberta-fine-tuned [Dataset]. https://www.kaggle.com/datasets/thibautjuill/roberta-fine-tuned
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 3, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Thibaut Juill
    Description

    Fine-tuned model based on roberta-base: https://www.kaggle.com/datasets/abhishek/roberta-base

    This model was trained for the CommonLit - Evaluate Student Summaries competition (https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries/overview). Please follow the rules of the competition before using this model.

  12. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Explore at:
    Available download formats: csv, txt
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonimous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    • code_blocks_index : Global index linking code blocks to markup_data.csv.
    • kernel_id : Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    • code_block_id : Position of the code block within the notebook.
    • code_block : The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    • kernel_id : Identifier for the Kaggle Jupyter notebook.
    • kaggle_score : Performance metric of the notebook.
    • kaggle_comments : Number of comments on the notebook.
    • kaggle_upvotes : Number of upvotes the notebook received.
    • kernel_link : URL to the notebook.
    • comp_name : Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    • comp_name : Name of the Kaggle competition.
    • description : Overview of the competition task.
    • data_type : Type of data used in the competition.
    • comp_type : Classification of the competition.
    • subtitle : Short description of the task.
    • EvaluationAlgorithmAbbreviation : Metric used for assessing competition submissions.
    • data_sources : Links to datasets used.
    • metric type : Class label for the assessment metric.

    Table 4. markup_data.csv structure

    • code_block : Machine learning code block.
    • too_long : Flag indicating whether the block spans multiple semantic types.
    • marks : Confidence level of the annotation.
    • graph_vertex_id : ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
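
    The table links described above can be sketched as a pair of dictionary lookups. This is an illustration only: the miniature CSV contents below are hypothetical, as are the helper names read_rows and join_blocks; only the file roles and the key columns (kernel_id, comp_name) come from the description. Code blocks whose notebook is absent from kernels_meta.csv are dropped, which mirrors the note that kernels_meta.csv includes only notebooks with available Kaggle scores.

```python
import csv
import io

# Hypothetical miniature stand-ins for the real CSV files.
CODE_BLOCKS = """code_blocks_index,kernel_id,code_block_id,code_block
0,101,0,import pandas as pd
1,101,1,df = pd.read_csv('train.csv')
2,202,0,model.fit(X)
"""

KERNELS_META = """kernel_id,kaggle_score,comp_name
101,0.91,demo-competition
"""

COMPETITIONS_META = """comp_name,data_type
demo-competition,tabular
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def join_blocks(code_blocks, kernels, competitions):
    """Attach notebook and competition metadata to each code block."""
    kernels_by_id = {k["kernel_id"]: k for k in kernels}
    comps_by_name = {c["comp_name"]: c for c in competitions}
    joined = []
    for block in code_blocks:
        kernel = kernels_by_id.get(block["kernel_id"])
        if kernel is None:
            continue  # notebook has no metadata row, so skip its blocks
        competition = comps_by_name.get(kernel["comp_name"], {})
        joined.append({**block, **kernel, **competition})
    return joined


if __name__ == "__main__":
    rows = join_blocks(
        read_rows(CODE_BLOCKS),
        read_rows(KERNELS_META),
        read_rows(COMPETITIONS_META),
    )
    for row in rows:
        print(row["code_blocks_index"], row["comp_name"], row["data_type"])
```

    For files of the size described here, a dataframe library would likely be a more practical choice than plain dictionaries; the standard-library version is used only to keep the sketch self-contained.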

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the aid of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards, describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks
  13. ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Top 1000 Kaggle Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-top-1000-kaggle-datasets-658b/b992f64b/?iid=004-457&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Top 1000 Kaggle Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/notkrishna/top-1000-kaggle-datasets on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

    Source: Kaggle

    --- Original source retains full ownership of the source dataset ---

  14. STAT 8051 Kaggle Competition Codebook - Group 4

    • rpubs.com
    Updated Dec 13, 2020
    Cite
    Linh Nguyen (2020). STAT 8051 Kaggle Competition Codebook - Group 4 [Dataset]. https://rpubs.com/nguyenllpsych/stat8051
    Explore at:
    Dataset updated
    Dec 13, 2020
    Authors
    Linh Nguyen
    Variables measured
    area, dr_age, gender, veh_age, exposure, veh_body, claim_ind, veh_value, claim_cost, claim_count
    Description

    Basic summary statistics and codebook, excluding ID variable, for the training dataset from the 2020 Travelers Modeling Competition - Predicting Claim Cost

    Table of variables

    This table contains variable names, labels, and number of missing values. See the complete codebook for more.

    name : label (n_missing)

    • veh_value : Market value of the vehicle in $10,000’s (0)
    • exposure : The basic unit of risk underlying an insurance premium (0)
    • veh_body : Type of vehicles (0)
    • veh_age : Age of vehicles (0)
    • gender : Gender of driver (0)
    • area : Driving area of residence (0)
    • dr_age : Driver’s age category (0)
    • claim_ind : Indicator of claim (0)
    • claim_count : The number of claims (0)
    • claim_cost : Claim amount (0)

    Note

    This dataset was automatically described using the codebook R package (version 0.9.2).

  15. AIMO Synthetic Dataset

    • kaggle.com
    Updated Dec 9, 2024
    Cite
    Mirza Milan Farabi (2024). AIMO Synthetic Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10144884
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mirza Milan Farabi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    AIMO Synthetic Dataset

    This synthetic dataset consists of 50 mathematical problems, each designed to mimic the complexity and rigor typically found in National Olympiad-level competitions. The problems span across four main areas of mathematics: algebra, combinatorics, geometry, and number theory. Each problem is formatted in LaTeX, ensuring high-quality typesetting and clarity.

    DOI Citation

    Mirza Milan Farabi. (2024). AIMO Synthetic Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/10144884

  16. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Jun 5, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (143722388562 bytes)
    Dataset updated
    Jun 5, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
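
    The two-level layout described above can be expressed as integer arithmetic on the version id. The following is a sketch under the assumption that the folder numbers are the plain integer quotients; whether the real folder names are zero-padded is not stated here, so the functions return integers rather than path strings. The names version_folders and subfolder_range are hypothetical helpers.

```python
def version_folders(version_id):
    """Return (top_folder, sub_folder) numbers for a kernel version id.

    Assumes the layout described above: the top folder groups ids by
    millions and the sub folder groups ids by thousands within it.
    """
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top_folder = version_id // 1_000_000
    sub_folder = (version_id // 1_000) % 1_000
    return top_folder, sub_folder


def subfolder_range(top_folder, sub_folder):
    """Return the inclusive (first, last) version ids a sub folder can hold."""
    first = top_folder * 1_000_000 + sub_folder * 1_000
    return first, first + 999


if __name__ == "__main__":
    top, sub = version_folders(123_456_789)
    print(top, sub, subfolder_range(top, sub))
```

    Going from an id to its folder pair this way avoids scanning directories when looking up a single KernelVersions row.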

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  17. Meta Kaggle Prize Money

    • kaggle.com
    Updated Jul 26, 2023
    Cite
    JohnM (2023). Meta Kaggle Prize Money [Dataset]. https://www.kaggle.com/jpmiller/meta-kaggle-moneyboard/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    JohnM
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    NEW: leaderboard.csv with lifetime earnings for all Kagglers

    Have you ever wondered how much prize money gets distributed through Kaggle competitions? Or how much top earners have won? Here's the data to help answer such questions. Money awarded for each competition is itemized by leaderboard rank and matched with the teams/users at that rank. It's assumed that teams evenly split their winnings among members.
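
    The even-split assumption can be illustrated with a short sketch. The award records below are hypothetical, as is the helper name lifetime_earnings; the real dataset's column layout is not described here, so this shows only the arithmetic of dividing each team prize equally and summing per user.

```python
from collections import defaultdict


def lifetime_earnings(awards):
    """Sum per-user earnings, splitting each team prize evenly among members.

    `awards` is an iterable of (prize_amount, member_names) pairs.
    """
    totals = defaultdict(float)
    for prize, members in awards:
        if not members:
            continue  # nothing to attribute when no members are listed
        share = prize / len(members)
        for member in members:
            totals[member] += share
    return dict(totals)


# Hypothetical awards: (prize in dollars, team members).
SAMPLE_AWARDS = [
    (30000, ["alice", "bob", "carol"]),
    (5000, ["alice"]),
]

if __name__ == "__main__":
    for user, amount in sorted(lifetime_earnings(SAMPLE_AWARDS).items()):
        print(user, amount)
```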

    The dataset captures nearly $16M total prize money awarded for top leaderboard finishes. Prize breakdowns were taken from Kaggle web pages. Pages and prize descriptions had many different page formats/wording, especially before 2017, so coverage prior to that time is incomplete.

    Amounts here reflect the data contained in Meta-Kaggle and as such don't account for the following occurrences:

    • Milestone prizes
    • Efficiency awards
    • Non-cash prizes
    • Teams in the money zone that didn't qualify
    • Unequal distributions within teams

    Last update: July 8, 2023.

  18. eyepacs

    • huggingface.co
    Updated Apr 8, 2025
    Cite
    Diego (2025). eyepacs [Dataset]. https://huggingface.co/datasets/bumbledeep/eyepacs
    Explore at:
    Dataset updated
    Apr 8, 2025
    Authors
    Diego
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

    All the images of the dataset come from this kaggle dataset. Some minor modifications have been made to the metadata. All credit goes to the original authors and the contributor on Kaggle.

    Dataset Details

    Dataset Description

    The EyePACS dataset consists of retinal images originally published in the Kaggle competition "Diabetic Retinopathy Detection". This version includes a subset of the original data, specifically the… See the full description on the dataset page: https://huggingface.co/datasets/bumbledeep/eyepacs.

  19. olympiad-math-contest-llama3-20k

    • huggingface.co
    Updated Jun 1, 2024
    Cite
    Kevin Amiri (2024). olympiad-math-contest-llama3-20k [Dataset]. https://huggingface.co/datasets/kevin009/olympiad-math-contest-llama3-20k
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 1, 2024
    Authors
    Kevin Amiri
    Description

    AMC/AIME Mathematics Problem and Solution Dataset

    Dataset Details

    • Dataset Name: AMC/AIME Mathematics Problem and Solution Dataset
    • Version: 1.0
    • Release Date: 2024-06-1
    • Authors: Kevin Amiri

    Intended Use

    • Primary Use: The dataset is created and intended for research and an AI Mathematical Olympiad Kaggle competition.
    • Intended Users: Researchers in AI & mathematics or science.

    Dataset Composition

    Number of Examples: 20,300 problems and solution sets… See the full description on the dataset page: https://huggingface.co/datasets/kevin009/olympiad-math-contest-llama3-20k.

  20. G2Net Gravitational Wave Detection - Raw Unwhitened Data - Part 1

    • zenodo.org
    Updated Apr 9, 2025
    + more versions
    Cite
    Michael J. Williams; Chris Messenger (2025). G2Net Gravitational Wave Detection - Raw Unwhitened Data - Part 1 [Dataset]. http://doi.org/10.5281/zenodo.15168983
    Explore at:
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael J. Williams; Chris Messenger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw unwhitened strain data used to produce the final data used for the G2Net Gravitational Wave Detection Kaggle competition.


home data for ml course

Dataset fixer for: "Exercise: Machine Learning Competitions"

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
