100+ datasets found

Healthcare Competitions Dataset
kaggle.com
zip
Updated Jul 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gouri Prakash (2025). Healthcare Competitions Dataset [Dataset]. https://www.kaggle.com/datasets/gouriprakash/healthcare-competitions
Explore at:
zip(2306357 bytes)Available download formats
Dataset updated
Jul 19, 2025
Authors
Gouri Prakash
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the set of Kaggle competitions that are pertinent to healthcare. The dataset was created following the analysis of the Competitions.csv file which is available at https://www.kaggle.com/datasets/kaggle/meta-kaggle
Kaggle Competitions Data
kaggle.com
zip
Updated Sep 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nikhil Badveli (2022). Kaggle Competitions Data [Dataset]. https://www.kaggle.com/datasets/nikhilbadveli/kaggle-competitions-data/data
Explore at:
zip(566756 bytes)Available download formats
Dataset updated
Sep 9, 2022
Authors
Nikhil Badveli
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is created to understand and gain some insights on the Kaggle competitions that are currently present in the competitions page of the Kaggle platform.

I've included 3 files and explained below what each of them contains.
CrunchDAO Competition Unified Dataset
kaggle.com
zip
Updated Jun 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Joakim Arvidsson (2023). CrunchDAO Competition Unified Dataset [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/crunchdao-competition-unified-dataset
Explore at:
zip(183163058 bytes)Available download formats
Dataset updated
Jun 15, 2023
Authors
Joakim Arvidsson
Description
This data set is for creating predictive models for the CrunchDAO tournament. Registration is required in order to participate in the competition, and to be eligible to earn $CRUNCH tokens.

See notebooks (Code tab) for how to import and explore the data, and build predictive models.

EDA

QuickStarter

See Terms of Use for data license.

Code4ML 2.0

zenodo.org

csv, txt

Updated May 19, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Anonimous authors; Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737

Explore at:

csv, txtAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.15465737

Dataset updated

May 19, 2025

Dataset provided by

Zenodohttp://zenodo.org/

Authors

Anonimous authors; Anonimous authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

The original dataset is organized into multiple CSV files, each containing structured data on different entities:

code_blocks.csv: Contains raw code snippets extracted from Kaggle.
kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

Table 1. code_blocks.csv structure

Column	Description
code_blocks_index	Global index linking code blocks to markup_data.csv.
kernel_id	Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
code_block_id	Position of the code block within the notebook.
code_block	The actual machine learning code snippet.

Table 2. kernels_meta.csv structure

Column	Description
kernel_id	Identifier for the Kaggle Jupyter notebook.
kaggle_score	Performance metric of the notebook.
kaggle_comments	Number of comments on the notebook.
kaggle_upvotes	Number of upvotes the notebook received.
kernel_link	URL to the notebook.
comp_name	Name of the associated Kaggle competition.

Table 3. competitions_meta.csv structure

Column	Description
comp_name	Name of the Kaggle competition.
description	Overview of the competition task.
data_type	Type of data used in the competition.
comp_type	Classification of the competition.
subtitle	Short description of the task.
EvaluationAlgorithmAbbreviation	Metric used for assessing competition submissions.
data_sources	Links to datasets used.
metric type	Class label for the assessment metric.

Table 4. markup_data.csv structure

Column	Description
code_block	Machine learning code block.
too_long	Flag indicating whether the block spans multiple semantic types.
marks	Confidence level of the annotation.
graph_vertex_id	ID of the semantic type.

The dataset allows mapping between these tables. For example:

code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csvvia the code_blocks_index column.

Code4ML 2.0 Enhancements

The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to the kaggle competitions launched since 2020. The natural descriptions of the competitions are retrieved with the aim of LLM.

Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

competitions_meta_2.csv is enriched with data_cards, decsribing the data used in the competitions.

Applications

The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

Code generation
Code understanding
Natural language processing of code-related tasks

h
kaggle
huggingface.co
Updated Jul 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
monti (2024). kaggle [Dataset]. https://huggingface.co/datasets/theoracle/kaggle
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 15, 2024
Authors
monti
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
theoracle/kaggle dataset hosted on Hugging Face and contributed by the HF Datasets community
h
kaggle-nlp-getting-start
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
hui, kaggle-nlp-getting-start [Dataset]. https://huggingface.co/datasets/gdwangh/kaggle-nlp-getting-start
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
hui
Description
Dataset Summary

Natural Language Processing with Disaster Tweets: https://www.kaggle.com/competitions/nlp-getting-started/data This particular challenge is perfect for data scientists looking to get started with Natural Language Processing. The competition dataset is not too big, and even if you don’t have much personal computing power, you can do all of the work in our free, no-setup, Jupyter Notebooks environment called Kaggle Notebooks.

Columns

id - a unique identifier for each tweet… See the full description on the dataset page: https://huggingface.co/datasets/gdwangh/kaggle-nlp-getting-start.
h
Eedi-competition-kaggle-prompt-formats-mpnet
huggingface.co
Updated Sep 2, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
EVANGELOS PAPAMITSOS (2025). Eedi-competition-kaggle-prompt-formats-mpnet [Dataset]. https://huggingface.co/datasets/VaggP/Eedi-competition-kaggle-prompt-formats-mpnet
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 2, 2025
Authors
EVANGELOS PAPAMITSOS
Description
VaggP/Eedi-competition-kaggle-prompt-formats-mpnet dataset hosted on Hugging Face and contributed by the HF Datasets community
Code Contests Dataset
kaggle.com
zip
Updated Mar 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lisa Sharapova (2024). Code Contests Dataset [Dataset]. https://www.kaggle.com/datasets/lallucycle/code-contests-dataset
Explore at:
zip(968796216 bytes)Available download formats
Dataset updated
Mar 17, 2024
Authors
Lisa Sharapova
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Lisa Sharapova

Released under Apache 2.0

Contents
Digit Recognizer Data Set(Kaggle contest)
kaggle.com
zip
Updated May 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Krishna Harsha M (2024). Digit Recognizer Data Set(Kaggle contest) [Dataset]. https://www.kaggle.com/datasets/krishnaharsham/digit-recognizer-data-setkaggle-contest
Explore at:
zip(15991969 bytes)Available download formats
Dataset updated
May 31, 2024
Authors
Krishna Harsha M
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Krishna Harsha M

Released under MIT

Contents
EC class prediction dataset
kaggle.com
zip
Updated Jul 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
John Mitchell (2023). EC class prediction dataset [Dataset]. https://www.kaggle.com/datasets/jbomitchell/ec-class-prediction-dataset
Explore at:
zip(8106829 bytes)Available download formats
Dataset updated
Jul 10, 2023
Authors
John Mitchell
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This dataset contains relevant notebook submission files and papers:

Notebook submission files from:

PS S3E18 EDA + Ensembles by @zhukovoleksiy v8 0.65031.

PS_3.18_LGBM_bin by @akioonodera v9 0.64706.

PS3E18 EDA| Ensemble ML Pipeline |BinaryPredictict by @tetsutani v37 0.65540.

0.65447 | Ensemble | AutoML | Enzyme Classify by @utisop v10 0.65447.

pyBoost baselinepyBoost baseline by @l0glikelihood v4 0.65446.

Random Forest EC classification by @jbomitchell RF62853_submission.csv 0.62853.

Overfit Champion by @onurkoc83 v1 0.65810.

Playground Series S3E18 - EDA & Separate Learning by @mateuszk013 v1 0.64933.

Ensemble ML Pipeline + Bagging = 0.65557 by @chingiznurzhanov v7 0.65557.

PS3E18| FeatureEnginering+Stacking by @jaygun84 v5 0.64845.

S03E18 EDA | VotingClassifier | Optuna v15 0.64776.

PS3E18 - GaussianNB by @mehrankazeminia v1 0.65898, v2 0.66009 & v3 0.66117.

Enzyme Weighted Voting by @nivedithavudayagiri v2 0.65028.

Multi-label With TF-Decision Forests by @gusthema v6 0.63374.

S3E18 Target_Encoding LB 0.65947 by @meisa0 v1 0.65947.

Boost Classifier Model by @satyaprakashshukl v7 0.64965.

PS3E18: Multiple lightgbm models + Optuna by syerramilli v4 0.64982.

s3e18_solution for overfitting public :0.64785 by @onurkoc83 v1 0.64785.

PSS3E18 : FLAML : roc_auc_weighted by @gauravduttakiit v2 0.64732.

PGS318: combiner by @kdmitrie v4 0.65350.

averaging best solutions mean vs Weighted mean by @omarrajaa v5 0.66106.

Papers

N Nath & JBO Mitchell, Is EC class predictable from reaction mechanism? BMC Bioinformatics, 13:60 (2012) doi: 10.1186/1471-2105-13-60

L De Ferrari & JBO Mitchell, From sequence to enzyme mechanism using multi-label machine learning, BMC Bioinformatics, 15:150 (2014) doi: 10.1186/1471-2105-15-150

N Nath, JBO Mitchell & G Caetano-Anollés, The Natural History of Biocatalytic Mechanisms, PLoS Computational Biology, 10, e1003642 (2014) doi: 10.1371/journal.pcbi.1003642

KE Beattie, L De Ferrari & JBO Mitchell, Why do sequence signatures predict enzyme mechanism? Homology versus Chemistry, Evolutionary Bioinformatics, 11: 267-274 (2015) doi: 10.4137/EBO.S31482

HY Mussa, L De Ferrari & JBO Mitchell, Enzyme Mechanism Prediction: A Template Matching Problem on InterPro Signature Subspaces, BMC Research Reports, 8:744 (2015) doi: 10.1186/s13104-015-1730-7

Competitions Shake-up

kaggle.com

zip

Updated Sep 27, 2020

Facebook

Twitter

Click to copy link

Link copied

Cite

Daniboy370 (2020). Competitions Shake-up [Dataset]. https://www.kaggle.com/daniboy370/competitions-shakeup

Explore at:

zip(388789 bytes)Available download formats

Dataset updated

Sep 27, 2020

Authors

Daniboy370

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Shake-what ?!

The Shake phenomenon occurs when the competition is shifting between two different datasets :

\[ \text{Public test set} \ \Rightarrow \ \text{Private test set} \quad \Leftrightarrow \quad LB-\text{public} \ \Rightarrow \ LB-\text{private} \]

The private test set that so far was unavailable becomes available, and thus the models scores are re-calculated. This re-evaluation elicits a respective re-ranking of the contestants in the competition. The shake allows participants to assess the severity of their overfitting to the public dataset, and act to improve their model until the deadline.

Unable to find a uniform conventional term for this mechanism, I will use my common sense to define the following intuition :

             <img src="https://github.com/Daniboy370/Uploads/blob/master/Kaggle-shake-ups/images/latex.png?raw=true" width="550">

From the starter kernel :

               <img src="https://github.com/Daniboy370/Uploads/blob/master/Kaggle-shake-ups/vids/shakeup_VID.gif?raw=true" width="625">

Content

Seven datasets of competitions which were scraped from Kaggle :

Competition	Name of file
Elo Merchant Category Recommendation	df_{Elo}
Human Protein Atlas Image Classification	df_{Protein}
Humpback Whale Identification	df_{Humpback}
Microsoft Malware Prediction	df_{Microsoft}
Quora Insincere Questions Classification	df_{Quora}
TGS Salt Identification Challenge	df_{TGS}
VSB Power Line Fault Detection	df_{VSB}

As an example, consider the following dataframe from the Quora competition : Team Name | Rank-private | Rank-public | Shake | Score-private | Score-public --- | --- The Zoo |1|7|6|0.71323|0.71123 ...| ...| ...| ...| ...| ... D.J. Trump|1401|65|-1336|0.000|0.70573

I encourage everybody to investigate thoroughly the dataset in sought of interesting findings !

\[ \text{Enjoy !}\]

Fake News Challenge
kaggle.com
zip
Updated Apr 4, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abhinav Kumar Jha (2021). Fake News Challenge [Dataset]. https://www.kaggle.com/datasets/abhinavkrjha/fake-news-challenge
Explore at:
zip(5340415 bytes)Available download formats
Dataset updated
Apr 4, 2021
Authors
Abhinav Kumar Jha
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The issue of “fake news” has arisen recently as a potential threat to high-quality journalism and well-informed public discourse. The Fake News Challenge was organized in early 2017 to encourage development of machine learning-based classification systems that perform “stance detection” -- i.e. identifying whether a particular news headline “agrees” with, “disagrees” with, “discusses,” or is unrelated to a particular news article -- in order to allow journalists and others to more easily find and investigate possible instances of “fake news.”

Content

The data provided is (headline, body, stance) instances, where stance is one of {unrelated, discuss, agree, disagree}. The dataset is provided as two CSVs:

train_bodies.csv

This file contains the body text of articles (the articleBody column) with corresponding IDs (Body ID)

train_stances.csv

This file contains the labeled stances (the Stance column) for pairs of article headlines (Headline) and article bodies (Body ID, referring to entries in train_bodies.csv).

Distribution of the data

The distribution of Stance classes in train_stances.csv is as follows:

rows unrelated discuss agree disagree
49972 0.73131 0.17828 0.0736012 0.0168094

There are 4 possible classifications: 1. The article text agrees with the headline. 2. The article text disagrees with the headline. 3. The article text is a discussion of the headline, without taking a position on it. 4. The article text is unrelated to the headline (i.e. it doesn’t address the same topic).

Acknowledgements

For details of the task, see FakeNewsChallenge.org
Webpage Information for 5000+ Kaggle Competitions
kaggle.com
zip
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anthony Wynne (2023). Webpage Information for 5000+ Kaggle Competitions [Dataset]. https://www.kaggle.com/anthony35813/webpage-data-for-kaggle-competitions
Explore at:
zip(102059495 bytes)Available download formats
Dataset updated
Nov 8, 2023
Authors
Anthony Wynne
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
I produced the dataset whilst working on the 2023 Kaggle AI report. The Meta Kaggle dataset provides helpful information about the Kaggle competitions but not the original descriptive text from the Kaggle web pages for each competition. We have information about the solutions but not the original problem. So, I wrote some web scraping scripts to collect and store that information.

Not all Kaggle web pages have that information available; some are missing or broken. Hence the nulls in the data. Secondly, note that not all previous Kaggle competitions exist in the Meta Kaggle data, which was used to collect the webpage slugs.

The scrapping scripts iterate over the IDs in Meta Kaggle competitions.csv data and attempt to collect the webpage data for that competition if it is currently null in the database. Hence new IDs will cause the scripts to go and collect their data, and each week, the scripts will try and fill in any links that were not working previously.

I have recently converted the original local scraping scripts on my machine into a Kaggle notebook that now updates this dataset weekly on Mondays. The notebook also explains the scraping procedure and its automation to keep this dataset up-to-date.

Note that the CompetitionId field joins to the Id of the competitions.csv of the Meta Kaggle dataset so that this information can be combined with the rest of Meta Kaggle.

My primary reason for collecting the data was for some text classification work I wanted to do, and I will publish it here soon. I hope that the data is useful to some other projects as well :-)
Kaggle Competitions Top 100
kaggle.com
zip
Updated May 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vivo Vinco (2022). Kaggle Competitions Top 100 [Dataset]. https://www.kaggle.com/vivovinco/kaggle-competitions-top-100
Explore at:
zip(15932 bytes)Available download formats
Dataset updated
May 1, 2022
Authors
Vivo Vinco
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

This dataset contains top 100 of Kaggle competitions ranking. The dataset will be updated every month.

Content

100 rows and 13 columns. Columns' description are listed below.

User : Name of the user

Tier : Grandmaster, Master or Expert

Company/School : Company/School info of the user if mentioned

Country : Country info of the user if mentioned

Competitions_Num : Number of competitions joined

Competitions_Gold : Number of competitions gold medals won

Competitions_Silver : Number of competitions silver medals won

Competitions_Bronze : Number of competitions bronze medals won

Datasets_Num : Number of public datasets

Notebooks_Num : Number of public notebooks

Discussions_Num : Number of topics/comments posted

Points : Total points

Profile : Link of Kaggle profile

Acknowledgements

Data from Kaggle. Image from Smartcat.

If you're reading this, please upvote.
Meta Kaggle Competitions
kaggle.com
zip
Updated Nov 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pau Fortiana Chico (2025). Meta Kaggle Competitions [Dataset]. https://www.kaggle.com/datasets/paufortiana/meta-kaggle-competitions
Explore at:
zip(26645981 bytes)Available download formats
Dataset updated
Nov 11, 2025
Authors
Pau Fortiana Chico
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset was created to provide a stable, reliable data source for notebooks, avoiding the 'deleted-dataset' errors that can occur with the frequently-updated official Meta Kaggle dataset.
Kaggle Competition Leaderboard Results
kaggle.com
zip
Updated Nov 29, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anthony Chiu (2022). Kaggle Competition Leaderboard Results [Dataset]. https://www.kaggle.com/datasets/kingychiu/kaggle-competition-leaderboard-results
Explore at:
zip(54415 bytes)Available download formats
Dataset updated
Nov 29, 2022
Authors
Anthony Chiu
Description
Dataset

This dataset was created by Anthony Chiu

Contents
Google Universal Embedding Challenge Github Repo
kaggle.com
zip
Updated Jul 12, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Darien Schettler (2022). Google Universal Embedding Challenge Github Repo [Dataset]. https://www.kaggle.com/datasets/dschettler8845/google-universal-embedding-challenge-github-repo
Explore at:
zip(13561 bytes)Available download formats
Dataset updated
Jul 12, 2022
Authors
Darien Schettler
Description
Universal Embedding Challenge baseline model implementation.

This folder contains the baseline model implementation for the Kaggle universal image embedding challenge based on

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Training data-efficient image transformers & distillation through attention.

Following the above ideas, we also add a 64 projection layer on top of the Vision Transformer base model as the final embedding, since the competition requires embeddings of at most 64 dimensions. Please find more details in image_classification.py.

To use the code, please firstly install the prerequisites

pip install -r universal_embedding_challenge/requirements.txt git clone https://github.com/tensorflow/models.git /tmp/models export PYTHONPATH=$PYTHONPATH:/tmp/models pip install --user -r /tmp/models/official/requirements.txt

Secondly, please download the imagenet1k data in TFRecord format from https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-0 and https://www.kaggle.com/datasets/hmendonca/imagenet-1k-tfrecords-ilsvrc2012-part-1, and merge them together under folder imagenet-2012-tfrecord/. As a result, the paths to the training datasets and the validation datasets should be imagenet-2012-tfrecord/train* and imagenet-2012-tfrecord/validation*, respectively.

The trainer for the model is implemented in train.py, and the following example launches the training

python -m universal_embedding_challenge.train \ --experiment=vit_with_bottleneck_imagenet_pretrain \ --mode=train_and_eval \ --model_dir=/tmp/imagenet1k_test

The trained model checkpoints could be further converted to savedModel format using export_saved_model.py for Kaggle submission.

The code to compute metrics for Universal Embedding Challenge is implemented in metrics.py and the code to read the solution file is implemented in read_retrieval_solution.py.
OLD MLCAD 2023 FPGA MacroPlacement Contest Dataset
kaggle.com
zip
Updated Apr 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ismail Bustany (2023). OLD MLCAD 2023 FPGA MacroPlacement Contest Dataset [Dataset]. https://www.kaggle.com/datasets/ismailbustany/mlcad-2023-fpga-macro-placement-contest-dataset
Explore at:
zip(17993264858 bytes)Available download formats
Dataset updated
Apr 15, 2023
Authors
Ismail Bustany
Description
Dataset

This dataset was created by Ismail Bustany

Contents
30 Days of ML
kaggle.com
zip
Updated Mar 6, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Luca Massaron (2022). 30 Days of ML [Dataset]. https://www.kaggle.com/datasets/lucamassaron/30-days-of-ml
Explore at:
zip(69158366 bytes)Available download formats
Dataset updated
Mar 6, 2022
Authors
Luca Massaron
Description
Context

The data relative to the Kaggle learning competition 30 Days of ML (https://www.kaggle.com/thirty-days-of-ml) cannot be downloaded by Kagglers who have not initially participated to it. now you can download it from here and use it for testing the many tutorials and notebooks available from the learning competition.

Content

The dataset is used for this competition is synthetic (and generated using a CTGAN), but based on a real dataset. The original dataset deals with predicting the amount of an insurance claim. Although the features are anonymized, they have properties relating to real-world features.

Acknowledgements

The data comes from a Kaggle competition, 30 Days of ML (https://www.kaggle.com/c/30-days-of-ml).
NLP with Disaster Tweets - cleaning data
kaggle.com
zip
Updated Sep 11, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vitalii Mokin (2021). NLP with Disaster Tweets - cleaning data [Dataset]. https://www.kaggle.com/vbmokin/nlp-with-disaster-tweets-cleaning-data
Explore at:
zip(1053715 bytes)Available download formats
Dataset updated
Sep 11, 2021
Authors
Vitalii Mokin
License
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Description
Context

The data obtained by clearing the Getting Started Prediction Competition "Real or Not? NLP with Disaster Tweets" data is the result of a public notebook "NLP with Disaster Tweets - EDA and Cleaning data". In the future, I plan to improve cleaning and update the dataset

Content

id - a unique identifier for each tweet text - the text of the tweet location - the location the tweet was sent from (may be blank) keyword - a particular keyword from the tweet (may be blank) target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

Acknowledgements

Thanks to Kaggle team for this Competition "Real or Not? NLP with Disaster Tweets" and its datasets (this dataset was created by the company figure-eight and originally shared on their ‘Data For Everyone’ website here. Tweet source: https://twitter.com/AnyOtherAnnaK/status/629195955506708480).

Thanks to web-site Ambulance services drive, strive to keep you alive for your image, which is very similar to the image of the contest "Real or Not? NLP with Disaster Tweets" and which I used as the image of my dataset

Inspiration

You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.

rows	unrelated	discuss	agree	disagree
49972	0.73131	0.17828	0.0736012	0.0168094

Facebook

Twitter

Click to copy link

Link copied

Cite

Gouri Prakash (2025). Healthcare Competitions Dataset [Dataset]. https://www.kaggle.com/datasets/gouriprakash/healthcare-competitions

Healthcare Competitions Dataset

Kaggle competitions pertinent to the healthcare domain

Explore at:

9 scholarly articles cite this dataset (View in Google Scholar)

zip(2306357 bytes)Available download formats

Dataset updated

Jul 19, 2025

Authors

Gouri Prakash

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset contains the set of Kaggle competitions that are pertinent to healthcare. The dataset was created following the analysis of the Competitions.csv file which is available at https://www.kaggle.com/datasets/kaggle/meta-kaggle

Clear search

Close search

Google apps

Main menu

Healthcare Competitions Dataset

Kaggle Competitions Data

CrunchDAO Competition Unified Dataset

Code4ML 2.0

Code4ML 2.0 Enhancements

Applications

kaggle

kaggle-nlp-getting-start

Eedi-competition-kaggle-prompt-formats-mpnet

Code Contests Dataset

Dataset

Contents

Digit Recognizer Data Set(Kaggle contest)

Dataset

Contents

EC class prediction dataset

Competitions Shake-up

Shake-what ?!

Content

Fake News Challenge

Context

Content

train_bodies.csv

train_stances.csv

Distribution of the data

Acknowledgements

Webpage Information for 5000+ Kaggle Competitions

Kaggle Competitions Top 100

Context

Content

Acknowledgements

Meta Kaggle Competitions

Kaggle Competition Leaderboard Results

Dataset

Contents

Google Universal Embedding Challenge Github Repo

Universal Embedding Challenge baseline model implementation.

OLD MLCAD 2023 FPGA MacroPlacement Contest Dataset

Dataset

Contents

30 Days of ML

Context

Content

Acknowledgements

NLP with Disaster Tweets - cleaning data

Context

Content

Acknowledgements

Inspiration

Healthcare Competitions Dataset

Kaggle competitions pertinent to the healthcare domain

`train_bodies.csv`

`train_stances.csv`