License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
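For example, the path for a given KernelVersion id can be derived as in this minimal Python sketch (the file extension and the unpadded folder names are assumptions to verify against the actual files):

```python
# Sketch: map a KernelVersion id to its location in the two-level layout.
# Assumptions: folder names are unpadded integers; extension varies (.py/.R/.ipynb).
def kernel_version_path(version_id: int, ext: str = "ipynb") -> str:
    top = version_id // 1_000_000            # millions bucket, e.g. 123
    sub = (version_id % 1_000_000) // 1_000  # thousands bucket, e.g. 456
    return f"{top}/{sub}/{version_id}.{ext}"

print(kernel_version_path(123456789))  # -> 123/456/123456789.ipynb
```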
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
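For instance, a requester-pays download with the google-cloud-storage Python client might look like the following sketch (the object path inside the bucket is an assumption; the project you pass is the one billed for the transfer):

```python
from google.cloud import storage

project = "YOUR_GCP_PROJECT"  # GCP project with billing enabled; it pays for the transfer
client = storage.Client(project=project)

# user_project enables requester-pays billing against your project.
bucket = client.bucket("kaggle-meta-kaggle-code-downloads", user_project=project)
blob = bucket.blob("123/456/123456789.ipynb")  # illustrative object path, not confirmed
blob.download_to_filename("123456789.ipynb")
```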
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.
The original dataset is organized into multiple CSV files, each containing structured data on different entities:
Table 1. code_blocks.csv structure
| Column | Description |
| --- | --- |
| code_blocks_index | Global index linking code blocks to markup_data.csv. |
| kernel_id | Identifier for the Kaggle Jupyter notebook from which the code block was extracted. |
| code_block_id | Position of the code block within the notebook. |
| code_block | The actual machine learning code snippet. |
Table 2. kernels_meta.csv structure
| Column | Description |
| --- | --- |
| kernel_id | Identifier for the Kaggle Jupyter notebook. |
| kaggle_score | Performance metric of the notebook. |
| kaggle_comments | Number of comments on the notebook. |
| kaggle_upvotes | Number of upvotes the notebook received. |
| kernel_link | URL to the notebook. |
| comp_name | Name of the associated Kaggle competition. |
Table 3. competitions_meta.csv structure
| Column | Description |
| --- | --- |
| comp_name | Name of the Kaggle competition. |
| description | Overview of the competition task. |
| data_type | Type of data used in the competition. |
| comp_type | Classification of the competition. |
| subtitle | Short description of the task. |
| EvaluationAlgorithmAbbreviation | Metric used for assessing competition submissions. |
| data_sources | Links to datasets used. |
| metric type | Class label for the assessment metric. |
Table 4. markup_data.csv structure
| Column | Description |
| --- | --- |
| code_block | Machine learning code block. |
| too_long | Flag indicating whether the block spans multiple semantic types. |
| marks | Confidence level of the annotation. |
| graph_vertex_id | ID of the semantic type. |
The dataset allows mapping between these tables. For example, code_blocks.csv links to kernels_meta.csv via the kernel_id column, and kernels_meta.csv links to competitions_meta.csv via comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores. In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
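A minimal pandas sketch of these joins (file and column names as listed in the tables above):

```python
import pandas as pd

code_blocks = pd.read_csv("code_blocks.csv")
kernels = pd.read_csv("kernels_meta.csv")
competitions = pd.read_csv("competitions_meta.csv")

# code block -> notebook metadata (kernel_id), then -> competition metadata (comp_name)
blocks_with_meta = code_blocks.merge(kernels, on="kernel_id", how="left")
full = blocks_with_meta.merge(competitions, on="comp_name", how="left")
```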
The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to the Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the aid of an LLM.
Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.
competitions_meta_2.csv is enriched with data_cards, describing the data used in the competitions.
The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:
If you need help setting this up to use in a notebook with the internet off, check this notebook: https://www.kaggle.com/code/narnaoot/installing-packages-without-internet-for-kaggle
I have gathered this data to create a small analysis (an analysis within an analysis, an inception-like situation) to understand what makes a notebook win a Kaggle Analytics Competition.
Furthermore, the data lets us explore some differences in approaches between competitions and the evolution through time.
Of course, as we are talking about an analytical approach (which, unlike a normal Kaggle competition with a KPI, cannot be quantified), there can never be an EXACT recipe. However, if we look at some quantitative features (and then qualitative ones, by reading the notebooks), we can quickly see a pattern within the winning notebooks.
This knowledge might help you when you approach a new challenge, as well as guide you on the "right" path.
Note: the dataset contains only PAST competitions that have already ended and the winners have been announced.
Dataset Summary
Natural Language Processing with Disaster Tweets: https://www.kaggle.com/competitions/nlp-getting-started/data
This particular challenge is perfect for data scientists looking to get started with Natural Language Processing. The competition dataset is not too big, and even if you don’t have much personal computing power, you can do all of the work in our free, no-setup, Jupyter Notebooks environment called Kaggle Notebooks.
Columns
id - a unique identifier for each tweet… See the full description on the dataset page: https://huggingface.co/datasets/gdwangh/kaggle-nlp-getting-start.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the top 100 of the Kaggle competitions ranking. The dataset will be updated every month.
It has 100 rows and 13 columns. The columns' descriptions are listed below.
Data from Kaggle. Image from Smartcat.
If you're reading this, please upvote.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created to provide a stable, reliable data source for notebooks, avoiding the 'deleted-dataset' errors that can occur with the frequently-updated official Meta Kaggle dataset.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
I produced the dataset whilst working on the 2023 Kaggle AI report. The Meta Kaggle dataset provides helpful information about the Kaggle competitions but not the original descriptive text from the Kaggle web pages for each competition. We have information about the solutions but not the original problem. So, I wrote some web scraping scripts to collect and store that information.
Not all Kaggle web pages have that information available; some are missing or broken, hence the nulls in the data. Also note that not all previous Kaggle competitions exist in the Meta Kaggle data, which was used to collect the webpage slugs.
The scraping scripts iterate over the IDs in Meta Kaggle's competitions.csv data and attempt to collect the webpage data for that competition if it is currently null in the database. Hence, new IDs will cause the scripts to go and collect their data, and each week the scripts will try to fill in any links that were not working previously.
I have recently converted the original local scraping scripts on my machine into a Kaggle notebook that now updates this dataset weekly on Mondays. The notebook also explains the scraping procedure and its automation to keep this dataset up-to-date.
Note that the CompetitionId field joins to the Id of the competitions.csv of the Meta Kaggle dataset so that this information can be combined with the rest of Meta Kaggle.
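For example, the join could be done in pandas as sketched below (the file name of this dataset's CSV is hypothetical):

```python
import pandas as pd

competitions = pd.read_csv("competitions.csv")  # from Meta Kaggle
descriptions = pd.read_csv("competition_descriptions.csv")  # hypothetical name for this dataset's file

# CompetitionId in this dataset joins to Id in Meta Kaggle's competitions.csv.
merged = descriptions.merge(competitions, left_on="CompetitionId", right_on="Id", how="left")
```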
My primary reason for collecting the data was for some text classification work I wanted to do, and I will publish it here soon. I hope that the data is useful to some other projects as well :-)
The 2023 Kaggle AI Report Competition required all notebooks to be made public prior to the July 5th deadline. This dataset contains a preliminary list of all of those notebooks, sorted by category.
See the competition overview, data, evaluation, submission instructions, and timeline pages for more detail about the competition itself.
Description 👋🛳️ Ahoy, welcome to Kaggle! You’re in the right place. This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.
If you want to talk with other users about this competition, come join our Discord! We've got channels for competitions, job postings and career discussions, resources, and socializing with your fellow data scientists. Follow the link here: https://discord.gg/kaggle
The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
Read on or watch the video below to explore more details. Once you’re ready to start competing, click on the "Join Competition" button to create an account and gain access to the competition data. Then check out Alexis Cook’s Titanic Tutorial that walks you through step by step how to make your first submission!
The Challenge The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.
While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
Recommended Tutorial: We highly recommend Alexis Cook’s Titanic Tutorial, which walks you through making your very first submission step by step, and this starter notebook to get started.
How Kaggle’s Competitions Work
- Join the Competition: Read about the challenge description, accept the Competition Rules, and gain access to the competition dataset.
- Get to Work: Download the data, build models on it locally or on Kaggle Notebooks (our no-setup, customizable Jupyter Notebooks environment with free GPUs), and generate a prediction file.
- Make a Submission: Upload your prediction as a submission on Kaggle and receive an accuracy score.
- Check the Leaderboard: See how your model ranks against other Kagglers on our leaderboard.
- Improve Your Score: Check out the discussion forum to find lots of tutorials and insights from other competitors.

Kaggle Lingo Video: You may run into unfamiliar lingo as you dig into the Kaggle discussion forums and public notebooks. Check out Dr. Rachael Tatman’s video on Kaggle Lingo to get up to speed!
What Data Will I Use in This Competition? In this competition, you’ll gain access to two similar datasets that include passenger information like name, age, gender, socio-economic class, etc. One dataset is titled train.csv and the other is titled test.csv.
Train.csv will contain the details of a subset of the passengers on board (891 to be exact) and importantly, will reveal whether they survived or not, also known as the “ground truth”.
The test.csv dataset contains similar information but does not disclose the “ground truth” for each passenger. It’s your job to predict these outcomes.
Using the patterns you find in the train.csv data, predict whether the other 418 passengers on board (found in test.csv) survived.
Check out the “Data” tab to explore the datasets even further. Once you feel you’ve created a competitive model, submit it to Kaggle to see where your model stands on our leaderboard against other Kagglers.
How to Submit your Prediction to Kaggle Once you’re ready to make a submission and get on the leaderboard:
Click on the “Submit Predictions” button
Upload a CSV file in the submission file format. You’re able to submit 10 submissions a day.
Submission File Format: You should submit a csv file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.
The file should have exactly 2 columns:
- PassengerId (sorted in any order)
- Survived (contains your binary predictions: 1 for survived, 0 for deceased)

Got it! I’m ready to get started. Where do I get help if I need it? For Competition Help: Titanic Discussion Forum. Kaggle doesn’t have a dedicated team to help troubleshoot your code, so you’ll typically find that you receive a response more quickly by asking your question in the appropriate forum. The forums are full of useful information on the data, metric, and different approaches. We encourage you to use the forums often. If you share your knowledge, you'll find that others will share a lot in turn!
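As a concrete sketch of the submission format described above, a valid file can be produced with pandas (the all-zeros prediction is just an illustrative baseline):

```python
import pandas as pd

test = pd.read_csv("test.csv")

# Illustrative baseline: predict that no passenger survived.
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": 0,  # binary prediction: 1 for survived, 0 for deceased
})
submission.to_csv("submission.csv", index=False)  # 418 rows plus a header
```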
A Last Word on Kaggle Notebooks As we mentioned before, Kaggle Notebooks is our no-setup, customizable, Jupyter Notebooks environment with free GPUs and a huge repository ...
This dataset was obtained using four similar web scrapers written in Python; more information in the content below.
topKagglersCompetitions.csv: contains the top Kagglers in Competitions, without biography data. Scraper used: https://www.kaggle.com/ajpass/web-scrapping-vol-7-kaggle-competitions
topKagglersDatasets.csv: contains the top Kagglers in Datasets, without biography data. Scraper used: https://www.kaggle.com/ajpass/data-mining-web-scrapping-vol-4-kaggle-datasets2
topKagglersDiscussion.csv: contains the top Kagglers in Discussions, without biography data. Scraper used: https://www.kaggle.com/ajpass/web-scrapping-vol-6-kaggle-discussions
topKagglersNotebooks.csv: contains the top Kagglers in Notebooks, without biography data. Scraper used: https://www.kaggle.com/ajpass/data-mining-web-scrapping-vol-5-kaggle-notebooks
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the pre-downloaded GPT-2 model and tokenizer files for offline use in Kaggle notebooks. It enables participants to use GPT-2 without requiring internet access, ensuring compliance with competition rules that restrict internet usage.
The dataset includes:
- GPT-2 Model: Config file, weights (model.safetensors), and other necessary files.
- GPT-2 Tokenizer: Vocabulary, merges, and tokenizer configuration files.
Use this dataset to load GPT-2 seamlessly into your notebook for generating text or other applications.
Contents:
- gpt2_model.zip: Contains model weights and configuration files.
- gpt2_tokenizer.zip: Contains tokenizer configuration and vocabulary files.
Usage:
Add this dataset to your notebook via the Kaggle dataset panel. Unzip the files and load them using the Hugging Face Transformers library with the from_pretrained method, pointing to the unzipped directories.
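For example, after unzipping gpt2_model.zip and gpt2_tokenizer.zip, loading might look like this sketch (the extraction paths are assumptions):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Paths assume the zips were extracted to these directories; adjust to your setup.
model = GPT2LMHeadModel.from_pretrained("/kaggle/working/gpt2_model")
tokenizer = GPT2Tokenizer.from_pretrained("/kaggle/working/gpt2_tokenizer")

inputs = tokenizer("Hello, Kaggle!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```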
Licenses: The dataset reuses open-source GPT-2 files available under the original licensing terms provided by Hugging Face.
Purpose: This dataset was created for use in competitions where internet access is disabled to facilitate the usage of pre-trained models.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Episode games from https://www.kaggle.com/competitions/llm-20-questions. This dataset can be used to analyze winning strategies, or as training data.
File naming: {episodeId}_{guesser}_{answer} (2 rows for each episodeId, one per team).
Notebook: https://www.kaggle.com/code/waechter/llm-20-questions-games-dataset/notebook
Source: Meta Kaggle dataset
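A small sketch of parsing that naming pattern (the example values are illustrative):

```python
# Sketch: split a file name of the form {episodeId}_{guesser}_{answer}.
# Assumes the guesser name itself contains no underscore.
name = "12345678_team-alpha_team-beta"  # illustrative values
episode_id, guesser, answer = name.split("_", 2)
```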
This is the top 1000 users' data for the four types of rankings (i.e., Competitions, Datasets, Notebooks, and Discussion) from October 2021 to September 2023. The data was scraped from the Kaggle Ranking every month. The scraping code is on GitHub.
Note: Only the top 20 users' data have been stored in August 2023.
Note: Data collection ended in September 2023.
In 2021:
- Competitions: Oct. 4, Nov. 21, Dec. 16
- Datasets: Oct. 12, Nov. 21, Dec. 16
- Notebooks: Oct. 13, Nov. 23, Dec. 16
- Discussion: Oct. 17, Nov. 23, Dec. 16

In 2022:
- Competitions: Jan. 16, Feb. 20, Mar. 15, Apr. 15, May 15, June 15, Jul 15, Aug 15, Sep 19, Oct 15, Nov 15, Dec 16
- Datasets: Jan. 16, Feb. 20, Mar. 15, Apr. 15, May 15, June 15, Jul 15, Aug 15, Sep 19, Oct 15, Nov 15, Dec 16
- Notebooks: Jan. 16, Feb. 20, Mar. 15, Apr. 15, May 15, June 15, Jul 15, Aug 15, Sep 19, Oct 15, Nov 15, Dec 16
- Discussion: Jan. 16, Feb. 20, Mar. 15, Apr. 15, May 15, June 15, Jul 15, Aug 15, Sep 19, Oct 15, Nov 15, Dec 18

In 2023:
- Competitions: Jan. 13, Feb. 21, Mar 14, Apr 15, May 17, Jun 20, Jul 20, Aug 20, Sep 12
- Datasets: Jan. 13, Feb. 21, Mar 14, Apr 16, May 17, Jun 20, Jul 20, Aug 20, Sep 12
- Notebooks: Jan. 13, Feb. 21, Mar 15, Apr 15, May 16, Jun 20, Jul 20, Aug 20, Sep 12
- Discussion: Jan. 13, Feb. 23, Mar 16, Apr 15, May 16, Jun 20, Jul 20, Aug 20, Sep 12
This dataset contains the data for the SnakeCLEF2023 HuggingFace dataset.
https://huggingface.co/spaces/competitions/SnakeCLEF2023
This dataset does not contain the 60 GB of full-size image training data. I wanted everyone to be able to use the data in Kaggle notebooks and participate in the competition.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Related discussion: https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/464765
Related notebooks:
- Version 1: detailed results, to check how/if the dataset is saved and reloaded step-by-step: https://www.kaggle.com/code/chg0901/saveeverything-with-daigtext961-notebook?scriptVersionId=157295700
- Version 2: clean code with a dataset containing the saved results from the original notebook: https://www.kaggle.com/code/chg0901/saveeverything-with-daigtext961-notebook/notebook
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was created by Sumukh
Released under CC0: Public Domain
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This AI-Kaggle-Assistant-File dataset is part of a notebook specially prepared for use in the Google - Gemini Long Context competition.
The following files can be found here:
Dataset generated by https://www.kaggle.com/steubk/meetings-are-boring-the-notebook. See https://www.kaggle.com/competitions/predict-student-performance-from-game-play/discussion/396068 for details.
License: CC0 1.0 Public Domain (https://creativecommons.org/publicdomain/zero/1.0/)
Context
Online e-commerce is rapidly growing in Pakistan. Sellers list thousands of products across multiple categories, each with different prices, ratings, and sales numbers. Understanding the patterns of product sales, pricing, and customer feedback is crucial for businesses and data scientists alike.
This dataset simulates a realistic snapshot of online product sales in Pakistan, including diverse categories like Electronics, Clothing, Home & Kitchen, Books, Beauty, and Sports.
Source
- Generated synthetically using Python and NumPy for learning and practice purposes.
- No real personal or private data is included.
- Designed specifically for Kaggle competitions, notebooks, and ML/EDA exercises.
About the File
File name: Pakistan_Online_Product_Sales.csv
Rows: 1000+
Columns: 6
Purpose:
- Train machine learning models (regression/classification)
- Explore data through EDA and visualizations
- Practice feature engineering and data preprocessing