100+ datasets found
  1. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Jul 10, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    zip (148301844275 bytes); available download format: zip
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
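    As a sketch of that join, assuming only that the code file names match the Id column of KernelVersions.csv (the sample columns below are illustrative, not the real schema):

```python
import csv
import io

# Map each KernelVersions Id to its metadata row; Meta Kaggle Code file
# names match these Ids, so this dictionary is the join.
def index_kernel_versions(csv_file):
    return {row["Id"]: row for row in csv.DictReader(csv_file)}

# Tiny illustrative stand-in for KernelVersions.csv (real columns differ):
sample = io.StringIO("Id,TotalVotes\n123456789,42\n")
versions = index_kernel_versions(sample)
# A code file named 123456789.py joins to versions["123456789"].
```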

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
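    The layout above can be sketched as a path computation; the unpadded folder names and the .py extension here are assumptions for illustration:

```python
# Map a KernelVersions id to its folder path in the two-level layout:
# top folder = the millions part of the id, sub folder = the thousands part.
def code_path(kernel_version_id, extension=".py"):
    top = kernel_version_id // 1_000_000        # e.g. 123 for 123,456,789
    sub = (kernel_version_id // 1_000) % 1_000  # e.g. 456 for 123,456,789
    return f"{top}/{sub}/{kernel_version_id}{extension}"
```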

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  2. Basic R for Data Analysis

    • kaggle.com
    Updated Dec 8, 2024
    Cite
    Kebba Ndure (2024). Basic R for Data Analysis [Dataset]. https://www.kaggle.com/datasets/kebbandure/basic-r-for-data-analysis/data
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 8, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kebba Ndure
    Description

    ABOUT DATASET

    This is an R Markdown notebook. It contains a step-by-step guide for working on data analysis with R. It helps you install the relevant packages and shows how to load them. It also provides a detailed summary of the "dplyr" commands that you can use to manipulate your data in the R environment.

    Anyone new to R who wishes to carry out some data analysis in R can check it out!

  3. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    Croissant
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemsets a customer is most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow the business and to offer customers itemset suggestions, so we will be able to increase customer engagement, improve customer experience, and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rules are most often used when you are planning to find associations between different objects in a set, i.e., frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule bought computer mouse => bought mouse mat:

    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.8
    • lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
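    The worked example can be reproduced directly; a minimal sketch (pure Python, not the R tooling used later in this write-up):

```python
# Hand-rolled reconstruction of the example: 100 customers,
# 10 bought a mouse, 9 bought a mat, 8 bought both.
transactions = (
    [{"mouse", "mat"}] * 8   # bought both
    + [{"mouse"}] * 2        # mouse only (10 mouse buyers in total)
    + [{"mat"}] * 1          # mat only (9 mat buyers in total)
    + [set()] * 89           # bought neither
)

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    return confidence(antecedent, consequent) / support(consequent)
```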

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule
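    The transformation and rule-mining steps above can be sketched in miniature; this is a hand-rolled frequent-pair count over hypothetical toy transactions, standing in for the arules workflow described below:

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy transactions standing in for the retail data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "sugar"},
    {"bread", "butter", "sugar"},
]

min_support = 0.5  # keep itemsets appearing in at least half the transactions

# Count every pair of items bought together.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
n = len(transactions)
frequent_pairs = {
    pair: count / n for pair, count in pair_counts.items() if count / n >= min_support
}
# ("bread", "butter") appears in 3 of 4 transactions -> support 0.75
```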

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Below, I briefly describe each one.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    Next, we clean our data frame by removing missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply Association Rule mining, we need to convert the dataframe into transaction data, so that all items bought together in one invoice will be in ...

  4. Road-R Dataset

    • kaggle.com
    Updated Aug 17, 2023
    Cite
    sciencestoked (2023). Road-R Dataset [Dataset]. https://www.kaggle.com/datasets/sciencestoked/road-r-dataset/suggestions
    Explore at:
    Croissant
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sciencestoked
    Description

    Dataset

    This dataset was created by sciencestoked

    Contents

  5. Reddit Conversations

    • kaggle.com
    Updated Mar 4, 2020
    Cite
    Jerry Qu (2020). Reddit Conversations [Dataset]. https://www.kaggle.com/jerryqu/reddit-conversations/kernels
    Explore at:
    Croissant
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jerry Qu
    Description

    Context

    I've been looking for an open-domain conversational dataset for training chatbots. I was inspired by the work done by Google Brain in 'Towards a Human-like Open-Domain Chatbot'. While Transformers/BERT are trained on all of Wikipedia, chatbots need a dataset based on conversations.

    Content

    This data came from Reddit posts/comments under the r/CasualConversation subreddit. The conversations under this subreddit were significantly more 'conversation-like' when compared to other subreddits (Ex. r/AskReddit). I'm currently looking for other subreddits to scrape.

    This dataset consists of 3 columns, where each row is a Length-3 conversation. For example:

    0 - What kind of phone(s) do you guys have?
    1 - I have a pixel. It's pretty great. Much better than what I had before.
    2 - Does it really charge all the way in 15 min?
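    A sketch of how a three-column row like this can be turned into (context, response) pairs for chatbot training; the separator token is an assumption, not part of the dataset:

```python
# Turn a length-3 conversation row into (context, response) training pairs.
def to_training_pairs(row):
    """row is a length-3 conversation: [turn0, turn1, turn2]."""
    pairs = []
    for i in range(1, len(row)):
        context = " </s> ".join(row[:i])  # join earlier turns with a separator
        pairs.append((context, row[i]))
    return pairs

row = [
    "What kind of phone(s) do you guys have?",
    "I have a pixel. It's pretty great. Much better than what I had before.",
    "Does it really charge all the way in 15 min?",
]
```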

    This data was collected between 2016-12-29 and 2019-12-31.

    Furthermore, I have the full comment trees (stored as Python dictionaries), which was an intermediary step to creating this dataset. I plan to add more data in the future. (Ex. Longer sequence lengths, other subreddits)

    Acknowledgements / License

    Data was collected using Pushshift's API. https://pushshift.io/

    I'm currently unsure about licensing. Reddit does not appear to state a clear licensing agreement, and Pushshift does not specify one either.

    Inspiration

    1. Create an open-domain chatbot (Ex. Meena)
    2. I'd love to see how you can represent types of conversations and cluster them. This would be monumentally helpful in collecting more data. (Ex. AskReddit conversations don't resemble typical person-to-person conversations. How would you identify person-to-person-esque conversations? Perhaps cosine similarity between word embeddings? Or sentence embeddings of POS tags may be very interesting.)
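    The cosine-similarity idea above is simple to sketch; the embedding vectors themselves are assumed given (from any word or sentence embedding model):

```python
import math

# Cosine similarity between two embedding vectors: the dot product
# divided by the product of the vector norms.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```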
  6. Black Jack - Interactive Card Game

    • kaggle.com
    Updated Dec 21, 2024
    Cite
    Patrick L Ford (2024). Black Jack - Interactive Card Game [Dataset]. http://doi.org/10.34740/kaggle/dsv/10262142
    Explore at:
    Croissant
    Dataset updated
    Dec 21, 2024
    Dataset provided by
    Kaggle
    Authors
    Patrick L Ford
    License

    Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Introduction

    Blackjack, also known as 21, is one of the most popular card games worldwide. Blackjack remains a favourite due to its mix of simplicity, luck, strategy, and fast-paced gameplay, making it a staple in casinos.

    Objective of Blackjack:

    • The goal of Blackjack is to have a hand value closer to 21 than the dealer's hand, without exceeding 21. If a player's hand exceeds 21, they "bust" and lose the round.

    Card Values:

    • Number cards (2-10): These are worth their face value.
    • Face cards (Jack, Queen, King): Each is worth 10 points.
    • Ace: Can be worth either 1 or 11, depending on which value benefits the hand more without exceeding 21.
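    The card-value rules above can be sketched as a hand-value function: aces count as 11 unless that would bust the hand, in which case they drop to 1.

```python
# Compute the best blackjack value of a hand under the rules above.
def hand_value(cards):
    """cards: list of ranks, e.g. ["A", "K", "7"]."""
    value = 0
    aces = 0
    for card in cards:
        if card == "A":
            value += 11
            aces += 1
        elif card in ("J", "Q", "K"):
            value += 10
        else:
            value += int(card)
    while value > 21 and aces:
        value -= 10  # demote an ace from 11 to 1
        aces -= 1
    return value
```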

    Setup:

    • Deck: Blackjack is typically played with one to eight standard decks of 52 cards.
    • Players: One or more players compete against the dealer. Each player is dealt a separate hand, and players do not compete against each other.
    • Table Layout: The table features spaces for player bets, cards, and chips.

    Game Play:

    • Initial Bets:
      • Players place their bets in designated areas on the table.
    • Dealing Cards:
      • Each player and the dealer receive two cards.
      • Players' cards are dealt face-up, while the dealer gets one face-up card (up card) and one face-down card (hole card).
    • Player Options:
      • Hit: Request another card to add to their hand. Players can keep hitting until they are satisfied or bust.
      • Stand: Keep the current hand and end their turn.
      • Double Down: Double the initial bet and receive exactly one more card. Commonly allowed only on the first two cards.
      • Split: If the first two cards have the same rank, the player can split them into two separate hands by placing an additional bet equal to the original. Each hand is played separately.
      • Surrender (Optional Rule): Forfeit half the bet and end the turn. This is usually allowed only on the first two cards.
      • Insurance (Optional Rule): If the dealer's up card is an Ace, players may place a side bet (half the original bet) that the dealer has Blackjack. If the dealer has Blackjack, the insurance bet pays 2:1; otherwise, the player loses the insurance bet.
    • Dealer's Turn:
      • Hit until the hand value is 17 or higher.
      • Stand on 17 or higher (including "soft 17" in some variations).
      • The dealer does not have options; actions are automatic.
    • Winning:
      • Player Wins: The player's hand value is closer to 21 than the dealer's hand, or the dealer busts.
      • Dealer Wins: Dealer's hand value is closer to 21, or the player busts.
      • Push (Tie): Both hands have the same value; the player keeps their bet.
    • Blackjack (Natural):
      • If the player's initial two cards are an Ace and a 10-point card (Jack, Queen, King, or 10), they have a "Blackjack."
      • Blackjack typically pays 3:2 (e.g., a $10 bet wins $15).
      • If both the player and the dealer have Blackjack, it's a push.
    • House Edge and Strategy:

    The casino typically has a small edge due to rules favouring the dealer (e.g., the player acts first, so they can bust before the dealer plays):

    • Basic strategy can minimise the house edge.
    • Strategy charts show the optimal play based on the player's hand and the dealer's up card.
    • Advanced players use card counting to track high-value cards remaining in the deck, gaining an advantage.

    Common Variations:

    • European Blackjack: Dealer receives only one card initially; no hole card until players complete their turns.
    • Spanish 21: Played with 48-card decks (no 10's), with bonuses for certain hands.
    • Pontoon: A British variation where "Five Card Trick" (five cards totalling 21 or less) is a winning hand.
    • Blackjack Switch: Players play two hands and can swap the second card between them.

    Etiquette and Tips:

    • Use hand signals to indicate actions (e.g., tapping for "hit," waving for "stand").
    • Avoid touching chips after the deal starts.
    • Familiarise yourself with table-specific rules and variations.

    Visualisation

    Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2Faa4b5d8819430e46c3203b3597666578%2FScreenshot%202024-12-21%2010.36.57.png?generation=1734781714095911&alt=media
    Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F86038e4d98f429825106bb2e8b5f74e8%2FScreenshot%202024-12-21%2010.38.18.png?generation=1734781738030008&alt=media
    Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F5b634959e2292840ce454745ca80062f%2FScreenshot%202024-12-21%2010.39.12.png?generation=1734781761032959&alt=media

    A Markdown document with the R code for the game of Black Jack is included.

    R Code

    The provided R code implements a simplified version of the game Blackjack. It includes f...

  7. Activity In R

    • kaggle.com
    zip
    Updated Aug 30, 2019
    Cite
    Manohar Reddy (2019). Activity In R [Dataset]. https://www.kaggle.com/datasets/manohar676/activity-in-r
    Explore at:
    zip (368 bytes); available download format: zip
    Dataset updated
    Aug 30, 2019
    Authors
    Manohar Reddy
    Description

    Dataset

    This dataset was created by Manohar Reddy

    Contents

  8. R and Python Stack Overflow Answers + Sentiment

    • kaggle.com
    Updated May 28, 2019
    Cite
    OJ Watson (2019). R and Python Stack Overflow Answers + Sentiment [Dataset]. https://www.kaggle.com/datasets/ojwatson/stack-overflow-output
    Explore at:
    Croissant
    Dataset updated
    May 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    OJ Watson
    Description

    Context

    This is the output of the Stack Rudeness kernel (https://www.kaggle.com/ojwatson/stack-rudeness), as saved in Cell 17.

    Content

    Stack Overflow answers by the top 10 R and Python users, extracted using BigQuery. Also includes data on whether the answer was accepted, plus some additional data based on sentiment analysis of the answer text.

    Acknowledgements

    BigQuery and StackOverflow

  9. machine-learning-Python-R-in-data-science

    • kaggle.com
    Updated Jan 1, 2020
    Cite
    Ananto Yusuf Wicaksono (2020). machine-learning-Python-R-in-data-science [Dataset]. https://www.kaggle.com/datasets/ansufw/machinelearningpythonrindatascience/code
    Explore at:
    Croissant
    Dataset updated
    Jan 1, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ananto Yusuf Wicaksono
    Description

    Dataset

    This dataset was created by Ananto Yusuf Wicaksono

    Contents

  10. Medical Cost Personal Dataset

    • kaggle.com
    Updated Jul 17, 2020
    Cite
    Abdel Homi (2020). Medical Cost Personal Dataset [Dataset]. https://www.kaggle.com/d3lhomi10/medical-cost-personal-dataset/code
    Explore at:
    Croissant
    Dataset updated
    Jul 17, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abdel Homi
    License

    Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Dataset

    This dataset was created by Abdel Homi

    Released under Database: Open Database, Contents: Database Contents

    Contents

  11. Submission R File

    • kaggle.com
    Updated Jan 8, 2023
    Cite
    Seth Lanza (2023). Submission R File [Dataset]. https://www.kaggle.com/datasets/sethlanza/submission-r-file
    Explore at:
    Croissant
    Dataset updated
    Jan 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Seth Lanza
    Description

    Dataset

    This dataset was created by Seth Lanza

    Contents

  12. forum-data-r-progamming-coursera

    • kaggle.com
    zip
    Updated Sep 9, 2019
    Cite
    Kelly Xu (2019). forum-data-r-progamming-coursera [Dataset]. https://www.kaggle.com/datasets/kkellyxfq/forumdatarprogammingcoursera
    Explore at:
    zip (425061 bytes); available download format: zip
    Dataset updated
    Sep 9, 2019
    Authors
    Kelly Xu
    Description

    This file is for my postgraduate study. The data is concerned with the Coursera forum data for the R Programming course. All data has been anonymized for the purpose of data privacy.

    The data scraped is dated from September 2018 to September 2019.

  13. Using R to get data from Twitter and Binance

    • kaggle.com
    Updated Nov 3, 2019
    Cite
    Medou Neine (2019). Using R to get data from Twitter and Binance [Dataset]. https://www.kaggle.com/dodu63/using-r-to-get-data-from-twitter-and-binance/code
    Explore at:
    Croissant
    Dataset updated
    Nov 3, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Medou Neine
    Description

    Dataset

    This dataset was created by Medou Neine

    Contents

  14. Fruits-360 dataset

    • kaggle.com
    • paperswithcode.com
    • +1more
    Updated Jun 7, 2025
    Cite
    Mihai Oltean (2025). Fruits-360 dataset [Dataset]. https://www.kaggle.com/datasets/moltean/fruits
    Explore at:
    Croissant
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mihai Oltean
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

    Version: 2025.06.07.0

    Content

    The following fruits, vegetables and nuts are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).

    Branches

    The dataset has 5 major branches:

    • The 100x100 branch, where all images have 100x100 pixels. See the _fruits-360_100x100_ folder.

    • The original-size branch, where all images are at their original (captured) size. See the _fruits-360_original-size_ folder.

    • The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See the _fruits-360_dataset_meta_ folder.

    • The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See the _fruits-360_multi_ folder.

    • The _3_body_problem_ branch, where the Training and Test folders contain different varieties of 3 fruits and vegetables (Apples, Cherries and Tomatoes). See the _fruits-360_3-body-problem_ folder.

    How to cite

    Mihai Oltean, Fruits-360 dataset, 2017-

    Dataset properties

    For the 100x100 branch

    Total number of images: 138704.

    Training set size: 103993 images.

    Test set size: 34711 images.

    Number of classes: 206 (fruits, vegetables, nuts and seeds).

    Image size: 100x100 pixels.

    For the original-size branch

    Total number of images: 58363.

    Training set size: 29222 images.

    Validation set size: 14614 images

    Test set size: 14527 images.

    Number of classes: 90 (fruits, vegetables, nuts and seeds).

    Image size: various (the original captured size).

    For the 3-body-problem branch

    Total number of images: 47033.

    Training set size: 34800 images.

    Test set size: 12233 images.

    Number of classes: 3 (Apples, Cherries, Tomatoes).

    Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.

    Image size: 100x100 pixels.

    For the meta branch

    Number of classes: 26 (fruits, vegetables, nuts and seeds).

    For the multi branch

    Number of images: 150.

    Filename format:

    For the 100x100 branch

    image_index_100.jpg (e.g. 31_100.jpg) or

    r_image_index_100.jpg (e.g. r_31_100.jpg) or

    r?_image_index_100.jpg (e.g. r2_31_100.jpg)

    where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).

    Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.
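    A sketch of parsing the 100x100-branch filename format described above (e.g. "r2_31_100.jpg") into its rotation prefix and image index:

```python
import re

# Optional rotation prefix ("r" or "r<digit>"), then the image index,
# then the fixed "_100.jpg" suffix of the 100x100 branch.
PATTERN = re.compile(r"^(?:(r\d?)_)?(\d+)_100\.jpg$")

def parse_filename(name):
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group(1), int(m.group(2))  # (rotation or None, image index)
```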

    For the original-size branch

    r?_image_index.jpg (e.g. r2_31.jpg)

    where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.

    The name of the image files in the new version does NOT contain the "_100" suffix anymore. This will help you to make the distinction between the original-size branch and the 100x100 branch.

    For the multi branch

    The file's name is the concatenation of the names of the fruits inside that picture.

    Alternate download

    The Fruits-360 dataset can be downloaded from:

    Kaggle https://www.kaggle.com/moltean/fruits

    GitHub https://github.com/fruits-360

    How fruits were filmed

    Fruits and vegetables were mounted in the shaft of a low-speed motor (3 rpm), and a short, 20-second movie was recorded.

    A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.

    Behind the fruits, we placed a white sheet of paper as a background.

    Here i...

  15. Data from: Data Mining Using R:

    • kaggle.com
    Updated Jul 2, 2018
    Cite
    Data Science (2018). Data Mining Using R: [Dataset]. https://www.kaggle.com/ravali566/data-mining-using-r/code
    Explore at:
    Croissant
    Dataset updated
    Jul 2, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Science
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Dataset

    This dataset was created by Data Science

    Released under CC0: Public Domain

    Contents

  16. Road-R Dataset Sample

    • kaggle.com
    Updated Aug 17, 2023
    Cite
    sciencestoked (2023). Road-R Dataset Sample [Dataset]. https://www.kaggle.com/datasets/sciencestoked/road-r-dataset-sample/code
    Explore at:
    Croissant
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sciencestoked
    Description

    Dataset

    This dataset was created by sciencestoked

    Contents

  17. May 2015 Reddit Comments

    • kaggle.com
    zip
    Updated Jun 4, 2019
    Cite
    Kaggle (2019). May 2015 Reddit Comments [Dataset]. https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015
    Explore at:
    zip(21429083286 bytes)Available download formats
    Dataset updated
    Jun 4, 2019
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    https://www.reddit.com/wiki/api

    Description

    Recently Reddit released an enormous dataset containing all ~1.7 billion of their publicly available comments. The full dataset is an unwieldy 1+ terabyte uncompressed, so we've decided to host a small portion of the comments here for Kagglers to explore. (You don't even need to leave your browser!)

    You can find all the comments from May 2015 in Scripts for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?

    Who knows? Top visualizations may just end up on Reddit.

    Data Description

    The database has one table, May2015, with the following fields:

    • created_utc
    • ups
    • subreddit_id
    • link_id
    • name
    • score_hidden
    • author_flair_css_class
    • author_flair_text
    • subreddit
    • id
    • removal_reason
    • gilded
    • downs
    • archived
    • author
    • score
    • retrieved_on
    • body
    • distinguished
    • edited
    • controversiality
    • parent_id
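    The schema above can be queried with standard SQL. As a minimal sketch using Python's built-in sqlite3 module, with a tiny in-memory stand-in for the May2015 table (the real dataset ships as a SQLite database; connect to its actual file path instead, and note the toy rows and reduced column set here are illustrative):

    ```python
    import sqlite3

    # Swap ":memory:" for the path to the downloaded SQLite database file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE May2015 (subreddit TEXT, score INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO May2015 VALUES (?, ?, ?)",
        [("askreddit", 42, "hello"), ("askreddit", 7, "world"), ("pics", 99, "wow")],
    )

    # Rank subreddits by total comment score.
    rows = conn.execute(
        "SELECT subreddit, SUM(score) AS total FROM May2015 "
        "GROUP BY subreddit ORDER BY total DESC"
    ).fetchall()
    print(rows)  # -> [('pics', 99), ('askreddit', 49)]
    ```

    The same GROUP BY pattern works against the full table, e.g. over `controversiality` or `gilded` instead of `score`.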
  18. Top 10 R and Python Stack Overflow User Answers

    • kaggle.com
    Updated May 28, 2019
    OJ Watson (2019). Top 10 R and Python Stack Overflow User Answers [Dataset]. https://www.kaggle.com/ojwatson/stack-answers-r-python/activity
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    OJ Watson
    Description

    Context

    This is the input data available for the Stack Rudeness kernel (https://www.kaggle.com/ojwatson/stack-rudeness).

    Content

    Stack Overflow answers by the top 10 R and Python users, extracted using BigQuery. Also includes data, downloaded from the Stack Overflow API, on whether each answer was accepted.

    Acknowledgements

    BigQuery and Stack Overflow

  19. igraph in R

    • kaggle.com
    Updated Oct 19, 2021
    Vashu Gupta (2021). igraph in R [Dataset]. https://www.kaggle.com/datasets/vashugupta0298/igraph-in-r
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 19, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vashu Gupta
    Description

    Dataset

    This dataset was created by Vashu Gupta

    Contents

  20. Survival Prediction with Titanic Dataset using R

    • kaggle.com
    zip
    Updated Jan 26, 2018
    Sivasuryanarayan Krishnamoorthy (2018). Survival Prediction with Titanic Dataset using R [Dataset]. https://www.kaggle.com/sivasuryak3/survival-prediction-with-titanic-dataset-using-r
    Explore at:
    Available download formats: zip (33847 bytes)
    Dataset updated
    Jan 26, 2018
    Authors
    Sivasuryanarayan Krishnamoorthy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Sivasuryanarayan Krishnamoorthy

    Released under CC0: Public Domain

    Contents

    It contains the following files:


Meta Kaggle Code

Kaggle's public data on notebook code

4 scholarly articles cite this dataset.


Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
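The id-to-filename correspondence above makes the join a one-liner in pandas. A sketch with toy stand-in data (the real inputs are the Meta Kaggle Code file listing and KernelVersions.csv; the `Id` and `TotalVotes` columns are taken as representative of that CSV, and the toy values are invented for illustration):

```python
import pandas as pd

# Toy stand-in for Meta Kaggle's KernelVersions.csv (the real file has many more columns).
kernel_versions = pd.DataFrame({
    "Id": [123456001, 123456002, 987654003],
    "TotalVotes": [5, 0, 12],
})

# File names in Meta Kaggle Code match KernelVersions ids, e.g. "123456001.py".
code_files = pd.DataFrame({"FileName": ["123456001.py", "987654003.ipynb"]})
code_files["Id"] = code_files["FileName"].str.split(".").str[0].astype(int)

# Left join: every code file keeps its row, enriched with Meta Kaggle metadata.
joined = code_files.merge(kernel_versions, on="Id", how="left")
print(joined[["FileName", "TotalVotes"]])
```

Note that ids present in KernelVersions but absent here (interactive or private sessions) simply drop out of the left join.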

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
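The two-level folder for a given version id follows directly from that scheme (millions for the top folder, thousands for the sub-folder). A small helper, assuming folder names are plain unpadded integers as in the examples above:

```python
def code_file_dir(kernel_version_id: int) -> str:
    """Map a KernelVersions id to its two-level folder in Meta Kaggle Code.

    The top-level folder groups ids by millions; the sub-folder by thousands.
    """
    top = kernel_version_id // 1_000_000
    sub = (kernel_version_id // 1_000) % 1_000
    return f"{top}/{sub}"

# Version 123,456,789 lives under folder 123/456.
print(code_file_dir(123_456_789))  # -> 123/456
```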

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
