This dataset represents the data related to food recommender system. Two datasets are included in this dataset file. First includes the dataset related to the foods, ingredients, cuisines involved. Second, includes the dataset of the rating system for the recommendation system.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions
in Meta Kaggle. The file names match the ids in the KernelVersions
csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads
. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DistilKaggle is a curated dataset extracted from Kaggle Jupyter notebooks spanning from September 2015 to October 2023. This dataset is a distilled version derived from the download of over 300GB of Kaggle kernels, focusing on essential data for research purposes. The dataset exclusively comprises publicly available Python Jupyter notebooks from Kaggle. The essential information for retrieving the data needed to download the dataset is obtained from the MetaKaggle dataset provided by Kaggle.
The DistilKaggle dataset consists of three main CSV files:
code.csv: Contains over 12 million rows of code cells extracted from the Kaggle kernels. Each row is identified by the kernel's ID and cell index for reproducibility.
markdown.csv: Includes over 5 million rows of markdown cells extracted from Kaggle kernels. Similar to code.csv, each row is identified by the kernel's ID and cell index.
notebook_metrics.csv: This file provides notebook features described in the accompanying paper released with this dataset. It includes metrics for over 517,000 Python notebooks.
The kernels directory is organized based on Kaggle's Performance Tiers (PTs), a ranking system in Kaggle that classifies users. The structure includes PT-specific directories, each containing user ids that belong to this PT, download logs, and the essential data needed for downloading the notebooks.
The utility directory contains two important files:
aggregate_data.py: A Python script for aggregating data from different PTs into the mentioned CSV files.
application.ipynb: A Jupyter notebook serving as a simple example application using the metrics dataframe. It demonstrates predicting the PT of the author based on notebook metrics.
DistilKaggle.tar.gz: It is just the compressed version of the whole dataset. If you downloaded all of the other files independently already, there is no need to download this file.
Researchers can leverage this distilled dataset for various analyses without dealing with the bulk of the original 300GB dataset. For access to the raw, unprocessed Kaggle kernels, researchers can request the dataset directly.
The original dataset of Kaggle kernels is substantial, exceeding 300GB, making it impractical for direct upload to Zenodo. Researchers interested in the full dataset can contact the dataset maintainers for access.
If you use this dataset in your research, please cite the accompanying paper or provide appropriate acknowledgment as outlined in the documentation.
If you have any questions regarding the dataset, don't hesitate to contact me at mohammad.abolnejadian@gmail.com
Thank you for using DistilKaggle!
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Vertical_bar Kaggle is a dataset for object detection tasks - it contains Bar Plot_bb annotations for 2,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Resistors Kaggle is a dataset for object detection tasks - it contains Resistor annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Dataset By Kaggle is a dataset for object detection tasks - it contains LP annotations for 1,132 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Car Damages Kaggle is a dataset for instance segmentation tasks - it contains Car Damages annotations for 814 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Personality Dataset
Essays https://huggingface.co/datasets/jingjietan/essays-big5 MBTI https://huggingface.co/datasets/jingjietan/kaggle-mbti Pandora https://huggingface.co/datasets/jingjietan/pandora-big5 Please contact jingjietan.com for another dataset. Cite: @software{jingjietan-apr-dataset, author = {Jing Jie, Tan}, title = {Personality Kaggle Dataset Splitting}, url = {https://huggingface.co/datasets/jingjietan/kaggle-mbti}, version = {1.0.0}, year = {2024} }
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Gun Kaggle is a dataset for object detection tasks - it contains Gun Danger annotations for 2,988 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Mask Kaggle is a dataset for object detection tasks - it contains Mask Withoutmask annotations for 905 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
MTB Bacilli Detection Kaggle is a dataset for object detection tasks - it contains AFB annotations for 221 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Yolov5 Kaggle is a dataset for object detection tasks - it contains Planes annotations for 500 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Maxim Kosarev
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Major Project Kaggle is a dataset for object detection tasks - it contains Elephant Deer Buffalo Rhino annotations for 200 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Face Mask Detection Kaggle is a dataset for object detection tasks - it contains Masks annotations for 848 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is made for training the classifiers to detect whether some text is making a suggestion or not. This could be useful to train a classifier which can automatically scan the data for suggestions such as whether something should be done or not. For example, if someone says "I think you should invest in this coin", then it is registered as a suggestion. Very useful for data mining and decision making based on the NLP data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Obj Detect Kaggle is a dataset for object detection tasks - it contains Face Masks annotations for 848 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Kaggle Dataset Jai Mata Di is a dataset for object detection tasks - it contains LP annotations for 1,132 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Iranian Plate From Kaggle is a dataset for object detection tasks - it contains Plate REgl annotations for 433 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Road Crack Kaggle is a dataset for object detection tasks - it contains Crack IQK6 annotations for 3,643 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
This dataset represents the data related to food recommender system. Two datasets are included in this dataset file. First includes the dataset related to the foods, ingredients, cuisines involved. Second, includes the dataset of the rating system for the recommendation system.