61 datasets found

issues-kaggle-notebooks
huggingface.co
Cite
Hugging Face Smol Models Research, issues-kaggle-notebooks [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/issues-kaggle-notebooks
Explore at:
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face Smol Models Research
Description
GitHub Issues & Kaggle Notebooks

Description

GitHub Issues & Kaggle Notebooks is a collection of two code datasets intended for language model training, sourced from GitHub issues and from notebooks on the Kaggle platform. These datasets are a modified part of the StarCoder2 model training corpus, specifically the bigcode/StarCoder2-Extras dataset. We reformat the samples to remove StarCoder2's special tokens and use natural text to delimit comments in issues and display… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/issues-kaggle-notebooks.
kaggle-recipe-categorized-chunk-8
huggingface.co
Updated Sep 11, 2024
+ more versions
Cite
Jeff Schmitz (2024). kaggle-recipe-categorized-chunk-8 [Dataset]. https://huggingface.co/datasets/Schmitz005/kaggle-recipe-categorized-chunk-8
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 11, 2024
Authors
Jeff Schmitz
Description
Schmitz005/kaggle-recipe-categorized-chunk-8 dataset hosted on Hugging Face and contributed by the HF Datasets community
Damaged Roads Alvaro Basily Kaggle Dataset
universe.roboflow.com
zip
Updated Dec 10, 2022
+ more versions
Cite
Final Project (2022). Damaged Roads Alvaro Basily Kaggle Dataset [Dataset]. https://universe.roboflow.com/final-project-vs0cw/damaged-roads-alvaro-basily-kaggle
Explore at:
Available download formats: zip
Dataset updated
Dec 10, 2022
Dataset authored and provided by
Final Project
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Variables measured
Damaged Roads Bounding Boxes
Description
Damaged Roads Alvaro Basily Kaggle

Overview

Damaged Roads Alvaro Basily Kaggle is a dataset for object detection tasks. It contains Damaged Roads annotations for 3,321 images.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License

This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Meta Kaggle Code
kaggle.com
Updated Jun 5, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
Dataset updated
Jun 5, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
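
Since the file names match the KernelVersions ids, the join can be sketched as parsing each file stem into an integer and looking it up. The following is a minimal sketch only: the column names `Id` and `TotalVotes`, the sample rows, and the file paths are assumptions made up for illustration, not taken from the actual csv.

```python
import csv
import io
from pathlib import PurePosixPath

# Made-up stand-in for KernelVersions.csv; the column names are assumptions.
SAMPLE_CSV = """Id,TotalVotes
123456789,12
123456790,0
"""

def load_versions(csv_text):
    """Index KernelVersions rows by their integer id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Id"]): row for row in reader}

def version_id_from_path(path):
    """Read the version id from a file name such as '123/456/123456789.ipynb'."""
    return int(PurePosixPath(path).stem)

def join_files(paths, versions):
    """Pair each code file with its metadata row, skipping unmatched files."""
    joined = {}
    for path in paths:
        row = versions.get(version_id_from_path(path))
        if row is not None:
            joined[path] = row
    return joined

versions = load_versions(SAMPLE_CSV)
matched = join_files(["123/456/123456789.ipynb", "123/456/999.ipynb"], versions)
print(matched)
```

Files without a matching row are skipped here rather than raising, which is a design choice for the sketch and may not suit every analysis.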

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files because private and interactive sessions are excluded.
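
The folder layout described above amounts to integer division on the version id. A small sketch, assuming folder names are plain integers without zero padding (the description does not say either way) and using a hypothetical helper name `version_dir`:

```python
def version_dir(version_id):
    """Return the 'top/sub' directory for a KernelVersions id."""
    if version_id < 0:
        raise ValueError("version id must be non-negative")
    top = version_id // 1_000_000        # millions bucket, e.g. 123
    sub = (version_id // 1_000) % 1_000  # thousands bucket inside it, e.g. 456
    # Assumption: folder names are not zero-padded.
    return f"{top}/{sub}"

print(version_dir(123_456_789))
```

The file extension is left out on purpose, because the dataset mixes Python and R notebooks and the description does not list the extensions used.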

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Oracle Database metrics
kaggle.com
Updated Aug 20, 2020
Cite
Timerkhanov Yuriy (2020). Oracle Database metrics [Dataset]. https://www.kaggle.com/datasets/timerkhanovyuriy/oracle-database-metrics
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Aug 20, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Timerkhanov Yuriy
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Timerkhanov Yuriy

Released under CC0: Public Domain

Contents
Airport Luggage Dataset
universe.roboflow.com
Updated Jan 22, 2023
Cite
Roboflow Madi (2023). Airport Luggage Dataset [Dataset]. https://universe.roboflow.com/roboflow-madi/airport-luggage
Explore at:
Dataset updated
Jan 22, 2023
Dataset provided by
Roboflow
Authors
Roboflow Madi
Variables measured
Luggage Bounding Boxes
Description
Some images were collected from Kaggle: https://www.kaggle.com/datasets/dataclusterlabs/suitcaseluggage-dataset

More images were also collected from the following Roboflow Universe projects:
- https://universe.roboflow.com/ali-ahmad-kyfzj/baggage-rvbtb
- https://universe.roboflow.com/luggage-7rqr6/luggage-kcuiy
Kaggle methane laser meaurement: array animation
ecat.ga.gov.au
Updated Jan 1, 2016
Cite
Commonwealth of Australia (Geoscience Australia) (2016). Kaggle methane laser meaurement: array animation [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/29e3e457-cd26-7979-e053-10a3070a8952
Explore at:
Dataset updated
Jan 1, 2016
Dataset provided by
Geoscience Australia (http://ga.gov.au/)
Area covered
Pacific Ocean, North Pacific Ocean
Description
Animation for Kaggle showing a plume moving across an array of methane laser measurement paths
Kaggle methane laser measurements - fan animation
ecat.ga.gov.au
Updated Jan 1, 2016
Cite
Commonwealth of Australia (Geoscience Australia) (2016). Kaggle methane laser measurements - fan animation [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/29e3e457-cd27-7979-e053-10a3070a8952
Explore at:
Dataset updated
Jan 1, 2016
Dataset provided by
Geoscience Australia (http://ga.gov.au/)
Description
Animation for Kaggle showing laser path measurements of methane over a plume of methane gas. Reflectors arranged in a fan configuration.
Fraud Detection - Financial transactions
find.data.gov.scot
csv
Updated Mar 14, 2018
Cite
Deloitte Datathon 2018 (uSmart) (2018). Fraud Detection - Financial transactions [Dataset]. https://find.data.gov.scot/datasets/39167
Explore at:
Available download formats: csv (470.6714 MB)
Dataset updated
Mar 14, 2018
Dataset provided by
Deloitte (https://deloitte.com/)
Description
Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2
Data from: San Francisco Open Data
kaggle.com
Updated Mar 20, 2019
Cite
DataSF (2019). San Francisco Open Data [Dataset]. https://www.kaggle.com/datasets/datasf/san-francisco
Explore at:
Dataset updated
Mar 20, 2019
Dataset authored and provided by
DataSF
Description
Context

DataSF seeks to transform the way that the City of San Francisco works -- through the use of data.

https://datasf.org/about/

Content

This dataset contains the following tables: ['311_service_requests', 'bikeshare_stations', 'bikeshare_status', 'bikeshare_trips', 'film_locations', 'sffd_service_calls', 'sfpd_incidents', 'street_trees']

This data includes all San Francisco 311 service requests from July 2008 to the present, and is updated daily. 311 is a non-emergency number that provides access to non-emergency municipal services.

This data includes fire unit responses to calls from April 2000 to present and is updated daily. Data contains the call number, incident number, address, unit identifier, call type, and disposition. Relevant time intervals are also included. Because this dataset is based on responses, and most calls involved multiple fire units, there are multiple records for each call number. Addresses are associated with a block number, intersection or call box.
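
Because each record is a unit response rather than a call, counting rows overstates the number of calls. A minimal sketch of grouping by call number, where the records and the field names `call_number` and `unit_id` are made up for illustration and may not match the real table schema:

```python
from collections import Counter

# Made-up response records; the field names are assumptions.
responses = [
    {"call_number": "001", "unit_id": "E01"},
    {"call_number": "001", "unit_id": "T03"},
    {"call_number": "002", "unit_id": "M12"},
]

def units_per_call(records):
    """Count how many unit responses were logged for each call number."""
    return Counter(record["call_number"] for record in records)

counts = units_per_call(responses)
print(len(responses), "records,", len(counts), "distinct calls")
```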

This data includes incidents from the San Francisco Police Department (SFPD) Crime Incident Reporting system, from January 2003 until the present (2 weeks ago from current date). The dataset is updated daily. Please note: the SFPD has implemented a new system for tracking crime. This dataset is still sourced from the old system, which is in the process of being retired (a multi-year process).

This data includes a list of San Francisco Department of Public Works maintained street trees including: planting date, species, and location. Data includes 1955 to present.

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

http://datasf.org/

https://cloud.google.com/bigquery/public-data/sfo-311

https://cloud.google.com/bigquery/public-data/sffd-service-calls

https://cloud.google.com/bigquery/public-data/sfpd-reports

https://cloud.google.com/bigquery/public-data/sfo-trees

Dataset Source: SF OpenData. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://sfgov.org/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @meric from Unsplash.

Inspiration

Which neighborhoods have the highest proportion of offensive graffiti?

Which complaint is most likely to be made using Twitter and in which neighborhood?

What are the most complained about Muni stops in San Francisco?

What are the top 10 incident types that the San Francisco Fire Department responds to?

How many medical incidents and structure fires are there in each neighborhood?

What’s the average response time for each type of dispatched vehicle?

Which category of police incidents have historically been the most common in San Francisco?

What were the most common police incidents in the category of LARCENY/THEFT in 2016?

Which non-criminal incidents saw the biggest reporting change from 2015 to 2016?

What is the average tree diameter?

What is the highest number of a particular species of tree planted in a single year?

Which San Francisco locations feature the largest number of trees?
Gender Dataset
universe.roboflow.com
zip
Updated Sep 22, 2023
Cite
Seeed Studio (2023). Gender Dataset [Dataset]. https://universe.roboflow.com/seeed-studio-e2fso/gender-8vbxd/dataset/1
Explore at:
Available download formats: zip
Dataset updated
Sep 22, 2023
Dataset authored and provided by
Seeed Studio
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Female Male
Description
Forked from https://www.kaggle.com/datasets/ashishjangra27/gender-recognition-200k-images-celeba
docornot
huggingface.co
Updated May 5, 2024
Cite
mozilla (2024). docornot [Dataset]. https://huggingface.co/datasets/Mozilla/docornot
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 5, 2024
Dataset provided by
Mozilla (http://mozilla.org/)
Authors
mozilla
License
https://choosealicense.com/licenses/other/
Description
The DocOrNot dataset is split evenly: 50% of the images are pictures and 50% are documents. It was built using 8k images from each one of these sources:

RVL CDIP (Small) - https://www.kaggle.com/datasets/uditamin/rvl-cdip-small - license: https://www.industrydocuments.ucsf.edu/help/copyright/
Flickr8k - https://www.kaggle.com/datasets/adityajn105/flickr8k - license: https://creativecommons.org/publicdomain/zero/1.0/

It can be used to train a model and classify an image as being a picture or a… See the full description on the dataset page: https://huggingface.co/datasets/Mozilla/docornot.
RSNA Intracranial Hemorrhage Detection
registry.opendata.aws
Updated Aug 1, 2024
Cite
Radiological Society of North America (https://www.rsna.org/) (2024). RSNA Intracranial Hemorrhage Detection [Dataset]. https://registry.opendata.aws/rsna-intracranial-hemorrhage-detection/
Explore at:
Dataset updated
Aug 1, 2024
Dataset provided by
Radiological Society of North America
Description
RSNA assembled this dataset in 2019 for the RSNA Intracranial Hemorrhage Detection AI Challenge (https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/). De-identified head CT studies were provided by four research institutions. A group of over 60 volunteer expert radiologists recruited by RSNA and the American Society of Neuroradiology labeled over 25,000 exams for the presence and subtype classification of acute intracranial hemorrhage.
potholes, cracks and openmanholes (Road Hazards)
kaggle.com
Updated Feb 23, 2025
Cite
Sabid Rahman (2025). potholes, cracks and openmanholes (Road Hazards) [Dataset]. http://doi.org/10.34740/kaggle/dsv/10834063
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10834063
Dataset updated
Feb 23, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sabid Rahman
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
This dataset contains 2,700 images focused on detecting potholes, cracks, and open manholes on roads. It has been augmented to enhance the variety and robustness of the data. The images are organized into training and validation sets, with three distinct categories:

Potholes: class 0

Cracks: class 1

Open Manholes: class 2

Included in the Dataset:

- Bounding Box Annotations in YOLO Format (.txt files)
- Format: YOLOv8 & YOLO11 compatible
- Purpose: Ready for training YOLO-based object detection models
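
To illustrate how such YOLO .txt annotations are usually read, here is a minimal sketch. It assumes the common YOLO label layout of one object per line as `class x_center y_center width height`, with coordinates normalized to the image size; the sample line and image dimensions are made up, and the class names follow the mapping listed above.

```python
CLASS_NAMES = {0: "Potholes", 1: "Cracks", 2: "Open Manholes"}

def parse_yolo_line(line, img_w, img_h):
    """Convert one YOLO label line to (class name, pixel box).

    The box is returned as (x_min, y_min, x_max, y_max) in pixels.
    """
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return CLASS_NAMES[int(class_id)], (x_min, y_min, x_max, y_max)

# Made-up label line for a 640x480 image.
name, box = parse_yolo_line("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
print(name, box)
```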

Folder Structure

Organized into:

train/ folder

valid/ folder

Class-specific folders

An all_classes/ folder for combined access

Benefit: Easy access for training, validation, and augmentation tasks

Dual Format Support

COCO JSON Annotations Included

Compatible with models like Faster R-CNN

Enables flexibility across different object detection frameworks
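
The two annotation formats describe the same boxes in different coordinates. As a sketch, assuming the usual conventions (COCO stores `[x_min, y_min, width, height]` in pixels, YOLO stores a normalized center point plus normalized width and height), a conversion could look like this. The function name and sample box are hypothetical.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO pixel box to normalized YOLO (xc, yc, w, h)."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,  # normalized x center
        (y_min + h / 2) / img_h,  # normalized y center
        w / img_w,                # normalized width
        h / img_h,                # normalized height
    )

# Made-up box on a 640x480 image.
print(coco_to_yolo([240, 120, 160, 240], img_w=640, img_h=480))
```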

Use Cases Targeted

Model training

Model testing

Custom data augmentation

Specific focus: Road safety and infrastructure detection

Car Crash or Collision Prediction Dataset
kaggle.com
Updated Aug 28, 2024
Cite
Md. Fahim Bin Amin (2024). Car Crash or Collision Prediction Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/9268756
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/9268756
Dataset updated
Aug 28, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Md. Fahim Bin Amin
Description
Car Crash or Collision Prediction Dataset (Use ONLY the Compressed folder)

Source and Description

All of the images come from 100K dashcam videos collected from the BDD100K dataset.

The images are separated from the videos within 5-second intervals as individual frames.
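
One possible reading of this sampling step is that a single frame is taken every 5 seconds of video. The sketch below computes the frame indices under that assumption; the clip duration and frame rate are placeholder values, not figures stated in the description.

```python
def sample_frame_indices(duration_s, fps, interval_s=5):
    """Return the indices of frames at 0 s, 5 s, 10 s, ... within a clip."""
    total_frames = int(duration_s * fps)
    step = int(interval_s * fps)
    if step <= 0:
        raise ValueError("interval_s * fps must be at least one frame")
    return list(range(0, total_frames, step))

# Placeholder clip: 40 seconds at 30 frames per second.
indices = sample_frame_indices(duration_s=40, fps=30)
print(indices)
```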

Data Count

This dataset contains 10,000 images.

The annotation has been provided in the xlsx file as well.

Classes

The dataset contains 2 classes. They are given below:

Class Representation	Class
y	Collision/Accident
n	No Collision/No Accident
Health Nutrition and Population Statistics
datacatalog1.worldbank.org
kaggle.com
databank, utf-8
Cite
HealthStats, World Bank Group, Health Nutrition and Population Statistics [Dataset]. https://datacatalog1.worldbank.org/search/dataset/0037652/Health-Nutrition-and-Population-Statistics
Explore at:
Available download formats: utf-8, databank
Dataset provided by
World Bank Group (http://www.worldbank.org/)
World Bank (http://worldbank.org/)
License
https://datacatalog1.worldbank.org/public-licenses?fragment=cc
Description
Health Nutrition and Population Statistics database provides key health, nutrition and population statistics gathered from a variety of international and national sources. Themes include global surgery, health financing, HIV/AIDS, immunization, infectious diseases, medical resources and usage, noncommunicable diseases, nutrition, population dynamics, reproductive health, universal health coverage, and water and sanitation.
Loan Prediction with 3 Problem Statement
kaggle.com
Updated Sep 3, 2022
Cite
Yashpal (2022). Loan Prediction with 3 Problem Statement [Dataset]. https://www.kaggle.com/datasets/yashpaloswal/loan-prediction-with-3-problem-statement
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 3, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yashpal
Description
The data contains client loan data and whether their loan was approved or not. The main goal is to predict loan approval on the test data using a model built from the training data.
Kenya - Food Prices
data.wu.ac.at
cloud.csiss.gmu.edu
csv
Updated Oct 4, 2018
Cite
WFP - World Food Programme (2018). Kenya - Food Prices [Dataset]. https://data.wu.ac.at/schema/data_humdata_org/ZTBkM2ZiYTYtZjlhMi00NWQ3LWI5NDktMTQwYzQ1NTE5N2Zm
Explore at:
Available download formats: csv (523604.0), csv (126113.0)
Dataset updated
Oct 4, 2018
Dataset provided by
World Food Programme (http://da.wfp.org/)
Description
This dataset contains Food Prices data for Kenya. Food prices data comes from the World Food Programme and covers foods such as maize, rice, beans, fish, and sugar for 76 countries and some 1,500 markets. It is updated weekly but contains to a large extent monthly data. The data goes back as far as 1992 for a few countries, although many countries started reporting from 2003 or thereafter.

🛍️ Fashion Retail Sales Dataset

kaggle.com

Updated Apr 1, 2025

Cite

Atharva Soundankar (2025). 🛍️ Fashion Retail Sales Dataset [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/fashion-retail-sales

Explore at:

Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.

Dataset updated

Apr 1, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Atharva Soundankar

Description

📜 Dataset Overview

This dataset contains 3,400 records of fashion retail sales, capturing various details about customer purchases, including item details, purchase amounts, ratings, and payment methods. It is useful for analyzing customer buying behavior, product popularity, and payment preferences.

📂 Dataset Details

Column Name	Data Type	Non-Null Count	Description
`Customer Reference ID`	Integer	3,400	A unique identifier for each customer.
`Item Purchased`	String	3,400	The name of the fashion item purchased.
`Purchase Amount (USD)`	Float	2,750	The purchase price of the item in USD (650 missing values).
`Date Purchase`	String	3,400	The date on which the purchase was made (format: DD-MM-YYYY).
`Review Rating`	Float	3,076	The customer review rating (scale: 1 to 5, 324 missing values).
`Payment Method`	String	3,400	The payment method used (e.g., Credit Card, Cash).

🔍 Key Insights

The dataset contains 3,400 transactions.
Missing values are present in:
- Purchase Amount (USD): 650 missing values
- Review Rating: 324 missing values
Payment Method includes multiple categories, allowing analysis of payment trends.
Date Purchase is in DD-MM-YYYY format, which can be useful for time-series analysis.
The dataset can help analyze sales trends, customer preferences, and payment behaviors in the fashion retail industry.
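
When loading this table, the two points that need care are the DD-MM-YYYY date format and the missing values in `Purchase Amount (USD)` and `Review Rating`. A minimal sketch follows, assuming missing entries appear as empty strings or `NaN` in the file (the description does not say how they are encoded) and using a made-up row:

```python
from datetime import date, datetime

def parse_purchase_date(text):
    """Parse a DD-MM-YYYY string into a date object."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def parse_optional_float(text):
    """Return a float, or None when the field is missing.

    Assumption: missing values are stored as empty strings or 'NaN'.
    """
    text = text.strip()
    if text == "" or text.lower() == "nan":
        return None
    return float(text)

# Made-up row using the column names from the table above.
row = {
    "Date Purchase": "05-02-2023",
    "Purchase Amount (USD)": "",
    "Review Rating": "4.5",
}
purchased_on = parse_purchase_date(row["Date Purchase"])
amount = parse_optional_float(row["Purchase Amount (USD)"])
rating = parse_optional_float(row["Review Rating"])
print(purchased_on, amount, rating)
```

Keeping missing values as `None` instead of filling them with zero avoids skewing averages of purchase amounts or ratings.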

📊 Potential Use Cases

Sales Analysis: Understanding which fashion items are selling the most.
Customer Insights: Analyzing purchase behaviors and spending patterns.
Trend Forecasting: Identifying seasonal trends in fashion retail.
Payment Method Preferences: Understanding how customers prefer to pay.

Pakistan - Food Prices
data.wu.ac.at
data.humdata.org
+2 more
csv
Updated Oct 4, 2018
+ more versions
Cite
WFP - World Food Programme (2018). Pakistan - Food Prices [Dataset]. https://data.wu.ac.at/schema/data_humdata_org/MTI1NDU1ZmYtYzhhOC00ZjRiLTkxOTAtYmYwZTg0NTU5ZGM3
Explore at:
Available download formats: csv (171690.0), csv (709048.0)
Dataset updated
Oct 4, 2018
Dataset provided by
World Food Programme (http://da.wfp.org/)
Description
This dataset contains Food Prices data for Pakistan. Food prices data comes from the World Food Programme and covers foods such as maize, rice, beans, fish, and sugar for 76 countries and some 1,500 markets. It is updated weekly but contains to a large extent monthly data. The data goes back as far as 1992 for a few countries, although many countries started reporting from 2003 or thereafter.
