100+ datasets found

Top 2500 Kaggle Datasets
kaggle.com
Updated Feb 16, 2024
Cite
Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7637365
Dataset updated
Feb 16, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Saket Kumar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

Column Definitions:

Dataset Name: Name of the dataset.

Created By: Creator(s) of the dataset.

Last Updated in number of days: Time elapsed since last update.

Usability Score: Score indicating the ease of use.

Number of File: Quantity of files included.

Type of file: Format of files (e.g., CSV, JSON).

Size: Size of the dataset.

Total Votes: Number of votes received.

Category: Categorization of the dataset's subject matter.
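As a rough illustration of how these columns might be used, the sketch below averages Usability Score per Category. The sample rows are invented, and the exact header spelling in the real CSV file is an assumption based on the column definitions above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real file name and exact header spelling
# are assumptions taken from the column definitions, not from the file.
SAMPLE = """Dataset Name,Category,Usability Score,Total Votes
A,Health,10.0,120
B,Health,8.0,30
C,Finance,9.0,55
"""


def mean_usability_by_category(rows):
    """Group rows by 'Category' and average their 'Usability Score'."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["Category"]].append(float(row["Usability Score"]))
    return {category: mean(values) for category, values in scores.items()}


summary = mean_usability_by_category(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

To run this against the downloaded file, replace the io.StringIO(SAMPLE) argument with an open file handle.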
interior_design
kaggle.com
zip
Updated Aug 5, 2020
Cite
Aishah Sofea (2020). interior_design [Dataset]. https://www.kaggle.com/datasets/aishahsofea/interior-design
Explore at:
zip (141879955 bytes)
Dataset updated
Aug 5, 2020
Authors
Aishah Sofea
Description
Dataset

This dataset was created by Aishah Sofea

Released under Data files © Original Authors

Contents
Meta Kaggle Code
kaggle.com
zip
Updated Nov 27, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
zip (167219625372 bytes)
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Kaggle: http://kaggle.com/
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
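A minimal sketch of that join, assuming the numeric part of each file name equals the Id column of the KernelVersions csv. The sample rows, the TotalVotes column, and the example path are hypothetical and used only for illustration.

```python
import csv
import io
from pathlib import Path

# Hypothetical excerpt of the KernelVersions csv. Only the idea of an id
# column comes from the description; the column names are assumptions.
KERNEL_VERSIONS_CSV = """Id,TotalVotes
123456789,4
123456790,0
"""


def index_kernel_versions(csv_text):
    """Build a lookup table from version id to its metadata row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Id"]): row for row in rows}


def metadata_for_file(path, versions_by_id):
    """Return the metadata row whose id equals the file's base name, or None."""
    return versions_by_id.get(int(Path(path).stem))


versions = index_kernel_versions(KERNEL_VERSIONS_CSV)
print(metadata_for_file("123/456/123456789.ipynb", versions))
```

Files without a matching row return None, which makes it easy to count how many code files fall outside the metadata you loaded.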

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files because private and interactive sessions are not included.
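The folder for a given version id follows from integer arithmetic on the id. The sketch below is one way to compute it; the function name is hypothetical, and whether folder names are zero-padded is not stated here, so this version does not pad.

```python
def version_folder(version_id: int) -> str:
    """Return the 'top/sub' folder that would hold a kernel version id.

    The top-level folder groups ids by millions and the subfolder groups
    them by thousands. Zero-padding of folder names is NOT applied; that
    detail is an assumption to verify against the actual archive.
    """
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return f"{top}/{sub}"


print(version_folder(123_456_789))
```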

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Top 1000 Kaggle Datasets
kaggle.com
zip
Updated Jan 3, 2022
Cite
Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
Explore at:
zip (34269 bytes)
Dataset updated
Jan 3, 2022
Authors
Trrishan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
From wiki

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.

Source: Kaggle
Synthetic dataset for home interior
kaggle.com
zip
Updated Jun 9, 2022
Cite
Coohom CLoud (2022). Synthetic dataset for home interior [Dataset]. https://www.kaggle.com/datasets/luznoc/synthetic-dataset-for-home-interior
Explore at:
zip (674141269 bytes)
Dataset updated
Jun 9, 2022
Authors
Coohom CLoud
Description
This dataset showcases the diversity of labeled synthetic data you can generate with our tools to accelerate your computer vision projects. It includes 85 synthetic RGB images as well as annotated versions with instance and semantic segmentation.

We have massive indoor scene datasets, all for free. Visit our website for details, or get in touch with our team and we can build one tailored to your specific requirements: xinxuan@qunhemail.com
Kaggle Top Datasets🚀📊
kaggle.com
zip
Updated Apr 10, 2024
Cite
Aaron Frias (2024). Kaggle Top Datasets🚀📊 [Dataset]. https://www.kaggle.com/datasets/aaronfriasr/kaggle-top-datasets
Explore at:
zip (1572305 bytes)
Dataset updated
Apr 10, 2024
Authors
Aaron Frias
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning

Column description

- Dataset_name - Name of the dataset
- Author_name - Name of the author
- Author_id - Kaggle id of the author
- No_of_files - Number of files the author has uploaded
- size - Size of all the files
- Type_of_file - Type of the files, such as csv, json, etc.
- Upvotes - Total upvotes of the dataset
- Medals - Medal of the dataset
- Usability - Usability of the dataset
- Date - Date on which the dataset was uploaded
- Day - Day on which the dataset was uploaded
- Time - Time at which the dataset was uploaded
- Dataset_link - Kaggle link of the dataset

Acknowledgements

The data has been scraped from the official Kaggle website and is available under the Creative Commons license.

Enjoy & Keep Learning !!!
Clean Meta Kaggle
kaggle.com
Updated Sep 8, 2023
Cite
Yoni Kremer (2023). Clean Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/yonikremer/clean-meta-kaggle
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 8, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Yoni Kremer
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Cleaned Meta-Kaggle Dataset

The Original Dataset - Meta-Kaggle

Explore our public data on competitions, datasets, kernels (code / notebooks) and more.

Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

August 2023 update

In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. See that dataset's page for instructions on how to join it with Meta Kaggle.

We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

The Problems with the Original Dataset

The original dataset consists of 32 CSV files with 268 columns and 7 GB of compressed data. Having so many tables and columns makes it hard to understand the data.

The data is not normalized, so when you join tables you get a lot of errors.

Some values refer to non-existing values in other tables. For example, the UserId column in the ForumMessages table has values that do not exist in the Users table.

There are missing values.

There are duplicate values.

There are values that are not valid. For example, Ids that are not positive integers.

The date and time columns are not in the right format.

Some columns only have the same value for all rows, so they are not useful.

The boolean columns have string values True or False.

Incorrect values for the Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.

Users upvote their own messages.

The Solution

To handle so many tables and columns I use a relational database. I use MySQL, but you can use any relational database.

The steps to create the database are:

Creating the database tables with the right data types and constraints. I do that by running the db_abd_create_tables.sql script.

Downloading the CSV files from Kaggle using the Kaggle API.

Cleaning the data using pandas. I do that by running the clean_data.py script. The script does the following steps for each table:

Drops the columns that are not needed.

Converts each column to the right data type.

Replaces foreign keys that do not exist with NULL.

Replaces some of the missing values with default values.

Removes rows where there are missing values in the primary key/not null columns.

Removes duplicate rows.

Loading the data into the database using the LOAD DATA INFILE command.

Checking that the number of rows in the database tables is the same as the number of rows in the CSV files.

Adding foreign key constraints to the database tables. I do that by running the add_foreign_keys.sql script.

Updating the Total columns in the database tables. I do that by running the update_totals.sql script.

Backing up the database.
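Three of the per-table cleaning steps listed above (nulling foreign keys that point nowhere, dropping rows without a primary key, and removing duplicates) can be sketched as follows. This is not the author's clean_data.py: it uses plain Python dictionaries instead of pandas to stay dependency-free, and the function name, table contents, and column names are hypothetical.

```python
def clean_rows(rows, primary_key, foreign_keys):
    """Clean a table given as a list of dicts.

    foreign_keys maps a column name to the set of ids that exist in the
    referenced table. This is an illustrative sketch, not the original script.
    """
    cleaned = []
    seen = set()
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        # Replace foreign keys that do not exist in the parent table with None (NULL).
        for column, valid_ids in foreign_keys.items():
            if row.get(column) not in valid_ids:
                row[column] = None
        # Remove rows with a missing primary key.
        if row.get(primary_key) is None:
            continue
        # Remove exact duplicate rows (compared after the fixes above).
        fingerprint = tuple(sorted(row.items(), key=lambda item: item[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


user_ids = {1, 2}  # ids present in a hypothetical Users table
messages = [
    {"Id": 10, "UserId": 1},
    {"Id": 11, "UserId": 99},   # UserId 99 does not exist in Users
    {"Id": 10, "UserId": 1},    # duplicate of the first row
    {"Id": None, "UserId": 2},  # missing primary key
]
result = clean_rows(messages, primary_key="Id", foreign_keys={"UserId": user_ids})
print(result)
```

Setting dangling references to NULL before loading is what lets the foreign key constraints be added afterwards without errors.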
Coal Miners Detection
kaggle.com
zip
Updated Sep 18, 2023
Cite
Unique Data (2023). Coal Miners Detection [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/miners-detection
Explore at:
zip (5795006 bytes)
Dataset updated
Sep 18, 2023
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Miners Object Detection dataset

The dataset consists of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding boxes for the miners, and an attribute indicates whether each miner is sitting or standing in the photo.

💴 For Commercial Usage: To discuss your requirements, learn about the price and buy the dataset, leave a request on our website to buy the dataset

The dataset's diverse applications such as computer vision, safety assessment and others make it a valuable resource for researchers, employers, and policymakers in the mining industry.

Get the Dataset

This is just an example of the data

Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

Dataset structure

images - contains the original images of miners

boxes - includes bounding box labeling for the original images

annotations.xml - contains coordinates of the bounding boxes and labels, created for the original photo

Data Format

Each image from images folder is accompanied by an XML-annotation in the annotations.xml file indicating the coordinates of the bounding boxes for miners detection. For each point, the x and y coordinates are provided. The position of the miner is also provided by the attribute is_sitting (true, false).
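The format described above could be read along the following lines. The element and attribute names in the sample (image, box, xtl, ytl, xbr, ybr, and a nested attribute element) follow a common CVAT-style layout and are assumptions, since the actual file structure is only shown as an image; only annotations.xml and is_sitting come from the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tag and attribute names are assumed, not confirmed.
SAMPLE_XML = """<annotations>
  <image id="0" name="images/0.jpg" width="1920" height="1080">
    <box label="miner" xtl="100.0" ytl="200.0" xbr="300.0" ybr="600.0">
      <attribute name="is_sitting">false</attribute>
    </box>
  </image>
</annotations>"""


def read_boxes(xml_text):
    """Collect one record per bounding box with its corners and posture flag."""
    root = ET.fromstring(xml_text)
    boxes = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            attributes = {
                attr.get("name"): (attr.text or "").strip()
                for attr in box.iter("attribute")
            }
            boxes.append({
                "image": image.get("name"),
                # (left, top, right, bottom) in pixels
                "corners": tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr")),
                "is_sitting": attributes.get("is_sitting") == "true",
            })
    return boxes


boxes = read_boxes(SAMPLE_XML)
print(boxes)
```

If the real file uses different names, only the strings passed to iter and get need to change.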

Example of XML file structure

[Figure: screenshot of sample annotations.xml content]

Miners detection might be made in accordance with your requirements.

🧩 This is just an example of the data. Leave a request here to learn more

🚀 You can learn more about our high-quality unique datasets here

keywords: coal mines, underground, safety monitoring system, safety dataset, manufacturing dataset, industrial safety database, health and safety dataset, quality control dataset, quality assurance dataset, annotations dataset, computer vision dataset, image dataset, object detection, human images, classification
Images in CSV datasets
kaggle.com
zip
Updated Oct 14, 2024
Cite
Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
Explore at:
zip (347504240 bytes)
Dataset updated
Oct 14, 2024
Authors
Pascal
Description
Images stored as CSV files, for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.

"mnist_big.csv"

Recognition of handwritten digit images

A "mnist_small.csv" version with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

"sign_mnist_big.csv"

Recognition of sign language gesture images

A "sign_mnist_small.csv" version with less data, which can also serve as a test set

Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist

"zalando_small.csv"

Recognition of clothing and shoes (Zalando)

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"hmnist_8_8_RGB.csv"

Recognition of skin tumors (color images, three R, G, B values per pixel)

Other versions with smaller and/or grayscale images

Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

"cifar10_small.csv"

Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset

Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv
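To use such files, each CSV row has to be turned back into a pixel grid. The sketch below shows one way to do that. The position of the label column and the row-major pixel order are assumptions that should be checked against the header of each file, and the function name is hypothetical.

```python
def row_to_image(row, width, height, channels=1, label_index=0):
    """Split one CSV row into (label, image).

    image is a list of `height` rows, each holding `width * channels` values.
    label_index says where the class label sits in the row (0 for first,
    -1 for last); which one applies to a given file is an assumption.
    """
    values = [int(v) for v in row]
    label = values.pop(label_index)
    expected = width * height * channels
    if len(values) != expected:
        raise ValueError(f"expected {expected} pixel values, got {len(values)}")
    stride = width * channels
    image = [values[i:i + stride] for i in range(0, expected, stride)]
    return label, image


# A made-up 2x2 grayscale example with the label in the first column.
label, image = row_to_image(["7", "0", "255", "128", "64"], width=2, height=2)
print(label, image)
```

For a color file with three values per pixel, pass channels=3 so that each image row keeps its R, G, B triples together.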
Interior Design Images & Metadata
kaggle.com
zip
Updated Feb 26, 2025
Cite
GalinaKG (2025). Interior Design Images & Metadata [Dataset]. https://www.kaggle.com/datasets/galinakg/interior-design-images-and-metadata
Explore at:
zip (68150286 bytes)
Dataset updated
Feb 26, 2025
Authors
GalinaKG
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains a curated collection of interior design images categorized by room type and design style. The images are sourced from Pinterest and labeled with relevant metadata for machine learning applications, including image classification, style prediction, and aesthetic analysis.

Dataset Structure

The dataset is organized into directories based on room types:

bathroom/

bedroom/

kitchen/

living_room/

Each room type further contains subdirectories for different design styles, such as:

boho

industrial

minimalist

modern

scandinavian

Files Included

metadata.csv → Contains file paths and labels for room type and design style.

train_data.csv → Training split of the dataset.

val_data.csv → Validation split of the dataset.

test_data.csv → Test split for evaluation.

Metadata Format

Each row in metadata.csv contains:

image_path: Relative path to the image.

room_type: The category of the room (e.g., bathroom, bedroom).

style: The interior design style (e.g., boho, modern).
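A short sketch of reading this metadata format and counting images per (room_type, style) pair, which is a quick way to check class balance before training. The column names come from the description above; the sample rows and file paths are invented.

```python
import csv
import io
from collections import Counter

# Invented sample rows using the three documented columns.
SAMPLE = """image_path,room_type,style
bedroom/boho/1.jpg,bedroom,boho
bedroom/modern/2.jpg,bedroom,modern
kitchen/modern/3.jpg,kitchen,modern
"""


def count_labels(csv_file):
    """Count how many images carry each (room_type, style) label pair."""
    reader = csv.DictReader(csv_file)
    return Counter((row["room_type"], row["style"]) for row in reader)


counts = count_labels(io.StringIO(SAMPLE))
print(counts)
```

With the real dataset, pass an open handle to metadata.csv (or one of the split files, assuming they share the same columns) instead of the in-memory sample.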
Interior design styles
kaggle.com
zip
Updated Feb 11, 2025
Cite
lmaMater (2025). Interior design styles [Dataset]. https://www.kaggle.com/datasets/stepanyarullin/interior-design-styles
Explore at:
zip (732876516 bytes)
Dataset updated
Feb 11, 2025
Authors
lmaMater
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Interior Design Styles Dataset

A collection of 18.6k interior design images (~1,000 per style) scraped from Houzz.com for an ITMO University CV project. Ideal for CNN classification and image generation (GANs, VAEs). Already split into train and test sets at an 80/20 ratio.

Features:

- 19 design styles (e.g., traditional, eclectic, rustic).
- Fairly high-quality real-world interior images.
- Useful for style classification, feature extraction, and generative models.

Applications:

- Train a CNN to classify interior styles.
- Use for GANs/VAEs to generate design ideas.
- Apply style transfer techniques.

🚨 Note: Images scraped from Houzz.com — check copyright usage for public/commercial projects.
Kaggle: Forum Discussions
kaggle.com
zip
Updated Nov 8, 2025
Cite
Nicolás Ariel González Muñoz (2025). Kaggle: Forum Discussions [Dataset]. https://www.kaggle.com/datasets/nicolasgonzalezmunoz/kaggle-forum-discussions
Explore at:
zip (542099 bytes)
Dataset updated
Nov 8, 2025
Authors
Nicolás Ariel González Muñoz
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Note: This is a work in progress, and not all the Kaggle forums are included in this dataset. The remaining forums will be added once I finish resolving some issues with the data generators for those forums.

Summary

Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping with the selenium library, and the text is converted to markdown using the markdownify package.

This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.

Extraction Technique

Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), I extracted the data in this dataset through web scraping using the selenium library.

The functions and classes used to scrape the data on Kaggle are stored in a publicly available utility script. Because JS-generated pages like Kaggle's are unstable when scraped, the script implements capabilities for retrying connections and waiting for elements to appear.

Each forum was scraped using its own notebook, and those notebooks were then connected to a central notebook that generates this dataset. The discussions are also scraped in parallel to improve speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent to the oldest.

If you need more control over the data you want to research, feel free to import whatever you need from the utility script mentioned above.

Structure

This dataset contains several folders, each named as the discussion forum they contain data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder, you'll find two files: one is a csv file and the other a json file.

The json file (in Python, represented as a dictionary) is indexed by the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, which is represented as a nested dictionary (the discussion dict) containing the following fields:

- title: The title of the main topic.
- content: Content of the main topic.
- tags: List containing the discussion's tags.
- datetime: Date and time at which the discussion was published (in ISO 8601 format).
- votes: Number of votes received by the discussion.
- medal: Medal awarded to the main topic (if any).
- user: User that published the main topic.
- expertise: Publisher's expertise, measured by the Kaggle progression system.
- n_comments: Total number of comments in the current discussion.
- n_appreciation_comments: Total number of appreciation comments in the current discussion.
- comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
  - content: Comment's content.
  - is_appreciation: Whether the comment is an appreciation comment.
  - is_deleted: Whether the comment was deleted.
  - n_replies: Number of replies to the comment.
  - datetime: Date and time at which the comment was published (in ISO 8601 format).
  - votes: Number of votes received by the current comment.
  - medal: Medal awarded to the comment (if any).
  - user: User that published the comment.
  - expertise: Publisher's expertise, measured by the Kaggle progression system.
  - n_deleted: Total number of deleted replies (including self).
  - replies: A dict following this same format.
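Because replies nest inside comments using the same format, a recursive walk is a natural way to visit every comment. The sketch below uses an invented sample in the documented shape and reads optional fields with dict.get, since fields may be absent; the function name and sample contents are hypothetical.

```python
import json

# Invented sample following the documented nesting; ids and texts are made up.
SAMPLE_JSON = """
{
  "101": {
    "content": "Main topic text",
    "comments": {
      "1": {
        "content": "First comment",
        "is_deleted": false,
        "replies": {
          "2": {"content": "A reply", "is_deleted": true, "replies": {}}
        }
      },
      "3": {"content": "Second comment"}
    }
  }
}
"""


def iter_comments(comments, depth=0):
    """Yield (comment_id, depth, comment) for every comment and nested reply."""
    for comment_id, comment in (comments or {}).items():
        yield comment_id, depth, comment
        yield from iter_comments(comment.get("replies"), depth + 1)


discussions = json.loads(SAMPLE_JSON)
visible_counts = {
    discussion_id: sum(
        1
        for _, _, comment in iter_comments(discussion.get("comments"))
        if not comment.get("is_deleted")
    )
    for discussion_id, discussion in discussions.items()
}
print(visible_counts)
```

The depth value makes it straightforward to separate top-level comments (depth 0) from replies when aggregating votes or medals.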

The csv file, on the other hand, serves as a summary of the json file; its comment information is limited to the hottest and most voted comments.

Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
House Rooms & Streets Image Dataset
kaggle.com
Updated Oct 14, 2022
Cite
Mike Mazurov (2022). House Rooms & Streets Image Dataset [Dataset]. https://www.kaggle.com/datasets/mikhailma/house-rooms-streets-image-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 14, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Mike Mazurov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains 2 folders with images of different rooms of houses and street views. The house data consists of a few categories: bath, bed, din, kitchen and living. The street data consists of a few categories: apartment, church, garage, house, industrial, office building, retail and roofs.

I took pictures of rooms here and pictures of houses here, resized them to 224x224, removed Google watermarks and merged 2 datasets together.

In general, I used this data for my tasks, but decided that this data set might be useful to someone else, if so feel free to upvote me 🤗
Real Time Anomaly Detection in CCTV Surveillance
kaggle.com
zip
Updated Dec 25, 2022
Cite
webadvisor (2022). Real Time Anomaly Detection in CCTV Surveillance [Dataset]. https://www.kaggle.com/datasets/webadvisor/real-time-anomaly-detection-in-cctv-surveillance
Explore at:
zip (102007775226 bytes)
Dataset updated
Dec 25, 2022
Authors
webadvisor
Description
UCF Crime Dataset in the most suitable structure. Contains 1900 videos from 13 different categories. To ensure the quality of this dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it. Videos were found on YouTube and LiveLeak using text search queries (with slight variations, e.g. “car crash”, “road accident”) for each anomaly.
Obstacles in Public Spaces for Dist-YOLO
kaggle.com
zip
Updated Jan 27, 2024
Cite
Mufti Restu Mahesa (2024). Obstacles in Public Spaces for Dist-YOLO [Dataset]. https://www.kaggle.com/datasets/muftirestumahesa/obstacles-in-public-spaces-for-dist-yolo
Explore at:
zip (274982559 bytes)
Dataset updated
Jan 27, 2024
Authors
Mufti Restu Mahesa
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The "Obstacles in Public Spaces for Dist-YOLO" dataset is a collection of annotated images depicting various types of obstacles that can be encountered in public spaces. It was curated and annotated to support the development of the Dist-YOLO (You Only Look Once) model for object detection. Key points:

1. Types of Obstacles: The dataset covers obstacles that may be encountered in public spaces, such as Right Turn, Left Turn, Puddle, Street Vendor, Obstacle, Bad Road, Garbage Bin, Chair, Pothole, Car, Motorcycle, Pedestrian, Fence, Gate, Barrier, Roadblock, Door, Tree, Plant, Pot, Drain, Stair, Pole, and Zebra Cross.

2. Annotation Purpose: Each image in the dataset has been annotated to identify the location and type of obstacles present in the image.

3. Data Format: Data is presented in image formats (e.g., JPG or PNG) annotated with bounding boxes or markers that indicate the location of obstacles.

4. Dataset Size: The number of images used can vary with research or model development needs. The dataset contains 3,350 images in total.

5. Usage Requirement: This dataset is useful for training and testing object detection models, especially models like Dist-YOLO, to recognize and classify obstacles in public environments.

6. Application Fields: The dataset can be used in various application fields, including assistive technology for the visually impaired, navigation systems for the blind, autonomous vehicle development, and other applications involving object detection in public spaces.

It is important to provide proper attribution and references to the dataset when it is used in research or projects, and to adhere to applicable guidelines and copyrights related to the dataset.
Bedroom interior Dataset
kaggle.com
zip
Updated Oct 30, 2024
Cite
Prashant Singh (2024). Bedroom interior Dataset [Dataset]. https://www.kaggle.com/datasets/prashantsingh001/bedroom-interior-dataset
Explore at:
zip (1169092549 bytes)
Dataset updated
Oct 30, 2024
Authors
Prashant Singh
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Bedroom Interior Design Images Dataset

Overview

This dataset comprises 1,800 high-quality images of bedroom interior designs, showcasing a variety of styles, layouts, and color schemes. It serves as a valuable resource for researchers, designers, and developers working in the fields of interior design, computer vision, and machine learning.

Dataset Details

Number of Images: 1,800

Image Format: JPG

Dimensions: Varies (standard resolutions included)

Content: The images feature diverse bedroom designs, including contemporary, modern, traditional, minimalist, and eclectic styles. Each image highlights different aspects of bedroom decor, such as furniture arrangement, color palettes, lighting, and accessory placements.

Use Cases

This dataset can be utilized for:

Training machine learning models for image classification and object detection in interior design.

Developing recommendation systems for interior design based on user preferences.

Conducting research in design trends and aesthetic evaluations.

Assisting designers in generating design ideas and inspirations.

Contribution

The dataset is curated from various sources, ensuring a wide range of design styles and elements. It aims to support creative projects and research endeavors in the field of interior design.
(🌅 Sunset) Kaggle Users' Country + Regions Info
kaggle.com
zip
Updated Feb 14, 2024
Cite
BwandoWando (2024). (🌅 Sunset) Kaggle Users' Country + Regions Info [Dataset]. https://www.kaggle.com/datasets/bwandowando/kaggle-user-country-regions
Explore at:
Available download formats: zip (2376511 bytes)
Dataset updated
Feb 14, 2024
Authors
BwandoWando
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
[Context]

The official Meta-Kaggle dataset includes a Users.csv file with Username, DisplayName, RegisterDate, and PerformanceTier fields, but it has no location data for Kaggle users. This dataset augments that data with additional country and region information.

[Note]

I haven't included the username and displayname values on purpose, just the userid to be joined back to the Meta-Kaggle official Users.csv file.
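The join described in the note can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the column names `Id`, `UserId`, `Country`, and `Region` and the sample rows are assumptions, so check the real headers of both files before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for the two files. The column names "Id",
# "UserId", "Country" and "Region" are assumptions for illustration only.
USERS_CSV = """Id,UserName,DisplayName,RegisterDate,PerformanceTier
1,alice,Alice A,01/02/2015,2
2,bob,Bob B,03/04/2016,1
3,carol,Carol C,05/06/2017,0
"""

LOCATIONS_CSV = """UserId,Country,Region
1,Philippines,Asia
3,Brazil,South America
"""


def left_join_locations(users_file, locations_file):
    """Attach country/region to every user row; unmatched users get empty strings."""
    # Index the location rows by user id for constant-time lookup.
    locations = {row["UserId"]: row for row in csv.DictReader(locations_file)}
    joined = []
    for user in csv.DictReader(users_file):
        loc = locations.get(user["Id"], {})
        joined.append({
            **user,
            "Country": loc.get("Country", ""),
            "Region": loc.get("Region", ""),
        })
    return joined


rows = left_join_locations(io.StringIO(USERS_CSV), io.StringIO(LOCATIONS_CSV))
for row in rows:
    print(row["Id"], row["UserName"], row["Country"] or "(missing)")
```

A left join keeps every user from Users.csv, which matches the limitation noted in this description: users whose details were not captured simply end up with empty location fields.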

[Limitations]

Some users may not have entered their details by the time the scraper went through their accounts, so their data is missing. Others may have updated their info after the scraper went through their accounts, which results in inconsistencies.

[How I defined active in this dataset]

Users that have received an upvote in the forums, datasets, or notebooks

Users that have given an upvote in the forums, datasets, or notebooks

Users that have created a thread, a forum post, a notebook, or a dataset

Users that made a competition submission

Users that exist in the Meta-Kaggle Users dataset

Date cut-off of Jan 01, 2019

[Update]

15-Feb-2024: Since the update to the Kaggle member profile page, the scrapers aren't working anymore because the UI layout has changed. Will fix this when we get the time.
night-to-day
kaggle.com
zip
Updated Feb 3, 2024
Cite
Kareem Ali (2024). night-to-day [Dataset]. https://www.kaggle.com/datasets/kareem00ali/night-to-day
Explore at:
Available download formats: zip (4147116835 bytes)
Dataset updated
Feb 3, 2024
Authors
Kareem Ali
Description
About Dataset

This is a modified version of the bdd100k dataset from Kaggle (solesensei/solesensei_bdd100k).

About the original dataset

100K Images

The images in this package are the frames at the 10th second in the videos. The split of train, validation, and test sets is the same as for the whole video set. They are used for object detection, drivable area, and lane marking.

- bdd100k
  - images
    - 100k
      - train
      - val
      - test

https://bair.berkeley.edu/blog/2018/05/30/bdd/ Licence

Modification

I omitted the test folder and all images that are not sorted into the four train and test subfolders. I then kept only the images that are Day (testA, trainA) or Night (testB, trainB), after moving images that were in the wrong subfolder. That leaves nearly 73k images. The training_label CSV contains the name of each image and whether it is Day, Night, Dawn/Dusk, or Undefined, taken from the Kaggle dataset solesensei/solesensei_bdd100k.
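The selection step above can be sketched as follows. This is only an illustration under assumptions: the column names `image` and `timeofday` and the sample rows are hypothetical, and the real training_label CSV may use different headers.

```python
import csv
import io

# Hypothetical sample of the label file. The column names "image" and
# "timeofday" are assumptions; check the real CSV header before use.
LABELS_CSV = """image,timeofday
a.jpg,Day
b.jpg,Night
c.jpg,Dawn/Dusk
d.jpg,Undefined
"""

# Day images go to the "A" folders and Night images to the "B" folders,
# following the trainA/trainB and testA/testB naming in the description.
SPLIT_SUFFIX = {"Day": "A", "Night": "B"}


def assign_folders(labels_file, split="train"):
    """Map each Day/Night image to its target folder; skip every other label."""
    assignments = {}
    for row in csv.DictReader(labels_file):
        suffix = SPLIT_SUFFIX.get(row["timeofday"])
        if suffix is None:
            continue  # Dawn/Dusk and Undefined images are dropped
        assignments[row["image"]] = f"{split}{suffix}"
    return assignments


folders = assign_folders(io.StringIO(LABELS_CSV))
print(folders)
```

Returning a mapping instead of moving files directly keeps the sketch side-effect free; the actual move could then be done with `shutil.move` once the mapping has been checked.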

Reference

[1] Huazhe Xu, Yang Gao, Fisher Yu, and Trevor Darrell. "End-to-end learning of driving models from large-scale video datasets." CVPR 2017.

[2] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, Trevor Darrell. "BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling." arXiv:1805.04687.

[3] Ye Xia, Danqing Zhang, Jinkyu Kim, Ken Nakayama, Karl Zipser, David Whitney. "Predicting Driver Attention in Critical Situations." ACCV 2018.
Human Tracking & Object Detection Dataset
kaggle.com
zip
Updated Jul 27, 2023
Cite
Unique Data (2023). Human Tracking & Object Detection Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/people-tracking
Explore at:
Available download formats: zip (46156442 bytes)
Dataset updated
Jul 27, 2023
Authors
Unique Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
People Tracking & Object Detection dataset

The dataset comprises annotated video frames from a camera positioned in a public space. Each individual in the camera's view was tracked using the rectangle tool in the Computer Vision Annotation Tool (CVAT).

The dataset is created on the basis of Real-Time Traffic Video Dataset

Dataset Structure

The images directory houses the original video frames, serving as the primary source of raw data.

The annotations.xml file provides the detailed annotation data for the images.

The boxes directory contains frames that visually represent the bounding box annotations, showing the locations of the tracked individuals within each frame. These images can be used to understand how the tracking has been implemented and to visualize the marked areas for each individual.

Data Format

The annotations are represented as rectangle bounding boxes that are placed around each individual. Each bounding box annotation contains the position (xtl, ytl, xbr, ybr coordinates) of the respective box within the frame.
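As a rough sketch of how such boxes could be read, the snippet below parses a small XML sample with the standard library. The element layout (`image` elements containing `box` elements, as in CVAT's image export) and the sample values are assumptions; the actual annotations.xml in this dataset may be organized differently, for example by track.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of a CVAT image export. Only the
# xtl/ytl/xbr/ybr attribute names come from the dataset description.
SAMPLE_XML = """<annotations>
  <image id="0" name="frame_000.png" width="1920" height="1080">
    <box label="person" xtl="100.0" ytl="200.0" xbr="180.0" ybr="420.0"/>
    <box label="person" xtl="640.5" ytl="310.0" xbr="700.5" ybr="500.0"/>
  </image>
</annotations>"""


def read_boxes(xml_text):
    """Return one dict per box with its corner coordinates and derived size."""
    root = ET.fromstring(xml_text)
    boxes = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            # xtl/ytl is the top-left corner, xbr/ybr the bottom-right corner.
            xtl, ytl, xbr, ybr = (
                float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr")
            )
            boxes.append({
                "frame": image.get("name"),
                "label": box.get("label"),
                "xtl": xtl, "ytl": ytl, "xbr": xbr, "ybr": ybr,
                "width": xbr - xtl,
                "height": ybr - ytl,
            })
    return boxes


boxes = read_boxes(SAMPLE_XML)
for b in boxes:
    print(b["frame"], b["label"], b["width"], b["height"])
```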

👉 Legally sourced datasets and carefully structured for AI training and model development. Explore samples from our dataset of 95,000+ human images & videos - Full dataset

🚀 You can learn more about our high-quality unique datasets here

keywords: multiple people tracking, human detection dataset, object detection dataset, people tracking dataset, tracking human object interactions, human Identification tracking dataset, people detection annotations, detecting human in a crowd, human trafficking dataset, deep learning object tracking, multi-object tracking dataset, labeled web tracking dataset, large-scale object tracking dataset
Doors Image Dataset | Indoor Object Detection
kaggle.com
zip
Updated Feb 18, 2023
Cite
DataCluster Labs (2023). Doors Image Dataset | Indoor Object Detection [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/doors-doors
Explore at:
Available download formats: zip (556294883 bytes)
Dataset updated
Feb 18, 2023
Authors
DataCluster Labs
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is collected by DataCluster Labs. To download full dataset or to submit a request for your new data collection needs, please drop a mail to: sales@datacluster.ai

This dataset is a challenging set of more than 3,000 images covering a wide variety of indoor doors. The images were captured and crowdsourced from more than 2,000 different locations, and each image is manually reviewed and verified by computer vision professionals at Datacluster Labs. The dataset can be used for scene classification and domestic object detection.

Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.

Dataset Features

Dataset size : 3000+ images

Captured by : 2000+ crowdsource contributors

Resolution : HD and above (1920x1080 and above)

Location : Captured at 2000+ locations

Diversity : Various lighting conditions like day, night, varied distances, view points etc.

Device used : Captured using mobile phones in 2020-2022

Usage : Image classification, domestic object detection, objects relationship understanding etc.

Available Annotation formats

COCO, YOLO, PASCAL-VOC, Tf-Record
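The three box-based formats listed above store the same rectangle differently: PASCAL VOC uses absolute corner coordinates (xmin, ymin, xmax, ymax), COCO uses the absolute top-left corner plus width and height, and YOLO uses the box center plus width and height, normalized by the image size. A minimal conversion sketch follows; the function names and sample values are illustrative and do not come from the dataset.

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """PASCAL VOC corners -> COCO (x, y, width, height) in absolute pixels."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """PASCAL VOC corners -> YOLO (cx, cy, w, h), normalized to the 0..1 range."""
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return (cx, cy, w, h)


def yolo_to_voc(cx, cy, w, h, img_w, img_h):
    """YOLO normalized box -> PASCAL VOC corners in absolute pixels."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return (xmin, ymin, xmax, ymax)


# A box centered in a 1920x1080 image and covering half of each dimension.
coco_box = voc_to_coco(480, 270, 1440, 810)
yolo_box = voc_to_yolo(480, 270, 1440, 810, 1920, 1080)
print(coco_box)  # (480, 270, 960, 540)
print(yolo_box)  # (0.5, 0.5, 0.5, 0.5)
```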

The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai Visit www.datacluster.ai to know more.
