100+ datasets found
  1. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    • Dataset Name: Name of the dataset.
    • Created By: Creator(s) of the dataset.
    • Last Updated in number of days: Time elapsed since last update.
    • Usability Score: Score indicating the ease of use.
    • Number of File: Quantity of files included.
    • Type of file: Format of files (e.g., CSV, JSON).
    • Size: Size of the dataset.
    • Total Votes: Number of votes received.
    • Category: Categorization of the dataset's subject matter.
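
    As a rough illustration of the research-analysis use case, the sketch below averages the usability score per category. The header names are taken from the column definitions above, but the exact headers in the real CSV are an assumption, and the sample rows are invented placeholders rather than real records.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows, invented purely for illustration. The header names follow
# the column definitions above; the real CSV headers may differ (assumption).
SAMPLE_CSV = """Dataset Name,Usability Score,Total Votes,Category
example-a,10.0,120,Health
example-b,8.0,40,Health
example-c,9.0,75,Finance
"""


def mean_usability_by_category(csv_text):
    """Return {category: mean usability score} for the given CSV text."""
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["Category"]].append(float(row["Usability Score"]))
    return {category: sum(vals) / len(vals) for category, vals in scores.items()}


result = mean_usability_by_category(SAMPLE_CSV)
print(result)
```

    To run this against the downloaded file, the text of that file would be passed in place of SAMPLE_CSV after checking its actual header row.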

  2. interior_design

    • kaggle.com
    zip
    Updated Aug 5, 2020
    Cite
    Aishah Sofea (2020). interior_design [Dataset]. https://www.kaggle.com/datasets/aishahsofea/interior-design
    Explore at:
    zip (141879955 bytes)
    Dataset updated
    Aug 5, 2020
    Authors
    Aishah Sofea
    Description

    Dataset

    This dataset was created by Aishah Sofea

    Released under Data files © Original Authors

    Contents

  3. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Nov 27, 2025
    Share
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    zip (167219625372 bytes)
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files, e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files, e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
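
    The folder arithmetic described above can be sketched as follows. This is a minimal illustration that only derives the folder numbers from a KernelVersions id; the function names are hypothetical, and how the folder names are actually written on disk (for example, whether they are zero-padded) is not stated here and should be checked against the download.

```python
def version_folders(version_id):
    """Split a KernelVersions id into (top_level, sub_level) folder numbers."""
    if version_id < 0:
        raise ValueError("version ids are expected to be non-negative")
    top_level = version_id // 1_000_000        # 123456789 -> 123
    sub_level = (version_id // 1_000) % 1_000  # 123456789 -> 456
    return top_level, sub_level


def version_id_range(top_level, sub_level):
    """Return the inclusive range of ids covered by top_level/sub_level."""
    start = top_level * 1_000_000 + sub_level * 1_000
    return start, start + 999


print(version_folders(123456789))
print(version_id_range(123, 456))
```

    Because the file names match the ids in the KernelVersions csv file, the same arithmetic gives the folder in which to look for any row of that table.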

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  4. Top 1000 Kaggle Datasets

    • kaggle.com
    zip
    Updated Jan 3, 2022
    Cite
    Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
    Explore at:
    zip (34269 bytes)
    Dataset updated
    Jan 3, 2022
    Authors
    Trrishan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

    Source: Kaggle

  5. Synthetic dataset for home interior

    • kaggle.com
    zip
    Updated Jun 9, 2022
    Cite
    Coohom CLoud (2022). Synthetic dataset for home interior [Dataset]. https://www.kaggle.com/datasets/luznoc/synthetic-dataset-for-home-interior
    Explore at:
    zip (674141269 bytes)
    Dataset updated
    Jun 9, 2022
    Authors
    Coohom CLoud
    Description

    This dataset showcases the diversity of labeled synthetic data you can generate with our tools to accelerate your computer vision projects. It includes 85 synthetic RGB images as well as annotated versions with instance and semantic segmentation.

    We have massive indoor scene datasets, all for free. Visit our website for details, or get in touch with our team and we can build one tailored to your specific requirements: xinxuan@qunhemail.com

  6. Kaggle Top Datasets🚀📊

    • kaggle.com
    zip
    Updated Apr 10, 2024
    Cite
    Aaron Frias (2024). Kaggle Top Datasets🚀📊 [Dataset]. https://www.kaggle.com/datasets/aaronfriasr/kaggle-top-datasets
    Explore at:
    zip (1572305 bytes)
    Dataset updated
    Apr 10, 2024
    Authors
    Aaron Frias
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning

    Column description

    • Dataset_name - Name of the dataset
    • Author_name - Name of the author
    • Author_id - Kaggle id of the author
    • No_of_files - Number of files the author has uploaded
    • size - Size of all the files
    • Type_of_file - Type of the files, such as csv, json, etc.
    • Upvotes - Total upvotes of the dataset
    • Medals - Medal of the dataset
    • Usability - Usability of the dataset
    • Date - Date on which the dataset was uploaded
    • Day - Day on which the dataset was uploaded
    • Time - Time at which the dataset was uploaded
    • Dataset_link - Kaggle link of the dataset

    Acknowledgements

    The data has been scraped from the official Kaggle website and is available under the Creative Commons License.

    Enjoy & Keep Learning !!!

  7. Clean Meta Kaggle

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    Yoni Kremer (2023). Clean Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/yonikremer/clean-meta-kaggle
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yoni Kremer
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Cleaned Meta-Kaggle Dataset

    The Original Dataset - Meta-Kaggle

    Explore our public data on competitions, datasets, kernels (code / notebooks) and more

    Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.


    This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

    Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

    August 2023 update

    In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here

    We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

    The Problems with the Original Dataset

    • The original dataset is 32 CSV files, with 268 columns and 7GB of compressed data. Having so many tables and columns makes it hard to understand the data.
    • The data is not normalized, so when you join tables you get a lot of errors.
    • Some values refer to non-existing values in other tables. For example, the UserId column in the ForumMessages table has values that do not exist in the Users table.
    • There are missing values.
    • There are duplicate values.
    • There are values that are not valid. For example, Ids that are not positive integers.
    • The date and time columns are not in the right format.
    • Some columns only have the same value for all rows, so they are not useful.
    • The boolean columns have string values True or False.
    • Incorrect values for the Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.
    • Users upvote their own messages.

    The Solution

    • To handle so many tables and columns I use a relational database. I use MySQL, but you can use any relational database.
    • The steps to create the database are:
    • Creating the database tables with the right data types and constraints. I do that by running the db_abd_create_tables.sql script.
    • Downloading the CSV files from Kaggle using the Kaggle API.
    • Cleaning the data using pandas. I do that by running the clean_data.py script. The script does the following steps for each table:
      • Drops the columns that are not needed.
      • Converts each column to the right data type.
      • Replaces foreign keys that do not exist with NULL.
      • Replaces some of the missing values with default values.
      • Removes rows where there are missing values in the primary key/not null columns.
      • Removes duplicate rows.
    • Loading the data into the database using the LOAD DATA INFILE command.
    • Checking that the number of rows in the database tables is the same as the number of rows in the CSV files.
    • Adding foreign key constraints to the database tables. I do that by running the add_foreign_keys.sql script.
    • Updating the Total columns in the database tables. I do that by running the update_totals.sql script.
    • Backing up the database.
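
    Three of the cleaning steps listed above (nulling foreign keys that point nowhere, dropping rows without a primary key, and removing duplicates) can be sketched as follows. The author's clean_data.py uses pandas and is not reproduced here; this is a plain-Python illustration with a hypothetical function name and invented sample rows, intended only to show the order and effect of the steps.

```python
def clean_rows(rows, primary_key, foreign_key, valid_foreign_ids):
    """Apply three of the cleaning steps to a list of dict rows.

    1. Replace foreign keys that do not exist in the parent table with None.
    2. Remove rows whose primary key is missing.
    3. Remove duplicate rows (the first occurrence is kept).
    """
    cleaned, seen = [], set()
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        if row.get(foreign_key) not in valid_foreign_ids:
            row[foreign_key] = None
        if row.get(primary_key) is None:
            continue
        # Column names are unique, so sorting never compares the values.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Hypothetical rows shaped like ForumMessages (Id, UserId); values are invented.
messages = [
    {"Id": 1, "UserId": 10},
    {"Id": 2, "UserId": 99},     # 99 is not a known user
    {"Id": 2, "UserId": 99},     # exact duplicate of the previous row
    {"Id": None, "UserId": 10},  # missing primary key
]
known_user_ids = {10, 11}

cleaned = clean_rows(messages, "Id", "UserId", known_user_ids)
print(cleaned)
```

    Nulling the dangling foreign keys before loading is what later allows the foreign key constraints to be added without violations.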
  8. Coal Miners Detection

    • kaggle.com
    zip
    Updated Sep 18, 2023
    Cite
    Unique Data (2023). Coal Miners Detection [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/miners-detection
    Explore at:
    zip (5795006 bytes)
    Dataset updated
    Sep 18, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Miners Object Detection dataset

    The dataset consists of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding boxes around the miners, and an attribute indicates whether each miner is sitting or standing in the photo.

    💴 For Commercial Usage: To discuss your requirements, learn about the price, and buy the dataset, leave a request on our website.

    The dataset's diverse applications such as computer vision, safety assessment and others make it a valuable resource for researchers, employers, and policymakers in the mining industry.


    Get the Dataset

    This is just an example of the data

    Leave a request on https://trainingdata.pro/datasets to discuss your requirements, learn about the price and buy the dataset

    Dataset structure

    • images - contains the original images of miners
    • boxes - includes bounding box labeling for the original images
    • annotations.xml - contains coordinates of the bounding boxes and labels, created for the original photo

    Data Format

    Each image from images folder is accompanied by an XML-annotation in the annotations.xml file indicating the coordinates of the bounding boxes for miners detection. For each point, the x and y coordinates are provided. The position of the miner is also provided by the attribute is_sitting (true, false).


    Miners detection might be made in accordance with your requirements.

    🧩 This is just an example of the data. Leave a request here to learn more

    🚀 You can learn more about our high-quality unique datasets here

    keywords: coal mines, underground, safety monitoring system, safety dataset, manufacturing dataset, industrial safety database, health and safety dataset, quality control dataset, quality assurance dataset, annotations dataset, computer vision dataset, image dataset, object detection, human images, classification

  9. Images in CSV datasets

    • kaggle.com
    zip
    Updated Oct 14, 2024
    Cite
    Pascal (2024). Images in CSV datasets [Dataset]. https://www.kaggle.com/datasets/pyim59/images-in-csv-datasets
    Explore at:
    zip (347504240 bytes)
    Dataset updated
    Oct 14, 2024
    Authors
    Pascal
    Description

    Images stored as CSV files for applying "classical" machine learning methods. These datasets are used in Pascal Yim's Machine Learning course at Centrale Lille.

    "mnist_big.csv"

    Recognition of handwritten digit images

    A smaller version, "mnist_small.csv", has less data and can also serve as a test set

    Source: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv

    "sign_mnist_big.csv"

    Recognition of sign language gesture images

    A smaller version, "sign_mnist_small.csv", has less data and can also serve as a test set

    Source: https://www.kaggle.com/datasets/datamunge/sign-language-mnist

    "zalando_small.csv"

    Recognition of clothing and shoes (Zalando)

    Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "hmnist_8_8_RGB.csv"

    Recognition of skin tumors (color images, three R, G, B values per pixel)

    Other versions with smaller and/or grayscale images

    Source: https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000

    "cifar10_small.csv"

    Recognition of small color images in 10 categories. CSV version of the CIFAR10 dataset

    Source: https://www.kaggle.com/datasets/fedesoriano/cifar10-python-in-csv?select=train.csv

  10. Interior Design Images & Metadata

    • kaggle.com
    zip
    Updated Feb 26, 2025
    Cite
    GalinaKG (2025). Interior Design Images & Metadata [Dataset]. https://www.kaggle.com/datasets/galinakg/interior-design-images-and-metadata
    Explore at:
    zip (68150286 bytes)
    Dataset updated
    Feb 26, 2025
    Authors
    GalinaKG
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains a curated collection of interior design images categorized by room type and design style. The images are sourced from Pinterest and labeled with relevant metadata for machine learning applications, including image classification, style prediction, and aesthetic analysis.

    Dataset Structure

    The dataset is organized into directories based on room types:

    • bathroom/
    • bedroom/
    • kitchen/
    • living_room/

    Each room type further contains subdirectories for different design styles, such as:

    • boho
    • industrial
    • minimalist
    • modern
    • scandinavian

    Files Included

    • metadata.csv → Contains file paths and labels for room type and design style.
    • train_data.csv → Training split of the dataset.
    • val_data.csv → Validation split of the dataset.
    • test_data.csv → Test split for evaluation.

    Metadata Format

    Each row in metadata.csv contains:

    • image_path: Relative path to the image.
    • room_type: The category of the room (e.g., bathroom, bedroom).
    • style: The interior design style (e.g., boho, modern).
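
    A minimal sketch of reading the metadata format described above and counting images per (room_type, style) pair. Only the three column names come from the description; the sample rows and paths are invented placeholders, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample rows; only the column names come from the description above.
SAMPLE_METADATA = """image_path,room_type,style
bedroom/boho/img_001.jpg,bedroom,boho
bedroom/modern/img_002.jpg,bedroom,modern
kitchen/modern/img_003.jpg,kitchen,modern
"""


def count_labels(csv_text):
    """Count images per (room_type, style) pair."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row["room_type"], row["style"]) for row in reader)


label_counts = count_labels(SAMPLE_METADATA)
for (room_type, style), count in sorted(label_counts.items()):
    print(room_type, style, count)
```

    A count like this is a quick way to check class balance before training on the provided splits.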
  11. Interior design styles

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Cite
    lmaMater (2025). Interior design styles [Dataset]. https://www.kaggle.com/datasets/stepanyarullin/interior-design-styles
    Explore at:
    zip (732876516 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    lmaMater
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Interior Design Styles Dataset

    A collection of 18.6k interior design images (~1,000 per style) scraped from Houzz.com for an ITMO University CV project. Ideal for CNN classification and image generation (GANs, VAEs). Already split into train/test sets with an 80/20 ratio.

    Features:

    • 19 design styles (e.g., traditional, eclectic, rustic).
    • Fairly high-quality real-world interior images.
    • Useful for style classification, feature extraction, and generative models.

    Applications:

    • Train a CNN to classify interior styles.
    • Use for GANs/VAEs to generate design ideas.
    • Apply style transfer techniques.

    🚨 Note: Images scraped from Houzz.com — check copyright usage for public/commercial projects.

  12. Kaggle: Forum Discussions

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Nicolás Ariel González Muñoz (2025). Kaggle: Forum Discussions [Dataset]. https://www.kaggle.com/datasets/nicolasgonzalezmunoz/kaggle-forum-discussions
    Explore at:
    zip (542099 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    Nicolás Ariel González Muñoz
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Note: This is a work in progress, and not all the Kaggle forums are included in this dataset. The remaining forums will be added once I finish solving some issues with the data generators related to these forums.

    Summary

    Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping techniques, using the selenium library, and by converting text data into Markdown style using the markdownify package.

    This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.

    Extraction Technique

    Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), I extracted the data in this dataset through web scraping techniques using the selenium library.

    The functions and classes used to scrape the data on Kaggle were stored in a utility script publicly available here. As JS-generated pages like Kaggle are unstable when you try to scrape them, the script implements capabilities for retrying connections and waiting for elements to appear.

    Each forum was scraped using its own notebook, and those notebooks were then connected to a central notebook that generates this dataset. The discussions are also scraped in parallel to improve speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent to the oldest.

    If you need more control on the data you want to research, feel free to import all you need from the utility script mentioned before.

    Structure

    This dataset contains several folders, each named as the discussion forum they contain data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder, you'll find two files: one is a csv file and the other a json file.

    The json file (in Python, represented as a dictionary) is indexed with the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, represented as a nested dictionary (the discussion dict) with the following fields:

    • title: The title of the main topic.
    • content: Content of the main topic.
    • tags: List containing the discussion's tags.
    • datetime: Date and time at which the discussion was published (in ISO 8601 format).
    • votes: Number of votes received by the discussion.
    • medal: Medal awarded to the main topic (if any).
    • user: User that published the main topic.
    • expertise: Publisher's expertise, measured by the Kaggle progression system.
    • n_comments: Total number of comments in the current discussion.
    • n_appreciation_comments: Total number of appreciation comments in the current discussion.
    • comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
      • content: Comment's content.
      • is_appreciation: Whether the comment is an appreciation comment.
      • is_deleted: Whether the comment was deleted.
      • n_replies: Number of replies to the comment.
      • datetime: Date and time at which the comment was published (in ISO 8601 format).
      • votes: Number of votes received by the current comment.
      • medal: Medal awarded to the comment (if any).
      • user: User that published the comment.
      • expertise: Publisher's expertise, measured by the Kaggle progression system.
      • n_deleted: Total number of deleted replies (including self).
      • replies: A dict following this same format.

    On the other hand, the csv file serves as a summary of the json file, with comment information limited to the hottest and most voted comments.

    Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
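
    Because replies nest in the same format as comments, walking a discussion is naturally recursive. The sketch below counts every comment and reply in a discussion dict and tolerates missing fields, since only 'content' is guaranteed. The function name, the ids, and the sample text are invented placeholders; the field names are the ones listed above.

```python
def count_comments(comments):
    """Count the entries in a 'comments' dict, including nested replies."""
    total = 0
    for comment in (comments or {}).values():
        total += 1
        # 'replies' follows the same format as 'comments' and may be absent.
        total += count_comments(comment.get("replies"))
    return total


# Invented discussions with the documented shape; ids and text are placeholders.
discussions = {
    "1001": {
        "title": "Example topic",
        "content": "Main post",
        "comments": {
            "c1": {"content": "First", "replies": {"c2": {"content": "Reply"}}},
            "c3": {"content": "Second"},
        },
    },
    "1002": {"content": "Only the mandatory field"},
}

totals = {
    disc_id: count_comments(disc.get("comments"))
    for disc_id, disc in discussions.items()
}
print(totals)
```

    With a real file, the dictionary would come from json.load on the json file inside one of the forum folders.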

  13. House Rooms & Streets Image Dataset

    • kaggle.com
    Updated Oct 14, 2022
    Cite
    Mike Mazurov (2022). House Rooms & Streets Image Dataset [Dataset]. https://www.kaggle.com/datasets/mikhailma/house-rooms-streets-image-dataset
    Dataset updated
    Oct 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mike Mazurov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 2 folders with images of different rooms of houses and street views. The house data consists of a few categories: bath, bed, din, kitchen and living. The street data consists of a few categories: apartment, church, garage, house, industrial, office building, retail and roofs.

    I took pictures of rooms here and pictures of houses here, resized them to 224x224, removed Google watermarks and merged 2 datasets together.

    In general, I used this data for my tasks, but decided that this data set might be useful to someone else, if so feel free to upvote me 🤗

  14. Real Time Anomaly Detection in CCTV Surveillance

    • kaggle.com
    zip
    Updated Dec 25, 2022
    Cite
    webadvisor (2022). Real Time Anomaly Detection in CCTV Surveillance [Dataset]. https://www.kaggle.com/datasets/webadvisor/real-time-anomaly-detection-in-cctv-surveillance
    Explore at:
    zip (102007775226 bytes)
    Dataset updated
    Dec 25, 2022
    Authors
    webadvisor
    Description

    UCF Crime Dataset in the most suitable structure. Contains 1900 videos from 13 different categories. To ensure the quality of this dataset, ten annotators (with different levels of computer vision expertise) were trained to collect it. Videos were found on YouTube and LiveLeak using text search queries (with slight variations, e.g. “car crash”, “road accident”) for each anomaly.

  15. Obstacles in Public Spaces for Dist-YOLO

    • kaggle.com
    zip
    Updated Jan 27, 2024
    Cite
    Mufti Restu Mahesa (2024). Obstacles in Public Spaces for Dist-YOLO [Dataset]. https://www.kaggle.com/datasets/muftirestumahesa/obstacles-in-public-spaces-for-dist-yolo
    Explore at:
    zip (274982559 bytes)
    Dataset updated
    Jan 27, 2024
    Authors
    Mufti Restu Mahesa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Obstacles in Public Spaces for Dist-YOLO" dataset is a collection of annotated images depicting various types of obstacles that can be encountered in public spaces. This dataset has been curated and annotated with the aim of supporting the development of the Dist-YOLO (You Only Look Once) model for object detection. The description of this dataset includes several key points:

    1. Types of Obstacles: The dataset encompasses various types of obstacles that may be encountered in public spaces, such as Right Turn, Left Turn, Puddle, Street Vendor, Obstacle, Bad Road, Garbage Bin, Chair, Pothole, Car, Motorcycle, Pedestrian, Fence, Gate, Barrier, Roadblock, Door, Tree, Plant, Pot, Drain, Stair, Pole, and Zebra Cross.
    2. Annotation Purpose: Each image in the dataset has been meticulously annotated to identify the location and type of obstacles present in the image.
    3. Data Format: Data in the dataset is typically presented in image formats (e.g., JPG or PNG) that have been annotated with bounding boxes or markers to indicate the location of obstacles.
    4. Dataset Size: The dataset can contain varying numbers of images, depending on research or model development needs. The dataset contains 3350 images in total.
    5. Usage Requirement: This dataset is useful for training and testing object detection models, especially models like Dist-YOLO, in recognizing and classifying obstacles in public environments.
    6. Application Fields: The dataset can be utilized in various application fields, including assistive technology for the visually impaired, development of navigation systems for the blind, autonomous vehicle development, and other applications involving object detection in public spaces.

    It is important to provide proper attribution and references to the dataset when used in research or projects, and to adhere to applicable guidelines and copyrights related to the dataset.

  16. Bedroom interior Dataset

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Prashant Singh (2024). Bedroom interior Dataset [Dataset]. https://www.kaggle.com/datasets/prashantsingh001/bedroom-interior-dataset
    Explore at:
    Available download formats: zip (1169092549 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    Prashant Singh
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Bedroom Interior Design Images Dataset

    Overview

    This dataset comprises 1,800 high-quality images of bedroom interior designs, showcasing a variety of styles, layouts, and color schemes. It serves as a valuable resource for researchers, designers, and developers working in the fields of interior design, computer vision, and machine learning.

    Dataset Details

    • Number of Images: 1,800
    • Image Format: JPG
    • Dimensions: Varies (standard resolutions included)
    • Content: The images feature diverse bedroom designs, including contemporary, modern, traditional, minimalist, and eclectic styles. Each image highlights different aspects of bedroom decor, such as furniture arrangement, color palettes, lighting, and accessory placements.

    Use Cases

    This dataset can be utilized for:

    • Training machine learning models for image classification and object detection in interior design.
    • Developing recommendation systems for interior design based on user preferences.
    • Conducting research in design trends and aesthetic evaluations.
    • Assisting designers in generating design ideas and inspirations.

    Contribution

    The dataset is curated from various sources, ensuring a wide range of design styles and elements. It aims to support creative projects and research endeavors in the field of interior design.

  17. (🌅 Sunset) Kaggle Users' Country + Regions Info

    • kaggle.com
    zip
    Updated Feb 14, 2024
    Cite
    BwandoWando (2024). (🌅 Sunset) Kaggle Users' Country + Regions Info [Dataset]. https://www.kaggle.com/datasets/bwandowando/kaggle-user-country-regions
    Explore at:
    Available download formats: zip (2376511 bytes)
    Dataset updated
    Feb 14, 2024
    Authors
    BwandoWando
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    [Context]

    The official Meta-Kaggle dataset contains the Users.csv file which contains Username, DisplayName, RegisterDate, and PerformanceTier fields but doesn't contain location data of the Kaggle Users. This dataset augments that data with additional country and region information.

    [Note]

    I haven't included the username and displayname values on purpose, just the userid to be joined back to the Meta-Kaggle official Users.csv file.
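The join described in this note can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the column names (`Id`, `UserId`, `Country`, `Region`) and the sample rows are assumptions, so check the headers of the actual files before reusing it.

```python
import csv
import io

# Hypothetical stand-ins for Users.csv and this dataset's file.
# Column names and rows are assumptions made for illustration only.
USERS_CSV = """Id,UserName,DisplayName
1,alice,Alice A.
2,bob,Bob B.
3,carol,Carol C.
"""
LOCATIONS_CSV = """UserId,Country,Region
1,Philippines,Asia
3,Brazil,South America
"""


def left_join_users(users_file, locations_file):
    """Attach Country/Region to every user row; unmatched users get None."""
    # Index the location rows by user id for constant-time lookups.
    locations = {row["UserId"]: row for row in csv.DictReader(locations_file)}
    joined = []
    for user in csv.DictReader(users_file):
        loc = locations.get(user["Id"])
        joined.append({
            **user,
            "Country": loc["Country"] if loc else None,
            "Region": loc["Region"] if loc else None,
        })
    return joined


rows = left_join_users(io.StringIO(USERS_CSV), io.StringIO(LOCATIONS_CSV))
for row in rows:
    print(row)
```

A left join keeps users with no scraped location, which matches the limitation that some accounts have missing data.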

    [Limitations]

    It is possible that some users haven't inputted their details when the scraper went through their accounts and thus have missing data. Another possibility is that users may have updated their info after the scraper went through their accounts, thus resulting in inconsistencies.

    [How I defined active in this dataset]

    • Users that have received an upvote in the forums, datasets, or notebooks
    • Users that have given an upvote in the forums, datasets, or notebooks
    • Users that have created a thread, a forum post, a notebook, or a dataset
    • Users that made a competition submission
    • Users that exist in the Meta-Kaggle Users dataset
    • Date cut-off of Jan 01, 2019

    [Update]

    • 15-Feb-2024: Since the update to Kaggle member profile pages, the scrapers aren't working anymore because the UI layout has changed. Will fix this when we get the time.

  18. night-to-day

    • kaggle.com
    zip
    Updated Feb 3, 2024
    Cite
    Kareem Ali (2024). night-to-day [Dataset]. https://www.kaggle.com/datasets/kareem00ali/night-to-day
    Explore at:
    Available download formats: zip (4147116835 bytes)
    Dataset updated
    Feb 3, 2024
    Authors
    Kareem Ali
    Description

    About Dataset

    It's a modified version of the bdd100k dataset from Kaggle (solesensei/solesensei_bdd100k).

    About the original dataset

    100K Images: The images in this package are the frames at the 10th second of each video. The train, validation, and test splits are the same as for the whole video set. They are used for object detection, drivable area, and lane marking.

    - bdd100k
      - images
        - 100k
          - train
          - val
          - test

    https://bair.berkeley.edu/blog/2018/05/30/bdd/ Licence

    Modification

    I omitted the test folder and all images that are not assigned to one of the four train and test subfolders. After moving images that were in the wrong subfolder, I kept only the Day images (testA, trainA) and the Night images (testB, trainB), which leaves nearly 73k images. The training_label CSV lists the name of each image and its label, one of [Day, Night, Dawn/Dusk, Undefined], taken from the Kaggle dataset solesensei/solesensei_bdd100k.
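As a rough sketch of the filtering step, the snippet below groups image names by label and keeps only Day and Night. The column names `image` and `timeofday` and the sample rows are assumptions; the real training_label CSV may use different headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the spirit of the training_label CSV.
# Header names and rows are assumptions for illustration.
SAMPLE_LABELS = """image,timeofday
a.jpg,Day
b.jpg,Night
c.jpg,Dawn/Dusk
d.jpg,Undefined
e.jpg,Day
"""


def split_day_night(label_file, keep=("Day", "Night")):
    """Group image names by label, dropping labels that are not in `keep`."""
    groups = defaultdict(list)
    for row in csv.DictReader(label_file):
        if row["timeofday"] in keep:
            groups[row["timeofday"]].append(row["image"])
    return dict(groups)


groups = split_day_night(io.StringIO(SAMPLE_LABELS))
print(groups)
```

Images labeled Dawn/Dusk or Undefined are simply skipped, mirroring the choice to keep only the Day and Night folders.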

    Reference

    [1] Huazhe Xu, Yang Gao, Fisher Yu, and Trevor Darrell. "End-to-end learning of driving models from large-scale video datasets." CVPR 2017.
    [2] Fisher Yu, Wenqi Xian, Yingying Chen, Fangchen Liu, Mike Liao, Vashisht Madhavan, Trevor Darrell. "BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling." arXiv:1805.04687.
    [3] Ye Xia, Danqing Zhang, Jinkyu Kim, Ken Nakayama, Karl Zipser, David Whitney. "Predicting Driver Attention in Critical Situations." ACCV 2018.

  19. Human Tracking & Object Detection Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2023
    Cite
    Unique Data (2023). Human Tracking & Object Detection Dataset [Dataset]. https://www.kaggle.com/datasets/trainingdatapro/people-tracking
    Explore at:
    Available download formats: zip (46156442 bytes)
    Dataset updated
    Jul 27, 2023
    Authors
    Unique Data
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    People Tracking & Object Detection dataset

    The dataset comprises annotated video frames from a camera positioned in a public space. Each individual in the camera's view was tracked using the rectangle tool in the Computer Vision Annotation Tool (CVAT).

    The dataset is based on the Real-Time Traffic Video Dataset.

    Dataset Structure

    • The images directory houses the original video frames, serving as the primary source of raw data.
    • The annotations.xml file provides the detailed annotation data for the images.
    • The boxes directory contains frames that visually represent the bounding box annotations, showing the locations of the tracked individuals within each frame. These images can be used to understand how the tracking has been implemented and to visualize the marked areas for each individual.

    Data Format

    The annotations are rectangular bounding boxes placed around each individual. Each bounding box annotation contains the box's position within the frame as xtl, ytl, xbr, and ybr coordinates (top-left and bottom-right corners).
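Reading the box coordinates can be sketched with Python's built-in XML parser. The sample below follows the general shape of a CVAT XML export (`box` elements carrying `xtl`, `ytl`, `xbr`, `ybr` attributes), but the exact layout of this dataset's annotations.xml is an assumption, as are the sample values.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation fragment; the real annotations.xml may nest
# boxes differently or carry extra attributes.
SAMPLE_XML = """<annotations>
  <track id="0" label="person">
    <box frame="0" xtl="10.0" ytl="20.0" xbr="50.0" ybr="120.0"/>
    <box frame="1" xtl="12.5" ytl="21.0" xbr="52.5" ybr="121.0"/>
  </track>
</annotations>"""


def read_boxes(xml_text):
    """Return one dict per <box>, with corner coordinates and derived size."""
    root = ET.fromstring(xml_text)
    boxes = []
    # iter("box") finds box elements at any depth, so the nesting can vary.
    for box in root.iter("box"):
        xtl, ytl, xbr, ybr = (
            float(box.get(key)) for key in ("xtl", "ytl", "xbr", "ybr")
        )
        boxes.append({
            "frame": box.get("frame"),
            "xtl": xtl, "ytl": ytl, "xbr": xbr, "ybr": ybr,
            "width": xbr - xtl,    # horizontal extent in pixels
            "height": ybr - ytl,   # vertical extent in pixels
        })
    return boxes


boxes = read_boxes(SAMPLE_XML)
for b in boxes:
    print(b)
```

Because the corners are top-left and bottom-right, width and height follow directly from the differences.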

    👉 Legally sourced datasets and carefully structured for AI training and model development. Explore samples from our dataset of 95,000+ human images & videos - Full dataset

    🚀 You can learn more about our high-quality unique datasets here

    keywords: multiple people tracking, human detection dataset, object detection dataset, people tracking dataset, tracking human object interactions, human Identification tracking dataset, people detection annotations, detecting human in a crowd, human trafficking dataset, deep learning object tracking, multi-object tracking dataset, labeled web tracking dataset, large-scale object tracking dataset

  20. Doors Image Dataset | Indoor Object Detection

    • kaggle.com
    zip
    Updated Feb 18, 2023
    Cite
    DataCluster Labs (2023). Doors Image Dataset | Indoor Object Detection [Dataset]. https://www.kaggle.com/datasets/dataclusterlabs/doors-doors
    Explore at:
    Available download formats: zip (556294883 bytes)
    Dataset updated
    Feb 18, 2023
    Authors
    DataCluster Labs
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is collected by DataCluster Labs. To download full dataset or to submit a request for your new data collection needs, please drop a mail to: sales@datacluster.ai

    This dataset is an extremely challenging set of 3,000+ images containing a wide variety of indoor doors. The images were captured and crowdsourced from 2,000+ different locations, and each image was manually reviewed and verified by computer vision professionals at DataCluster Labs. The dataset can be used for scene classification and domestic object detection.

    Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.

    Dataset Features

    • Dataset size: 3000+ images
    • Captured by: 2000+ crowdsource contributors
    • Resolution: HD and above (1920x1080 and above)
    • Location: Captured at 2000+ locations
    • Diversity: Various lighting conditions (day, night), varied distances, viewpoints, etc.
    • Device used: Captured using mobile phones in 2020-2022
    • Usage: Image classification, domestic object detection, object relationship understanding, etc.

    Available Annotation formats

    COCO, YOLO, PASCAL-VOC, TFRecord
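The box conventions of three of these formats differ only in how a rectangle is encoded, as the sketch below shows. It assumes the usual conventions: PASCAL VOC stores pixel corners (xmin, ymin, xmax, ymax), COCO stores (x, y, width, height) in pixels, and YOLO stores a normalized (center x, center y, width, height). The function names are illustrative and not part of any library.

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """Pixel corners -> COCO-style (x, y, width, height) in pixels."""
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Pixel corners -> YOLO-style normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2 / img_w   # box center, as a fraction of width
    cy = (ymin + ymax) / 2 / img_h   # box center, as a fraction of height
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return (cx, cy, w, h)


def yolo_to_voc(cx, cy, w, h, img_w, img_h):
    """Normalized (cx, cy, w, h) -> pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return (xmin, ymin, xmax, ymax)


# Example: a 200x400 box in a 1000x1000 image.
coco_box = voc_to_coco(100, 200, 300, 600)
yolo_box = voc_to_yolo(100, 200, 300, 600, 1000, 1000)
print(coco_box)
print(yolo_box)
```

YOLO needs the image size because its values are fractions of the image, whereas VOC and COCO stay in pixels.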

    The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai Visit www.datacluster.ai to know more.

