Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset contains two classes: shells and pebbles. It can be used for binary classification tasks that determine whether a given image shows a shell or a pebble. Cover image by wirestock on Freepik.
I thought it would be cool to create an app with a CV algorithm that could classify whether a given picture shows a shell or a pebble. The next time I visit a beach, I could just use the app to help me collect either shells or pebbles. 😄
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for X-ray and non-X-ray image classification tasks, specifically tailored for the identification of X-ray images. It includes a comprehensive collection of data split into training, validation, and test sets. Each set is organized into folders, with subfolders dedicated to X-ray and non-X-ray images respectively. This structured arrangement facilitates seamless training and evaluation of classification models aimed at distinguishing between X-ray and non-X-ray images.
https://creativecommons.org/publicdomain/zero/1.0/
Ho, ho, ho! It's Christmas time, so why not have a Christmas-themed dataset to practice image classification?
This is a binary image classification dataset on which you can train a deep learning model to predict whether an image contains Santa Claus or not.
This was entirely my own idea, although upon googling I found that PyImageSearch has a similar private dataset. This dataset, however, was built and compiled completely by me. :)
This dataset was created by Muhammad Usman
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Waste has become commonplace in many countries around the world. There is nothing wrong with waste in itself; it is simply defined as a final product that can no longer be used (by humans), a residue. The problem lies in how we manage this waste once we can no longer use it. Several countries struggle with waste because the rate of waste production far outpaces their management efforts, and this can become a serious problem for the ecosystem.
With this dataset, I hope we can support waste management efforts with computer vision technology, which can help identify, track, sort, and process waste accordingly.
This dataset contains approximately 256K images (156K original) representing two classes, Biodegradable and Non-biodegradable.
- Biodegradable: materials that can be decomposed naturally by microorganisms, such as foods, plants, fruits, etc. Waste from these materials can be processed into compost.
- Non-biodegradable: materials that cannot be decomposed naturally, for example plastics, metals, inorganic elements, etc. Waste from these materials is recycled into new materials.
To counter class imbalance, I added augmented data produced by manipulating the original images. The image transformations used are: horizontal flip, vertical flip, 90° clockwise rotation, and 90° counter-clockwise rotation.
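As a minimal illustration of those four transforms, here is a pure-Python sketch on a toy 2-D pixel grid (a real pipeline would apply the same operations with an image library such as Pillow or torchvision):

```python
# Toy versions of the four augmentation transforms, operating on a
# list-of-rows "image"; for illustration only.
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the row order."""
    return img[::-1]

def rot90_cw(img):
    """90° clockwise rotation."""
    return [list(col) for col in zip(*img[::-1])]

def rot90_ccw(img):
    """90° counter-clockwise rotation."""
    return [list(col) for col in zip(*img)][::-1]

grid = [[1, 2],
        [3, 4]]
print(hflip(grid))     # [[2, 1], [4, 3]]
print(rot90_cw(grid))  # [[3, 1], [4, 2]]
```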
In this dataset, I divided the data into two subsets, a training set and an evaluation set. The training set itself was split into 4 parts due to technical constraints (my internet bandwidth). Note that the individual parts do not each have a good data distribution, so do not pass a single part directly to your model; concatenate all parts into one dataset first. See Quickstart.
Files in this dataset have unique names to prevent them from overwriting each other when the parts are concatenated. The filename reference is below; you will need it for filtering the dataset.
SUBSET.PART_CLASS_CATEGORY_ID.EXT
- SUBSET: the subset the file belongs to, either TEST or TRAIN.
- PART: the part number of the subset, present only if the subset is split into several parts.
- CLASS: the class of the data, BIODEG for biodegradable or NBIODEG for non-biodegradable.
- CATEGORY: the category of the data: ORI for original data, HFL for horizontal flip, VFL for vertical flip, CWR for clockwise rotation, CCW for counter-clockwise rotation.
- ID: the data identification number.
- EXT: the file extension, either .jpg or .jpeg.
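As a sketch, the filename scheme above can be parsed with a small helper like this (a hypothetical helper, with the separators inferred from the reference; it is not part of the dataset itself):

```python
import os

def parse_filename(filename):
    """Parse a SUBSET.PART_CLASS_CATEGORY_ID.EXT filename into its fields."""
    stem, ext = os.path.splitext(filename)
    subset_part, cls, category, data_id = stem.split("_")
    # PART appears only when the subset is split into several parts
    subset, _, part = subset_part.partition(".")
    return {
        "subset": subset,
        "part": int(part) if part else None,
        "class": cls,
        "category": category,
        "id": data_id,
        "ext": ext,
    }

print(parse_filename("TRAIN.2_BIODEG_HFL_00042.jpg"))
```

A helper like this makes it easy to filter, e.g., only the original (ORI) images out of the concatenated dataset.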
Here I would like to credit several Kaggle users, because without their great work this dataset would be incomplete. As mentioned, this dataset is sourced from other Kaggle datasets, so it is my responsibility to attribute them:
- Food Images (Food-101) (K Scott Mader)
- Fruit and Vegetable Image Recognition (Kritik Seth)
- Waste Classification data (Sashaank Sekar)
- Waste Classification Data v2 (sapal6)
- waste_pictures (且听风吟)
This is the cleaned version of the COVID-19 dataset found on Kaggle.
The data is already split into training, test, and validation sets; you just need to download it and start coding. It consists of chest X-ray images.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains images of cats and dogs and is intended for binary classification tasks using a CNN.
https://creativecommons.org/publicdomain/zero/1.0/
A nearby community health clinic is helping the local residents monitor their mental health. As part of their study, they are asking volunteers to record their emotions throughout the day.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for binary image classification tasks focused on detecting waste in beach environments. It consists of two categories:
- waste: images containing visible trash, debris, or pollutants on beach surfaces.
- not_waste: images showing clean beach areas with no visible waste.
The dataset can be used to train and evaluate deep learning models such as CNNs or transfer learning-based architectures (e.g., ResNet, MobileNet) for environmental monitoring, beach cleanliness automation, or AI-driven sustainability projects.
💡 Use Cases
- Environmental monitoring using computer vision
- Training waste detection models for drones or robots
- Public awareness and data-driven CSR initiatives

📁 Folder Structure
Beach_Waste_Dataset/
├── train/
│   ├── waste/
│   └── not_waste/
└── val/
    ├── waste/
    └── not_waste/

Each folder contains JPEG/PNG images resized and preprocessed to standard formats (e.g., 224x224). The dataset was prepared manually and can be extended with more images or annotations if needed.
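A folder-per-class layout like this works directly with loaders such as torchvision's ImageFolder or Keras's image_dataset_from_directory. As a dependency-free sketch (folder and extension names assumed from the structure above), the split can be enumerated like this:

```python
from pathlib import Path

def count_images(root):
    """Count images per (split, class) in a folder-per-class layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        for class_dir in sorted(split_dir.iterdir()):
            if class_dir.is_dir():
                counts[(split_dir.name, class_dir.name)] = sum(
                    1 for p in class_dir.iterdir()
                    if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
                )
    return counts
```

For example, count_images("Beach_Waste_Dataset") would report how many waste/not_waste images sit in each of train/ and val/, a quick sanity check for class balance before training.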
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification: https://www.nature.com/articles/s41597-022-01721-8
A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required of users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression, and multi-label). The resulting collection, consisting of approximately 708K 2D images and 10K 3D images in total, can support numerous research and educational purposes in biomedical image analysis, computer vision, and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D/3D neural networks and open-source/commercial AutoML tools.
MedMNIST Landscape:
About the MedMNIST Landscape figure: the horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the 4 colors represent the different tasks.
Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as VDD and MSD, allowing fair evaluation of the generalizable performance of machine learning algorithms in different settings, while additionally providing both 2D and 3D biomedical images.
Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.
User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.
Educational: As an interdisciplinary research area, biomedical image analysis is difficult for researchers from other communities to get hands-on experience with, as it requires background knowledge in computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.
Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8
Github Page: https://github.com/MedMNIST/MedMNIST
My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937
Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. Affiliations: Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA.
The code is under Apache-2.0 License.
The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...
The original full dataset contained 112,120 X-ray images with disease labels from 30,805 unique patients.
This notebook is modified from K Scott Mader's notebook here to create a mini chest x-ray dataset that is split 50:50 between normal and diseased images.
In my notebook I will use this dataset to test a pretrained model on a binary classification task (diseased vs. healthy xray), and then visualize which specific labels the model has the most trouble with.
Also, because disease classification is such an important task to get right, it's likely that any AI/ML medical classification task will include a human-in-the-loop. In this way, this process more closely resembles how this sort of ML would be used in the real world.
Note that the original notebook on which this one was based had two versions: Standard and Equalized. In this notebook we will be using the equalized version in order to save ourselves the extra step of performing CLAHE during the tensor transformations.
The goal of this notebook, as originally stated by Mader, is "to make a much easier to use mini-dataset out of the Chest X-Ray collection. The idea is to have something akin to MNIST or Fashion MNIST for medical images." In order to do this, we will preprocess, normalize, and scale down the images, and then save them into an HDF5 file with the corresponding tabular data.
Data limitations:
- The image labels are NLP-extracted, so some labels may be erroneous, but the NLP labeling accuracy is estimated to be >90%.
- Very limited numbers of disease region bounding boxes (see BBoxlist2017.csv).
- Chest x-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their "updated" image labels and/or new bounding boxes in their own studies later, perhaps through manual annotation.
File Contents
The file is an HDF5 file of shape (200, 28). The main file contains a nested HDF5 dataset of x-ray images under the key images.
Main HDF5 file keys are:
- Image Index
- Finding Labels: list of disease labels
- Follow-up #
- Patient ID
- Patient Age
- Patient Gender: 'F'/'M'
- View Position: 'PA', 'AP'
- OriginalImageWidth
- OriginalImageHeight
- OriginalImagePixelSpacing_x
- Normal: Binary; if Xray finding is 'Normal'
- Atelectasis: Binary; if Xray finding includes 'Atelectasis'
- Cardiomegaly: Binary; if Xray finding includes 'Cardiomegaly'
- Consolidation: Binary; if Xray finding includes 'Consolidation'
- Edema: Binary; if Xray finding includes 'Edema'
- Effusion: Binary; if Xray finding includes 'Effusion'
- Emphysema: Binary; if Xray finding includes 'Emphysema'
- Fibrosis: Binary; if Xray finding includes 'Fibrosis'
- Hernia: Binary; if Xray finding includes 'Hernia'
- Infiltration: Binary; if Xray finding includes 'Infiltration'
- Mass: Binary; if Xray finding includes 'Mass'
- Nodule: Binary; if Xray finding includes 'Nodule'
- Pleural_Thickening: Binary; if Xray finding includes 'Pleural_Thickening'
- Pneumonia: Binary; if Xray finding includes 'Pneumonia'
- Pneumothorax: Binary; if Xray finding includes 'Pneumothorax'
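As a sketch of how such a file can be read with h5py (key names are taken from the listing above; the tiny stand-in file and its shapes are illustrative and only there to keep the example self-contained and runnable):

```python
import h5py
import numpy as np

path = "mini_chest_xray_demo.h5"  # stand-in path, not the real dataset file

# Create a tiny stand-in file with the same key layout as described above
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=np.zeros((200, 28, 28), dtype=np.uint8))
    f.create_dataset("Normal", data=np.ones(200, dtype=np.uint8))
    f.create_dataset("Patient Age", data=np.full(200, 45, dtype=np.int64))

# Read it back the same way you would read the real file
with h5py.File(path, "r") as f:
    images = f["images"][:]      # x-ray pixel data
    normal = f["Normal"][:]      # binary 'Normal' label column
    ages = f["Patient Age"][:]

print(images.shape, int(normal.sum()))
```

The same slicing pattern (`f[key][:]`) works for any of the keys listed above.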
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This data was created as part of my project at the Aryabhatta Research Institute of Observational Sciences (ARIES), Nainital, India. The images were captured by the observatory's in-house 1.3m telescope situated at Devasthal, Nainital, India. The original captured images were 2k×2k in size and were reduced to 64×64 cutouts to isolate the sources in each image.
For labelling the images, image segmentation was used to identify the sources in each image, and the center coordinates of the found sources were then queried against the SDSS database to assign a label to each 64×64 cutout.
Finally, the cutouts were saved in different directories according to the label suggested by the SDSS query search.
This data was generated from scratch using real-world observations. Use this dataset to train computer vision models that classify stellar sources, such as stars and galaxies, in telescope images.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Early detection of Diabetic Retinopathy (DR) is key to preventing potential vision loss. DR detection often requires special expertise from ophthalmologists, which may not be available in remote parts of the world, so machine learning and deep learning techniques can be adopted in an attempt to automate DR detection. Several recent papers have demonstrated such success on various publicly available datasets.
Another challenge of deep learning techniques is the availability of properly processed, standardized data. Cleaning and preprocessing the data often takes much longer than training the model. As part of my research work, I had to preprocess the images taken from APTOS and Messidor before training the model. I applied circle-crop and Ben Graham's preprocessing technique and scaled all images to 512×512. I also applied data augmentation, increasing the number of samples from 3662 APTOS images to 18310, and from 400 Messidor samples to 3600. I divided the images into two classes: class 0 (No DR) and class 1 (DR). A large amount of data is essential for transfer learning, and this process is very cumbersome and time-consuming, so I decided to upload the newly generated dataset to Kaggle in the hope that others might find it useful for their work. Feel free to use the data.
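As an illustrative NumPy sketch of the circle-crop step (this assumes a typical variant in which pixels outside the largest inscribed circle are zeroed; Ben Graham's preprocessing additionally subtracts a Gaussian-blurred copy of the image, which is omitted here):

```python
import numpy as np

def circle_crop(img):
    """Zero out pixels outside the largest circle inscribed in the frame.

    Grayscale sketch: img is a 2-D array; the mask keeps only pixels whose
    distance from the image center is within the inscribed-circle radius.
    """
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (min(h, w) / 2) ** 2
    return img * mask

fundus = np.ones((5, 5), dtype=np.uint8)  # stand-in for a retina image
print(circle_crop(fundus))
```

On a real fundus photograph this removes the black corners around the circular retina region before resizing to 512×512.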
This dataset was created by meena manogaran
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Classification dataset for broken and unbroken eggs. Broken eggs have varying levels of damage. Images were taken with a Raspberry Pi and a Logitech webcam. This dataset is useful for binary or multi-class image classification.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Here is a suite of images with or without an applied watermark. The data is divided into training and validation sets with an 80/20 proportion. Each folder contains watermark/no-watermark subfolders with images to classify according to the presence of a watermark. If you are interested in downloading more images, or in doing something similar, please have a look at the project on GitHub. Please NOTE: a few images appear to be corrupted (a rare case), which breaks ImageDataLoader. To work around this, run the following before consuming the images:
from PIL import ImageFile
# Tell Pillow to tolerate truncated image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True
I have to thank [Pexels](https://www.pexels.com/) for providing high quality images free for use, and for allowing me to exceed some API request limits.
When downloading images from the internet for other classification purposes, it is useful to filter out the ones with watermarks; hence the need for such a classifier.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Bharat Kunwar
Released under CC0: Public Domain
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
A dataset primarily for image classification, made of photos of good and damaged nails. It contains 225 instances in total: 108 good nails and 117 damaged nails. Damaged nails are divided into subclasses by damage type; used this way for multi-class classification, it is a few-shot problem (the smallest subclass has only 9 examples). If used for binary classification (all subclasses merged into one Not-OK class), the dataset is fairly balanced.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Manoj Prabhakar
Released under CC0: Public Domain
🌊 About Dataset
The Flood Classification Dataset (Combined) is a curated dataset designed to support binary classification tasks—distinguishing between flooded and non-flooded scenes. This dataset brings together flood imagery from multiple sources to address the growing need for disaster-focused image classification resources.
We encourage users to cite the original datasets when using this combined dataset for their research or applications.
📘 Context
Floods are among the most devastating natural disasters, causing significant damage to life, property, and the environment. Quick and accurate detection of flood-affected areas using computer vision techniques can assist in timely emergency response, disaster management, and risk assessment.
While there are existing datasets with flood imagery, many focus only on flooded scenarios, lacking contrasting non-flood samples essential for binary classification tasks. This dataset fills that gap by providing a more complete image set suitable for training AI models to detect flood presence.
🔗 Sources
This dataset is a combination of two existing public datasets:
1- Flooding Image Dataset (FloodIMG)
2- Flooded Images Dataset by Armaan Oajay
The original datasets were merged to create a balanced dataset containing both flooded and non-flooded images.
🧠 Key Features
- Image Type: RGB disaster scene images
- Classes: Flood, Non-Flood
- Total Images: Combined set from two datasets
- Annotations: Some images contain annotations as provided in the original datasets
- Resolution Note: Flood images are generally high-resolution, while non-flood images are lower-resolution
💡 Tip: For consistent model training, we recommend resizing all images to a uniform size such as 224 × 224 pixels, which works well with CNN architectures like EfficientNet or ResNet.
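That resizing tip can be sketched in one line with Pillow (any image library with a resize op works the same way; the in-memory image below is a stand-in for a real flood photo):

```python
from PIL import Image

# Stand-in for a high-resolution flood photograph
img = Image.new("RGB", (1920, 1080))

# Resize to the uniform 224x224 input size expected by many CNN backbones
resized = img.resize((224, 224))
print(resized.size)  # (224, 224)
```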
📁 Folder Structure
/
├── flood-images/
├── non-flood-images/
└── annotations/
🌐 Applications
This dataset is well-suited for researchers and developers working in:
- Computer Vision & Machine Learning
- Disaster Detection and Monitoring
- Environmental Risk Assessment
- Emergency Response Automation
- CNN-based Classification Projects
🧭 Inspiration
The dataset was inspired by the need for a reliable, clean, and balanced dataset that can aid in the development of deep learning models capable of distinguishing between flooded and non-flooded scenes. Such systems are critical for real-time monitoring, automated reporting, and AI-powered disaster management.
🪪 License
This dataset is shared under the same license as the original datasets it is derived from. Please consult the individual dataset sources for specific licensing terms.