15 datasets found
  1. Apple vs Orange Binary Classification

    • kaggle.com
    zip
    Updated Apr 20, 2025
    Cite
    Nick Kipshidze (2025). Apple vs Orange Binary Classification [Dataset]. https://www.kaggle.com/datasets/kipshidze/apple-vs-orange-binary-classification
    Explore at:
    Available download formats: zip (3415245 bytes)
    Dataset updated
    Apr 20, 2025
    Authors
    Nick Kipshidze
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A simple apple and orange image dataset scraped from Google. It contains 800 images (400 of apples and 400 of oranges) for image classification tasks. The images are organized into two folders, apple and orange, and the layout is fully compatible with PyTorch's torchvision.datasets.ImageFolder. If you ever need an image removed, just email me with the image details, and I'll take care of it as soon as possible.
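
    Since the folders follow the ImageFolder convention, loading takes only a few lines; a minimal sketch (the root path below is hypothetical):

    from torchvision import datasets, transforms

    # Each sub-folder name becomes a class label automatically.
    ds = datasets.ImageFolder("apple-vs-orange", transform=transforms.ToTensor())
    print(ds.classes)  # ['apple', 'orange']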

  2. Reptile & Amphibian Image large Dataset

    • kaggle.com
    zip
    Updated Apr 25, 2025
    Cite
    cyber_knight_11 (2025). Reptile & Amphibian Image large Dataset [Dataset]. https://www.kaggle.com/datasets/cyberknight11/herpeton-reptile-and-amphibian-image-dataset/discussion
    Explore at:
    Available download formats: zip (67607339523 bytes)
    Dataset updated
    Apr 25, 2025
    Authors
    cyber_knight_11
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Download and Extract:

    Download the dataset from Kaggle.

    Extract the ZIP file if needed; images are organized into folders, where each folder name is the class label (like snake, lizard, frog, etc.).

    Understand the Structure:

    The dataset contains 9 major classes of reptiles and amphibians.

    Each class folder contains multiple high-quality images belonging to that species or group.

    Load the Dataset into Your Project:

    If using PyTorch, use torchvision.datasets.ImageFolder to load images directly.

    If using TensorFlow, use tf.keras.utils.image_dataset_from_directory.

    You can also manually read images using OpenCV or PIL if needed.

    Preprocessing:

    Resize images if needed (e.g., 224x224 for ResNet models).

    Normalize pixel values (e.g., divide by 255) to prepare for training.

    Splitting the Data:

    Optionally split the dataset into train, validation, and test sets.

    You can split randomly or based on a percentage (e.g., 80% training, 20% validation/testing).
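
    A minimal PyTorch sketch of an 80/20 random split (the root path and ratio are illustrative):

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    full = datasets.ImageFolder("reptile_amphibian", transform=tf)  # hypothetical root

    n_train = int(0.8 * len(full))
    train_set, val_set = random_split(
        full, [n_train, len(full) - n_train],
        generator=torch.Generator().manual_seed(0),  # fixed seed for a repeatable split
    )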

    Training Your Model:

    You can use any CNN model like ResNet, MobileNet, EfficientNet, etc.

    Fine-tune pre-trained models using transfer learning for faster results.

    Use the class folders for automatic label generation.

    Handling the Data Easily:

    Use batch processing and data augmentation (flip, rotate, zoom) during training.

    Use GPU if available for faster training.

    Keep your classes in a list if needed for mapping predictions back to names.
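
    A sketch of these handling tips in PyTorch (augmentations, device selection, and the class list are illustrative; the model itself is elided):

    import torch
    from torchvision import transforms

    train_tf = transforms.Compose([
        transforms.RandomHorizontalFlip(),                    # flip
        transforms.RandomRotation(20),                        # rotate
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom-like crop
        transforms.ToTensor(),                                # also scales pixels to [0, 1]
    ])

    # Use the GPU when one is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    classes = ["snake", "lizard", "frog"]  # extend to the full 9 classes
    # pred = model(batch.to(device)).argmax(1)
    # print(classes[pred[0]])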

  3. cropsVSweed

    • huggingface.co
    Updated Aug 11, 2024
    Cite
    Samarth Agarwal (2024). cropsVSweed [Dataset]. https://huggingface.co/datasets/Sa-m/cropsVSweed
    Explore at:
    Dataset updated
    Aug 11, 2024
    Authors
    Samarth Agarwal
    Description

    WeedCrop Image Dataset

    Data Description: It includes 2822 images, annotated in YOLOv5 PyTorch format.
    • The train directory contains 2469 images and their labels in YOLOv5 PyTorch format.
    • The validation directory contains 235 images and their labels in YOLOv5 PyTorch format.
    • The test directory contains 118 images and their labels in YOLOv5 PyTorch format.

    Reference: https://www.kaggle.com/datasets/vinayakshanawad/weedcrop-image-dataset
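
    A small sketch of reading one YOLO-format label file (the file name below is hypothetical; each line stores "class x_center y_center width height", normalized to [0, 1]):

    from pathlib import Path

    def read_yolo_labels(path):
        # Parse one YOLO txt label file into (class_id, (xc, yc, w, h)) tuples.
        boxes = []
        for line in Path(path).read_text().splitlines():
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), (float(xc), float(yc), float(w), float(h))))
        return boxes

    print(read_yolo_labels("train/labels/example.txt"))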

  4. Cellpose model for Digital Phase Contrast images

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 21, 2022
    Cite
    Laura Capolupo; Olivier Burri; Romain Guiet (2022). Cellpose model for Digital Phase Contrast images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6023316
    Explore at:
    Dataset updated
    Feb 21, 2022
    Dataset provided by
    EPFL SV IBI-SV UPDANGELO
    EPFL SV PTECH PTBIOP
    Authors
    Laura Capolupo; Olivier Burri; Romain Guiet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Name: Cellpose model for Digital Phase Contrast images

    Data type: Cellpose model, trained via transfer learning from the ‘cyto’ model.

    Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)

    Training Procedure: The model was trained with Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080), using default settings as per the Cellpose documentation:

    python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0

    The model file (MODEL NAME) in this repository is the result of this training.

    Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with

    python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy

  5. Scene Classification Dataset: 6 Categories

    • kaggle.com
    Updated Jun 11, 2025
    Cite
    Evil Spirit05 (2025). Scene Classification Dataset: 6 Categories [Dataset]. https://www.kaggle.com/datasets/evilspirit05/intel-image-classification
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Evil Spirit05
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset appears to be a scene classification dataset composed of images organized into folders by category.

    📁 Folder Categories:
    • mountain
    • street
    • (likely four other folders; the log mentioned 6 total)
    Each folder contains a series of images belonging to that scene class.

    📸 Image Characteristics:
    • Images are likely in .jpg or .png format.
    • Images per category (based on logs):
      • mountain: 525 images
      • street: 501 images
      • others: not shown, but presumably similar counts

    🧾 Format: Directory-based layout suitable for tools like ImageFolder from PyTorch or flow_from_directory from Keras.

    /dataset/
      /mountain/
        image_1.jpg
        image_2.jpg
        ...
      /street/
        image_1.jpg
        ...
      ...

    🧠 Possible Uses:
    • Scene recognition
    • Transfer learning with CNNs
    • Training and testing classification models
    • Educational or benchmarking tasks
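
    For example, a minimal Keras loader for this layout (the directory name and image size below are assumptions):

    import tensorflow as tf

    train = tf.keras.utils.image_dataset_from_directory(
        "dataset",              # root folder with one sub-folder per scene class
        image_size=(150, 150),  # Intel scene images are commonly 150x150
        batch_size=32,
    )
    print(train.class_names)    # expect the 6 scene categories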

  6. Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier

    • zenodo.org
    bin, zip
    Updated Nov 1, 2022
    Cite
    Emily Dunkel; Emily Dunkel (2022). Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier [Dataset]. http://doi.org/10.5281/zenodo.7041842
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Nov 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emily Dunkel; Emily Dunkel
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Summary

    We provide imagery used to train LROCNet -- our Convolutional Neural Network classifier of orbital imagery of the moon. Images are divided into train, validation, and test zip files, which contain class-specific sub-folders. We have three classes: "fresh crater", "old crater", and "none". Classes are described in detail in the attached labeling guide.

    Directory Contents

    We include the labeling guide and training, testing, and validation data. Training data was split to avoid upload timeouts.

    • LROC_Labeling_Intro_for_release.ppt: Labeling guide
    • val: Validation images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • test: Testing images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • ejecta_train: Training images of "fresh crater" class
    • oldcrater_train: Training images of "old crater" class
    • none_train1-4: Training images of "none" class (divided into 4 just for uploading)

    Data Description

    We use CDR (Calibrated Data Record) browse imagery (50% resolution) from the Lunar Reconnaissance Orbiter's Narrow Angle Cameras (NACs). Data we get from the NACs are 5-km swaths, at nominal orbit, so we perform a saliency detection step to find surface features of interest. A detector developed for Mars HiRISE (Wagstaff et al.) worked well for our purposes, after updating based on LROC NAC image resolution. We use this detector to create a set of image chipouts (small 227x277 cutouts) from the larger image, sampling the lunar globe.

    Class Labeling

    We select classes of interest based on what is visible at the NAC resolution, consulting with scientists and performing a literature review. Initially, we have 7 classes: "fresh crater", "old crater", "overlapping craters", "irregular mare patches", "rockfalls and landfalls", "of scientific interest", and "none".

    Using the Zooniverse platform, we set up a labeling tool and labeled 5,000 images. We found that "fresh crater" makes up 11% of the data and "old crater" 18%, with the vast majority "none". Due to limited examples of the other classes, we reduce our initial class set to: "fresh crater" (with impact ejecta), "old crater", and "none".

    We divide the images into train/validation/test sets making sure no image swaths span multiple sets.

    Data Augmentation

    Using PyTorch, we apply the following augmentation on the training set only: horizontal flip, vertical flip, rotation by 90/180/270 degrees, and brightness adjustment (0.5, 2). In addition, we use weighted sampling so that each class is weighted equally. The training set included here does not include augmentation since that was performed within PyTorch.
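
    A minimal PyTorch sketch of the augmentation and weighted sampling described above (the transform composition and sampler setup are assumptions; the original training code is not included in this dataset):

    import torch
    from torch.utils.data import DataLoader, WeightedRandomSampler
    from torchvision import datasets, transforms

    # Discrete 90/180/270-degree rotations, as described above.
    rot = transforms.RandomChoice([
        transforms.RandomRotation((0, 0)),
        transforms.RandomRotation((90, 90)),
        transforms.RandomRotation((180, 180)),
        transforms.RandomRotation((270, 270)),
    ])

    train_tf = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        rot,
        transforms.ColorJitter(brightness=(0.5, 2.0)),  # brightness adjustment (0.5, 2)
        transforms.ToTensor(),
    ])

    train_set = datasets.ImageFolder("train", transform=train_tf)  # hypothetical path

    # Weighted sampling so that each class is drawn equally often.
    targets = torch.tensor(train_set.targets)
    class_counts = torch.bincount(targets)
    weights = (1.0 / class_counts.float())[targets]
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))

    loader = DataLoader(train_set, batch_size=32, sampler=sampler)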

    Acknowledgements

    The author would like to thank the volunteers who provided annotations for this data set, as well as others who contributed to this work (as in the Contributor list). We would also like to thank the PDS Imaging Node for support of this work.

    The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

    CL#22-4763

    © 2022 California Institute of Technology. Government sponsorship acknowledged.

  7. MIEDT dataset

    • kaggle.com
    Updated Jan 12, 2025
    Cite
    机关鸢鸟 (2025). MIEDT dataset [Dataset]. https://www.kaggle.com/datasets/lidang78/miedt-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    机关鸢鸟
    Description
      1. Dataset Overview: This dataset is organized around the edge detection task, providing image resources and corresponding edge annotations for research and applications, and can be used to test edge detection algorithms. To comprehensively evaluate edge detection methods, we created the Medical Image Edge Detection Test (MIEDT) dataset. MIEDT contains 100 medical images randomly selected from three publicly available datasets: Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000.
      2. Dataset Structure: Original image: this folder stores the original image data. It contains 15 head CT images in PNG format with varying resolutions; 25 coronary heart disease images in JPG format at 1024×1024; and 60 skin images in JPG format at 600×450. It covers a variety of medical imaging modalities and contrasts, providing diverse input for edge detection algorithms. Ground truth: this folder holds the edge annotation images corresponding to the images in the "Original image" folder, in PNG format; white pixels mark edges and black pixels mark non-edge areas. The annotations accurately outline the object contours and edge features of the original images.
      3. Usage Instructions: Python users can read the image data with the cv2 (OpenCV) library. Sample code follows:

    import cv2

    # Read original image
    original_image = cv2.imread('Original image/IMG-001.png')
    # Read the corresponding Ground Truth image
    ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)

    When training with deep learning frameworks (such as TensorFlow or PyTorch), configure the dataset path in the framework's dataset-loading class so the model can correctly read the images and their annotation data.

      4. Data Sources and References: The original images are collected from the public datasets Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000, ensuring image quality and diversity. If you use this dataset in academic research, please cite the following literature.

    References: [1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368

    [2] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).

    [3] Classification of Brain Hemorrhage Using Deep Learning from CT Scan Images - https://link.springer.com/chapter/10.1007/978-981-19-7528-8_15

  8. Aluminum alloy industrial materials defect

    • figshare.com
    zip
    Updated Dec 3, 2024
    Cite
    Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ying Han; Yugang Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset used in this experiment comes from the preliminary-round dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition hosted on Tianchi (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We screened the dataset, removed images that did not meet the requirements of our experiment, and split everything into training and testing sets. All images are 2560×1920 pixels. Before training, all defects were labeled with labelImg and saved as JSON files; the JSON files were then converted to TXT files. Finally, the organized defect dataset was used for detection and classification.

    Description of the data and file structure

    This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs; the instructions below assume a Windows + CUDA GPU system.

    Files and variables

    File: defeat_dataset.zip

    Setup

    1. Download the project repository defeat_dataset.zip.
    2. Unzip it and navigate to the project folder; it should contain the subfolder quexian_dataset.
    3. Move the 'defeat_dataset' folder into the project's main folder and confirm it contains the quexian_dataset subfolder; inside you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

    Software

    1. Download and install Anaconda.
    2. Open the Anaconda Prompt (on Windows, click Start, search for Anaconda Prompt, and open it).
    3. Create a new conda environment with Python 3.8 (any name works): conda create -n yolov8 python=3.8
    4. Activate the environment: conda activate yolov8
    5. Download and install Visual Studio Code.
    6. Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU: conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
    7. Install the remaining libraries:
       • conda install -c anaconda scikit-learn=0.24.1
       • conda install astropy=4.2.1
       • conda install -c anaconda pandas=1.2.4
       • conda install -c conda-forge matplotlib=3.5.3
       • conda install scipy=1.10.1

    Repeatability

    For PyTorch, there is no guarantee of fully reproducible results between versions, individual commits, or platforms, and results may not be reproducible between CPU and GPU executions even when the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible; model training on the GPU, however, varies from machine to machine.

    Access information

    Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
    Data was derived from: https://tianchi.aliyun.com/dataset/140666

    Data availability statement

    The ten datasets used in this study come from the rematch of the Guangdong Industrial Wisdom Big Data Innovation Competition (Intelligent Algorithm track); the download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official release provides 4,356 images, including single-defect, multiple-defect, and defect-free images. We selected only the single-defect and multiple-defect images, 3,233 in total. The ten defects are: non-conductive, effacement, missing bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom, and blotch. Each image contains one or more defects, and all defect images are 2560×1920.

    Surveying the literature, we found that most experiments use these 10 defect types, so we chose three additional defect types that differ most from the ten and have enough examples for our experiments. The three added classes come from the preliminary-round dataset of the competition, downloadable at https://tianchi.aliyun.com/dataset/140666. It contains 3,000 images in total, of which 109, 73, and 43 show the defects bruise, camouflage, and coating cracking, respectively. Finally, the 10 rematch defect types and the 3 preliminary-round defect types were fused into the new dataset examined here.

    In processing the dataset we tried different split ratios, such as 8:2, 7:3, and 7:2:1. Testing showed the results differed little across ratios, so we split the data 7:2:1: 70% training, 20% validation, and 10% testing. The random seed is set to 0 so that every training run yields consistent results.

    The mean Average Precision (mAP) was measured on the dataset three times. The runs differed very little; for accuracy, we averaged the highest (71.5%) and lowest (71.1%) results, giving a final average detection accuracy of 71.3%.

    All data and images used in this research come from publicly available sources, and the original creators have consented to their publication in open-access formats.

    The remaining parameter settings are: epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train.

    The defeat_dataset.zip is mentioned in the Supporting information section of our manuscript; the underlying data are held at Figshare, DOI: 10.6084/m9.figshare.27922929. The results_images.zip contains the experimental results graphs, and images_1.zip and images_2.zip contain all images needed to generate the manuscript.tex manuscript.
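
    Assuming the standard Ultralytics YOLOv8 training API (the authors' enhanced YOLOv8 code is not part of this dataset), the listed parameters would map onto a training call roughly as follows; the dataset YAML name is hypothetical:

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # pretrained: true
    model.train(
        data="quexian_dataset.yaml",  # hypothetical dataset config
        epochs=200, patience=50, batch=16, imgsz=640,
        optimizer="SGD", momentum=0.937, weight_decay=0.0005,
        close_mosaic=10, iou=0.7,
        box=7.5, cls=0.5, dfl=1.5,  # loss gains
        seed=0,                     # fixed seed, as described above
    )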

  9. Pest Dataset V2

    • kaggle.com
    zip
    Updated Jul 9, 2025
    Cite
    Ibrahima Gabar Diop (2025). Pest Dataset V2 [Dataset]. https://www.kaggle.com/datasets/ibrahimagabardiop/pestaidatasetv2
    Explore at:
    Available download formats: zip (1986986360 bytes)
    Dataset updated
    Jul 9, 2025
    Authors
    Ibrahima Gabar Diop
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    🐛 Balanced Pest Dataset for Agricultural AI

    This dataset contains 16 pest classes. Images were sourced from multiple public datasets on Kaggle and harmonized under consistent class names, using a combination of downsampling and upsampling (image augmentation via imgaug).
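
    A plausible imgaug upsampling pipeline (the exact augmenters used are not documented; this sketch only illustrates the approach):

    import imageio.v2 as imageio
    import imgaug.augmenters as iaa

    seq = iaa.Sequential([
        iaa.Fliplr(0.5),               # horizontal flips
        iaa.Affine(rotate=(-25, 25)),  # small rotations
        iaa.Multiply((0.8, 1.2)),      # brightness jitter
    ])

    image = imageio.imread("Aphids/example.jpg")  # hypothetical file
    augmented = seq(images=[image] * 4)           # four augmented copies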

    🔍 Classes included:

    • Aphids, Thrips, Whitefly, Red Spider Mite, Mites, Slug, Snail, Grasshopper
    • Cicadellidae, Weevil, Cutworm, Earwig, Field Cricket, Bugs, Flea Beetle, Beetle

    🧪 Dataset layout

    Each class is stored in a separate folder, suitable for use with:
    • image_dataset_from_directory (TensorFlow/Keras)
    • PyTorch ImageFolder
    • FastAI dataloaders

    📌 Useful for

    • Pest classification models (EfficientNet, MobileNet, etc.)
    • Fine-tuning agricultural detection systems
    • AI-powered mobile apps for pest diagnostics
  10. VNFood 30 + 3 + 100 = 103

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Lê Anh Duy (2025). VNFood 30 + 3 + 100 = 103 [Dataset]. https://www.kaggle.com/datasets/meowluvmatcha/vnfood-30-100
    Explore at:
    Available download formats: zip (5369833111 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Lê Anh Duy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Context

    This dataset was made for the author's educational purposes, to complete a class project, and may be expanded as the project progresses.

    This dataset is a comprehensive collection of Vietnamese food images, curated to address the challenges of Fine-Grained Visual Classification (FGVC) in the culinary domain. It serves as a unified resource by integrating and standardizing three major existing datasets on Kaggle.

    Whether you are building a food delivery recommendation system, a calorie tracking app, or simply exploring computer vision with Vietnamese cuisine, this dataset provides a robust starting point.

    Data Sources

    We gratefully acknowledge and credit the original authors of the source datasets used in this compilation:
    1. 30VNFoods: the baseline dataset containing 30 popular dishes. (Original Link)
    2. Vietnamese-foods-extended: an extension pack adding diversity to existing classes. (Original Link)
    3. 100 Vietnamese Food: a broad dataset covering 100 distinct dishes (~200 images/class). (Original Link)
    More sources may be added over time.

    Methodology & Processing

    1. Label Unification (Standardization)

    A major challenge when combining datasets is inconsistent naming conventions (e.g., "Bánh Mì", "banh_mi", "Banh-Mi"). We built a mapping table to normalize all labels into a single standard format:
    • Format: lowercase, no accents, kebab-case (hyphen-separated).
    • Example: "Bánh bèo", "Banh Beo" → banh-beo
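
    A minimal normalization sketch (the authors' actual mapping table is not included here; the đ/Đ replacement is needed because those letters do not decompose under NFD):

    import re
    import unicodedata

    def normalize_label(name: str) -> str:
        # Decompose accented characters, then drop the combining marks.
        ascii_name = (unicodedata.normalize("NFD", name.replace("đ", "d").replace("Đ", "D"))
                      .encode("ascii", "ignore").decode("ascii"))
        # Collapse runs of non-alphanumerics into single hyphens (kebab-case).
        return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

    assert normalize_label("Bánh bèo") == "banh-beo"
    assert normalize_label("Banh Beo") == "banh-beo"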

    2. Organization

    The merged data is structured in a standard format compatible with popular Deep Learning libraries like PyTorch (torchvision.datasets.ImageFolder) and TensorFlow/Keras.

    Dataset Structure

    The dataset is organized into a directory tree where each folder represents a class label:

    root/
    ├── train/
    │  ├── banh-mi/
    │  │  ├── image_01.jpg
    │  │  └── ...
    │  ├── pho/
    │  └── ...
    ├── test/
    │  └── ...
    └── val/
      └── ...
    
  11. 🧬 MultiCancer Dataset

    • kaggle.com
    zip
    Updated Jul 13, 2025
    Cite
    Ramachandra Udupa (2025). 🧬 MultiCancer Dataset [Dataset]. https://www.kaggle.com/datasets/ramachandraudupa/multicancer-dataset/code
    Explore at:
    Available download formats: zip (14289313573 bytes)
    Dataset updated
    Jul 13, 2025
    Authors
    Ramachandra Udupa
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🧬 MultiCancerNet Dataset

    MultiCancerNet is a diverse and carefully curated image dataset designed for multi-class cancer classification and general pathology research. It consists of high-quality images gathered from various trusted sources, encompassing a wide range of cancer types, precancerous conditions, and healthy tissue samples across different organs and systems.

    📁 File descriptions

    The dataset follows a standard PyTorch-style directory split, with clearly separated train/ and val/ folders for each class.

    Example directory structure (showing class names):

    ├── all_benign
    ├── all_early
    ├── all_pre
    ├── all_pro
    ├── brain_glioma_tumor
    ├── brain_meningioma_tumor
    ├── brain_normal
    ├── brain_pituitary_tumor
    ├── breast_benign
    ├── breast_malignant
    ├── cervix_dyk
    ├── cervix_koc
    ├── cervix_mep
    ├── cervix_pab
    ├── cervix_sfi
    ├── colon_aca
    ├── colon_bnt
    ├── kidney_cyst
    ├── kidney_normal
    ├── kidney_stone
    ├── kidney_tumor
    ├── lung_colon_aca
    ├── lung_colon_n
    ├── lung_lung_aca
    ├── lung_lung_scc
    ├── lymph_cll
    ├── lymph_fl
    ├── lymph_mcl
    ├── oral_normal
    ├── oral_scc
    ├── pancreatic_normal
    ├── pancreatic_tumor
    ├── Skin_Acne
    ├── Skin_Actinic Keratosis
    ├── Skin_Basal Cell Carcinoma
    ├── Skin_Chickenpox
    ├── Skin_Dermato Fibroma
    ├── Skin_Dyshidrotic Eczema
    ├── Skin_Melanoma
    ├── Skin_Nail Fungus
    ├── Skin_Nevus
    ├── Skin_Normal Skin
    ├── Skin_Pigmented Benign Keratosis
    ├── Skin_Ringworm
    ├── Skin_Seborrheic Keratosis
    ├── Skin_Squamous Cell Carcinoma
    └── Skin_Vascular Lesion
    

    Total: 48 classes representing cancers, benign conditions, and healthy tissues.

    🧠 Highlights

    • 🔬 Broad Coverage: Includes cancers from brain, breast, cervix, colon, kidney, lung, lymph nodes, oral cavity, pancreas, and skin.
    • 🧼 Neatly Organized: Data is arranged for direct loading with torchvision.datasets.ImageFolder or similar tools (see the sketch after this list).
    • ⚖️ Multi-label Ready: Contains both class-specific folders and meta categories like all_benign, all_early, etc.
    • 💡 No Data Leakage: Clear train/validation splits for fair benchmarking.
    • 🔎 Cleaned and Vetted: Images are selected from the finest available sources with consistent labeling.
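
    A minimal loading sketch for this layout (the root path below is hypothetical):

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("multicancer/train", transform=tf)
    val_set = datasets.ImageFolder("multicancer/val", transform=tf)

    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=64)

    print(len(train_set.classes))              # 48 classes, per the description above
    print(train_set.class_to_idx["oral_scc"])  # label index assigned to one class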

    📚 Use Cases

    • Multi-class classification
    • Cancer type identification
    • Early detection modeling
    • Transfer learning benchmarks
    • Domain generalization and robustness studies

    📦 Source Datasets

    The MultiCancerNet dataset was curated from multiple high-quality public sources, including:

    📌 Citation / Credit

    Dataset assembled by Ramachandra Udupa for research and educational purposes. If you use this dataset, please consider citing or acknowledging its creator.

  12. Coral Reefs Images

    • kaggle.com
    zip
    Updated Oct 11, 2025
    Cite
    Asfar Hossain Sitab (2025). Coral Reefs Images [Dataset]. https://www.kaggle.com/datasets/asfarhossainsitab/coral-reefs-images
    Explore at:
    Available download formats: zip (745680456 bytes)
    Dataset updated
    Oct 11, 2025
    Authors
    Asfar Hossain Sitab
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🪸 Coral Reef Health Image Dataset

    📘 Overview

    This dataset contains high-quality images of coral reefs, categorized into two classes: Healthy and Bleached. It is designed for machine learning and deep learning applications focused on coral reef health monitoring, marine conservation, and environmental analysis.

    Researchers and data scientists can use this dataset to train classification models that automatically detect coral bleaching — an important environmental indicator of ocean health and climate change impact.

    🗂 Dataset Structure

    The dataset is divided into three main subsets to support a standard deep learning workflow:

    Coral Reef Images/
    ├── train/
    │  ├── Bleached/
    │  └── Healthy/
    ├── valid/
    │  ├── Bleached/
    │  └── Healthy/
    └── test/
      ├── Bleached/
      └── Healthy/
    
    Split      | Total Images | Bleached | Healthy | Purpose
    -----------|--------------|----------|---------|---------------------------------------
    Train      | 9,662        | 4,980    | 4,682   | Used for model training
    Validation | 463          | 240      | 223     | Used to fine-tune model hyperparameters
    Test       | 257          | 135      | 122     | Used for final model evaluation
    Total      | 10,382       | 5,355    | 5,027   |

    🖼️ Image Details

    • Total images: 10,382
    • Total size: ~755 MB
    • Average resolution: Varies (most are high-quality RGB images)
    • Format: .jpg / .png

    🌊 Classes

    1. Bleached — Corals showing whitening or discoloration caused by stress (e.g., rising sea temperature, pollution).
    2. Healthy — Vibrant, colorful corals with no visible signs of bleaching or stress.

    💡 Potential Use Cases

    • Binary image classification (Healthy vs. Bleached coral)
    • Environmental monitoring and conservation research
    • Transfer learning using CNN architectures (ResNet, EfficientNet, VGG, etc.)
    • Data augmentation and image preprocessing experiments

    ⚙️ Recommended Frameworks

    You can easily use this dataset with:

    • TensorFlow / Keras: ImageDataGenerator or tf.data.Dataset
    • PyTorch: torchvision.datasets.ImageFolder
    • FastAI, Hugging Face, or other vision libraries
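
    As one example, a minimal PyTorch transfer-learning setup for the binary task (paths and sizes are illustrative):

    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("Coral Reef Images/train", transform=tf)

    # Pretrained backbone with a new two-way classification head.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)  # Bleached vs. Healthy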

    📜 Citation

    If you use this dataset, please cite appropriately as:

    Coral Reef Images Dataset — Binary Classification for Coral Health Detection (Bleached vs. Healthy), 2025.

  13. Coconut Leaf Disease Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    JIGAR BARAIYA (2025). Coconut Leaf Disease Dataset [Dataset]. https://www.kaggle.com/datasets/jigarbaraiya/coconut-leaf-disease-dataset
    Explore at:
    Available download formats: zip (29014079 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    JIGAR BARAIYA
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains real coconut leaf images categorized into 4 classes:
    1. Healthy
    2. Leaf Spot
    3. Yellowing
    4. Pest Damage

    Images were collected manually and organized into folders. This dataset is useful for:
    • Image classification
    • Mobile/edge model development
    • Agricultural AI research
    • CNN, TensorFlow, and PyTorch training

    Each class has its own folder. The dataset can be directly used with Keras, PyTorch, and TFLite training pipelines.

  14. rsna-mammography-768-vl-perlabel

    • kaggle.com
    zip
    Updated Feb 12, 2023
    Cite
    Yacine Bouaouni (2023). rsna-mammography-768-vl-perlabel [Dataset]. https://www.kaggle.com/datasets/jarvisai7/rsna-mammography-768-vl-perlabel
    Explore at:
    Available download formats: zip (8268291887 bytes)
    Dataset updated
    Feb 12, 2023
    Authors
    Yacine Bouaouni
    Description

    This dataset contains images in PNG format for the RSNA Screening Mammography Breast Cancer Detection competition. All images are 768×768 and are organized into two folders, one per label, so the labels can be inferred from the folder names (helpful when using keras image_dataset_from_directory, for example; see the sketch below). The DICOM files were processed to PNGs in this notebook by Radek Osmulski: https://www.kaggle.com/code/radek1/how-to-process-dicom-images-to-pngs?scriptVersionId=113529850. The contribution of this dataset is organizing the files so that labels can be inferred from the directory name.
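
    A minimal sketch of that label inference (the root folder name below is an assumption):

    import tensorflow as tf

    ds = tf.keras.utils.image_dataset_from_directory(
        "rsna-mammography-768",  # root with one sub-folder per label
        image_size=(768, 768),
        label_mode="binary",
    )
    print(ds.class_names)  # the two labels, inferred from the folder names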

    🎉 If this dataset helps you in your work, don't hesitate to upvote! 🎉

  15. Young and Old Images Dataset

    • kaggle.com
    zip
    Updated Jul 24, 2019
    Cite
    Abhishek Yanamandra (2019). Young and Old Images Dataset [Dataset]. https://www.kaggle.com/abhishekyana/young2old-dataset
    Explore at:
    Available download formats: zip (42953902 bytes)
    Dataset updated
    Jul 24, 2019
    Authors
    Abhishek Yanamandra
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    As there was no readily available dataset of paired young/old images, I decided to scrape the data from Google Images. This is the code I executed to obtain the images. After running the script with apt search keywords, the images are downloaded into their respective folders. Some keywords I used are "elderly photos", "teenage selfies solo", "youth selfie photo", etc., for around 20-25 keywords in total. Each keyword grabbed around 80 pictures, the majority of which were irrelevant to the task: group photos, blurred shots, vexels, and so on. So I manually handpicked the images that suit this task and ended up with roughly 300 images per class.

    Content

    After unzipping the ZIP file, there are 2 folders, A and B, where A is the collection of young images and B the collection of elderly images. I tried hard to keep this dataset simple to use, so that is it. Apologies if any of your images appear in this dataset; I scraped these images and don't completely own them. The images were scraped using this script, where keywords are to be given before running it. I trained a CycleGAN for a young-to-old image converter application; please feel free to check it out.

    Acknowledgements

    I would like to thank Aitor Ruano for his beautiful CycleGAN code, which helped me write mine easily.

    Inspiration

    Hosted on Kaggle with Love, Please feel free to use this dataset and come up with interesting projects.

    Best, AbhishekYana

