15 datasets found
  1. Apple vs Orange Binary Classification

    • kaggle.com
    zip
    Updated Apr 20, 2025
    Cite
    Nick Kipshidze (2025). Apple vs Orange Binary Classification [Dataset]. https://www.kaggle.com/datasets/kipshidze/apple-vs-orange-binary-classification
    Explore at:
    Available download formats: zip (3415245 bytes)
    Dataset updated
    Apr 20, 2025
    Authors
    Nick Kipshidze
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A simple apple and orange image dataset scraped from Google. It contains 800 images (400 of apples and 400 of oranges) for image classification tasks. The images are organized into two folders, apple and orange, and the layout is fully compatible with PyTorch's torchvision.datasets.ImageFolder. If you ever need an image removed, just email me with the image details, and I'll take care of it as soon as possible.
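
    Since the folders follow the ImageFolder convention, loading takes only a few lines; a minimal sketch (the root path below is hypothetical):

    from torchvision import datasets, transforms

    # Each sub-folder name becomes a class label automatically.
    ds = datasets.ImageFolder("apple-vs-orange", transform=transforms.ToTensor())
    print(ds.classes)  # ['apple', 'orange']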

  2. Reptile & Amphibian Image large Dataset

    • kaggle.com
    zip
    Updated Apr 25, 2025
    Cite
    cyber_knight_11 (2025). Reptile & Amphibian Image large Dataset [Dataset]. https://www.kaggle.com/datasets/cyberknight11/herpeton-reptile-and-amphibian-image-dataset/discussion
    Explore at:
    Available download formats: zip (67607339523 bytes)
    Dataset updated
    Apr 25, 2025
    Authors
    cyber_knight_11
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Download and Extract:

    Download the dataset from Kaggle.

    Extract the ZIP file if needed; images are organized into folders, where each folder name is the class label (like snake, lizard, frog, etc.).

    Understand the Structure:

    The dataset contains 9 major classes of reptiles and amphibians.

    Each class folder contains multiple high-quality images belonging to that species or group.

    Load the Dataset into Your Project:

    If using PyTorch, use torchvision.datasets.ImageFolder to load images directly.

    If using TensorFlow, use tf.keras.utils.image_dataset_from_directory.

    You can also manually read images using OpenCV or PIL if needed.

    Preprocessing:

    Resize images if needed (e.g., 224x224 for ResNet models).

    Normalize pixel values (e.g., divide by 255) to prepare for training.

    Splitting the Data:

    Optionally split the dataset into train, validation, and test sets.

    You can split randomly or based on a percentage (e.g., 80% training, 20% validation/testing).
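
    A minimal PyTorch sketch of an 80/20 random split (the root path and ratio are illustrative):

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    full = datasets.ImageFolder("reptile_amphibian", transform=tf)  # hypothetical root

    n_train = int(0.8 * len(full))
    train_set, val_set = random_split(
        full, [n_train, len(full) - n_train],
        generator=torch.Generator().manual_seed(0),  # fixed seed for a repeatable split
    )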

    Training Your Model:

    You can use any CNN model like ResNet, MobileNet, EfficientNet, etc.

    Fine-tune pre-trained models using transfer learning for faster results.

    Use the class folders for automatic label generation.

    Handling the Data Easily:

    Use batch processing and data augmentation (flip, rotate, zoom) during training.

    Use GPU if available for faster training.

    Keep your classes in a list if needed for mapping predictions back to names.
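
    A sketch of these handling tips in PyTorch (augmentations, device selection, and the class list are illustrative; the model itself is elided):

    import torch
    from torchvision import transforms

    train_tf = transforms.Compose([
        transforms.RandomHorizontalFlip(),                    # flip
        transforms.RandomRotation(20),                        # rotate
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # zoom-like crop
        transforms.ToTensor(),                                # also scales pixels to [0, 1]
    ])

    # Use the GPU when one is available.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    classes = ["snake", "lizard", "frog"]  # extend to the full 9 classes
    # pred = model(batch.to(device)).argmax(1)
    # print(classes[pred[0]])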

  3. cropsVSweed

    • huggingface.co
    Updated Aug 11, 2024
    Cite
    Samarth Agarwal (2024). cropsVSweed [Dataset]. https://huggingface.co/datasets/Sa-m/cropsVSweed
    Explore at:
    Dataset updated
    Aug 11, 2024
    Authors
    Samarth Agarwal
    Description

    WeedCrop Image Dataset

    Data Description: It includes 2822 images, annotated in YOLOv5 PyTorch format.
    • The train directory contains 2469 images and their labels in YOLOv5 PyTorch format.
    • The validation directory contains 235 images and their labels in YOLOv5 PyTorch format.
    • The test directory contains 118 images and their labels in YOLOv5 PyTorch format.

    Reference: https://www.kaggle.com/datasets/vinayakshanawad/weedcrop-image-dataset
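
    A small sketch of reading one YOLO-format label file (the file name below is hypothetical; each line stores "class x_center y_center width height", normalized to [0, 1]):

    from pathlib import Path

    def read_yolo_labels(path):
        # Parse one YOLO txt label file into (class_id, (xc, yc, w, h)) tuples.
        boxes = []
        for line in Path(path).read_text().splitlines():
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), (float(xc), float(yc), float(w), float(h))))
        return boxes

    print(read_yolo_labels("train/labels/example.txt"))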

  4. Cellpose model for Digital Phase Contrast images

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 21, 2022
    Cite
    Laura Capolupo; Olivier Burri; Romain Guiet (2022). Cellpose model for Digital Phase Contrast images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6023316
    Explore at:
    Dataset updated
    Feb 21, 2022
    Dataset provided by
    EPFL SV IBI-SV UPDANGELO
    EPFL SV PTECH PTBIOP
    Authors
    Laura Capolupo; Olivier Burri; Romain Guiet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Name: Cellpose model for Digital Phase Contrast images

    Data type: Cellpose model, trained via transfer learning from the ‘cyto’ model.

    Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)

    Training Procedure: The model was trained with Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080), using default settings as per the Cellpose documentation:

    python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0

    The model file (MODEL NAME) in this repository is the result of this training.

    Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with

    python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy

  5. Scene Classification Dataset: 6 Categories

    • kaggle.com
    Updated Jun 11, 2025
    Cite
    Evil Spirit05 (2025). Scene Classification Dataset: 6 Categories [Dataset]. https://www.kaggle.com/datasets/evilspirit05/intel-image-classification
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Evil Spirit05
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset appears to be a scene classification dataset composed of images organized into folders by category.

    📁 Folder Categories:
    • mountain
    • street
    • (likely four other folders; the log mentioned 6 total)
    Each folder contains a series of images belonging to that scene class.

    📸 Image Characteristics:
    • Images are likely in .jpg or .png format.
    • Images per category (based on logs):
      • mountain: 525 images
      • street: 501 images
      • others: not shown, but presumably similar counts

    🧾 Format: Directory-based layout suitable for tools like ImageFolder from PyTorch or flow_from_directory from Keras.

    /dataset/
      /mountain/
        image_1.jpg
        image_2.jpg
        ...
      /street/
        image_1.jpg
        ...
      ...

    🧠 Possible Uses:
    • Scene recognition
    • Transfer learning with CNNs
    • Training and testing classification models
    • Educational or benchmarking tasks
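
    For example, a minimal Keras loader for this layout (the directory name and image size below are assumptions):

    import tensorflow as tf

    train = tf.keras.utils.image_dataset_from_directory(
        "dataset",              # root folder with one sub-folder per scene class
        image_size=(150, 150),  # Intel scene images are commonly 150x150
        batch_size=32,
    )
    print(train.class_names)    # expect the 6 scene categories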

  6. Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier

    • zenodo.org
    bin, zip
    Updated Nov 1, 2022
    Cite
    Emily Dunkel; Emily Dunkel (2022). Lunar Reconnaissance Orbiter Imagery for LROCNet Moon Classifier [Dataset]. http://doi.org/10.5281/zenodo.7041842
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Nov 1, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Emily Dunkel; Emily Dunkel
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Summary

    We provide imagery used to train LROCNet -- our Convolutional Neural Network classifier of orbital imagery of the moon. Images are divided into train, validation, and test zip files, which contain class-specific sub-folders. We have three classes: "fresh crater", "old crater", and "none". Classes are described in detail in the attached labeling guide.

    Directory Contents

    We include the labeling guide and training, testing, and validation data. Training data was split to avoid upload timeouts.

    • LROC_Labeling_Intro_for_release.ppt: Labeling guide
    • val: Validation images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • test: Testing images divided into class sub-folders
      • ejecta: "fresh crater" class
      • oldcrater: "old crater" class
      • none: "none" class
    • ejecta_train: Training images of "fresh crater" class
    • oldcrater_train: Training images of "old crater" class
    • none_train1-4: Training images of "none" class (divided into 4 just for uploading)

    Data Description

    We use CDR (Calibrated Data Record) browse imagery (50% resolution) from the Lunar Reconnaissance Orbiter's Narrow Angle Cameras (NACs). Data we get from the NACs are 5-km swaths, at nominal orbit, so we perform a saliency detection step to find surface features of interest. A detector developed for Mars HiRISE (Wagstaff et al.) worked well for our purposes, after updating based on LROC NAC image resolution. We use this detector to create a set of image chipouts (small 227x277 cutouts) from the larger image, sampling the lunar globe.

    Class Labeling

    We select classes of interest based on what is visible at the NAC resolution, consulting with scientists and performing a literature review. Initially, we have 7 classes: "fresh crater", "old crater", "overlapping craters", "irregular mare patches", "rockfalls and landfalls", "of scientific interest", and "none".

    Using the Zooniverse platform, we set up a labeling tool and labeled 5,000 images. We found that "fresh crater" makes up 11% of the data and "old crater" 18%, with the vast majority "none". Due to limited examples of the other classes, we reduce our initial class set to: "fresh crater" (with impact ejecta), "old crater", and "none".

    We divide the images into train/validation/test sets making sure no image swaths span multiple sets.

    Data Augmentation

    Using PyTorch, we apply the following augmentation on the training set only: horizontal flip, vertical flip, rotation by 90/180/270 degrees, and brightness adjustment (0.5, 2). In addition, we use weighted sampling so that each class is weighted equally. The training set included here does not include augmentation since that was performed within PyTorch.
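
    A minimal PyTorch sketch of the augmentation and weighted sampling described above (the transform composition and sampler setup are assumptions; the original training code is not included in this dataset):

    import torch
    from torch.utils.data import DataLoader, WeightedRandomSampler
    from torchvision import datasets, transforms

    # Discrete 90/180/270-degree rotations, as described above.
    rot = transforms.RandomChoice([
        transforms.RandomRotation((0, 0)),
        transforms.RandomRotation((90, 90)),
        transforms.RandomRotation((180, 180)),
        transforms.RandomRotation((270, 270)),
    ])

    train_tf = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        rot,
        transforms.ColorJitter(brightness=(0.5, 2.0)),  # brightness adjustment (0.5, 2)
        transforms.ToTensor(),
    ])

    train_set = datasets.ImageFolder("train", transform=train_tf)  # hypothetical path

    # Weighted sampling so that each class is drawn equally often.
    targets = torch.tensor(train_set.targets)
    class_counts = torch.bincount(targets)
    weights = (1.0 / class_counts.float())[targets]
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))

    loader = DataLoader(train_set, batch_size=32, sampler=sampler)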

    Acknowledgements

    The author would like to thank the volunteers who provided annotations for this data set, as well as others who contributed to this work (as in the Contributor list). We would also like to thank the PDS Imaging Node for support of this work.

    The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).

    CL#22-4763

    © 2022 California Institute of Technology. Government sponsorship acknowledged.

  7. MIEDT dataset

    • kaggle.com
    Updated Jan 12, 2025
    Cite
    机关鸢鸟 (2025). MIEDT dataset [Dataset]. https://www.kaggle.com/datasets/lidang78/miedt-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    机关鸢鸟
    Description
      1. Dataset Overview: This dataset is organized around the edge detection task, providing image resources and corresponding edge annotations for research and applications, and can be used to test edge detection algorithms. To comprehensively evaluate edge detection methods, we created the Medical Image Edge Detection Test (MIEDT) dataset. MIEDT contains 100 medical images randomly selected from three publicly available datasets: Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000.
      2. Dataset Structure: Original image: this folder stores the original image data. It contains 15 head CT images in PNG format with varying resolutions; 25 coronary heart disease images in JPG format at 1024×1024; and 60 skin images in JPG format at 600×450. It covers a variety of medical imaging modalities and contrasts, providing diverse input for edge detection algorithms. Ground truth: this folder holds the edge annotation images corresponding to the images in the "Original image" folder, in PNG format; white pixels mark edges and black pixels mark non-edge areas. The annotations accurately outline the object contours and edge features of the original images.
      3. Usage Instructions: Python users can read the image data with the cv2 (OpenCV) library. Sample code follows:

    import cv2

    # Read original image
    original_image = cv2.imread('Original image/IMG-001.png')
    # Read the corresponding Ground Truth image
    ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)

    When training with deep learning frameworks (such as TensorFlow or PyTorch), configure the dataset path in the framework's dataset-loading class so the model can correctly read the images and their annotation data.

      4. Data Sources and References: The original images are collected from the public datasets Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000, ensuring image quality and diversity. If you use this dataset in academic research, please cite the following literature.

    References: [1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368

    [2] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).

    [3] Classification of Brain Hemorrhage Using Deep Learning from CT Scan Images - https://link.springer.com/chapter/10.1007/978-981-19-7528-8_15

  8. Aluminum alloy industrial materials defect

    • figshare.com
    zip
    Updated Dec 3, 2024
    Cite
    Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ying Han; Yugang Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset used in this experiment comes from the preliminary-round dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition hosted on Tianchi (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We screened the dataset, removed images that did not meet the requirements of our experiment, and split everything into training and testing sets. All images are 2560×1920 pixels. Before training, all defects were labeled with labelImg and saved as JSON files; the JSON files were then converted to TXT files. Finally, the organized defect dataset was used for detection and classification.

    Description of the data and file structure

    This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs; the instructions below assume a Windows + CUDA GPU system.

    Files and variables

    File: defeat_dataset.zip

    Setup

    1. Download the project repository defeat_dataset.zip.
    2. Unzip it and navigate to the project folder; it should contain the subfolder quexian_dataset.
    3. Move the 'defeat_dataset' folder into the project's main folder and confirm it contains the quexian_dataset subfolder; inside you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

    Software

    1. Download and install Anaconda.
    2. Open the Anaconda Prompt (on Windows, click Start, search for Anaconda Prompt, and open it).
    3. Create a new conda environment with Python 3.8 (any name works): conda create -n yolov8 python=3.8
    4. Activate the environment: conda activate yolov8
    5. Download and install Visual Studio Code.
    6. Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU: conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
    7. Install the remaining libraries:
       • conda install -c anaconda scikit-learn=0.24.1
       • conda install astropy=4.2.1
       • conda install -c anaconda pandas=1.2.4
       • conda install -c conda-forge matplotlib=3.5.3
       • conda install scipy=1.10.1

    Repeatability

    For PyTorch, there is no guarantee of fully reproducible results between versions, individual commits, or platforms, and results may not be reproducible between CPU and GPU executions even when the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible; model training on the GPU, however, varies from machine to machine.

    Access information

    Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
    Data was derived from: https://tianchi.aliyun.com/dataset/140666

    Data availability statement

    The ten datasets used in this study come from the rematch of the Guangdong Industrial Wisdom Big Data Innovation Competition (Intelligent Algorithm track); the download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official release provides 4,356 images, including single-defect, multiple-defect, and defect-free images. We selected only the single-defect and multiple-defect images, 3,233 in total. The ten defects are: non-conductive, effacement, missing bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom, and blotch. Each image contains one or more defects, and all defect images are 2560×1920.

    Surveying the literature, we found that most experiments use these 10 defect types, so we chose three additional defect types that differ most from the ten and have enough examples for our experiments. The three added classes come from the preliminary-round dataset of the competition, downloadable at https://tianchi.aliyun.com/dataset/140666. It contains 3,000 images in total, of which 109, 73, and 43 show the defects bruise, camouflage, and coating cracking, respectively. Finally, the 10 rematch defect types and the 3 preliminary-round defect types were fused into the new dataset examined here.

    In processing the dataset we tried different split ratios, such as 8:2, 7:3, and 7:2:1. Testing showed the results differed little across ratios, so we split the data 7:2:1: 70% training, 20% validation, and 10% testing. The random seed is set to 0 so that every training run yields consistent results.

    The mean Average Precision (mAP) was measured on the dataset three times. The runs differed very little; for accuracy, we averaged the highest (71.5%) and lowest (71.1%) results, giving a final average detection accuracy of 71.3%.

    All data and images used in this research come from publicly available sources, and the original creators have consented to their publication in open-access formats.

    The remaining parameter settings are: epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train.

    The defeat_dataset.zip is mentioned in the Supporting information section of our manuscript; the underlying data are held at Figshare, DOI: 10.6084/m9.figshare.27922929. The results_images.zip contains the experimental results graphs, and images_1.zip and images_2.zip contain all images needed to generate the manuscript.tex manuscript.
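
    Assuming the standard Ultralytics YOLOv8 training API (the authors' enhanced YOLOv8 code is not part of this dataset), the listed parameters would map onto a training call roughly as follows; the dataset YAML name is hypothetical:

    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # pretrained: true
    model.train(
        data="quexian_dataset.yaml",  # hypothetical dataset config
        epochs=200, patience=50, batch=16, imgsz=640,
        optimizer="SGD", momentum=0.937, weight_decay=0.0005,
        close_mosaic=10, iou=0.7,
        box=7.5, cls=0.5, dfl=1.5,  # loss gains
        seed=0,                     # fixed seed, as described above
    )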

  9. Pest Dataset V2

    • kaggle.com
    zip
    Updated Jul 9, 2025
    Cite
    Ibrahima Gabar Diop (2025). Pest Dataset V2 [Dataset]. https://www.kaggle.com/datasets/ibrahimagabardiop/pestaidatasetv2
    Explore at:
    Available download formats: zip (1986986360 bytes)
    Dataset updated
    Jul 9, 2025
    Authors
    Ibrahima Gabar Diop
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    🐛 Balanced Pest Dataset for Agricultural AI

    This dataset contains 16 pest classes. Images were sourced from multiple public datasets on Kaggle and harmonized under consistent class names, using a combination of downsampling and upsampling (image augmentation via imgaug).
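
    A plausible imgaug upsampling pipeline (the exact augmenters used are not documented; this sketch only illustrates the approach):

    import imageio.v2 as imageio
    import imgaug.augmenters as iaa

    seq = iaa.Sequential([
        iaa.Fliplr(0.5),               # horizontal flips
        iaa.Affine(rotate=(-25, 25)),  # small rotations
        iaa.Multiply((0.8, 1.2)),      # brightness jitter
    ])

    image = imageio.imread("Aphids/example.jpg")  # hypothetical file
    augmented = seq(images=[image] * 4)           # four augmented copies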

    🔍 Classes included:

    • Aphids, Thrips, Whitefly, Red Spider Mite, Mites, Slug, Snail, Grasshopper
    • Cicadellidae, Weevil, Cutworm, Earwig, Field Cricket, Bugs, Flea Beetle, Beetle

    🧪 Dataset layout

    Each class is stored in a separate folder, suitable for use with:
    • image_dataset_from_directory (TensorFlow/Keras)
    • PyTorch ImageFolder
    • FastAI dataloaders

    📌 Useful for

    • Pest classification models (EfficientNet, MobileNet, etc.)
    • Fine-tuning agricultural detection systems
    • AI-powered mobile apps for pest diagnostics
  10. VNFood 30 + 3 + 100 = 103

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Lê Anh Duy (2025). VNFood 30 + 3 + 100 = 103 [Dataset]. https://www.kaggle.com/datasets/meowluvmatcha/vnfood-30-100
    Explore at:
    Available download formats: zip (5369833111 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Lê Anh Duy
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    Context

    This dataset was made for the author's educational purposes, to complete a class project, and may be expanded as the project progresses.

    This dataset is a comprehensive collection of Vietnamese food images, curated to address the challenges of Fine-Grained Visual Classification (FGVC) in the culinary domain. It serves as a unified resource by integrating and standardizing three major existing datasets on Kaggle.

    Whether you are building a food delivery recommendation system, a calorie tracking app, or simply exploring computer vision with Vietnamese cuisine, this dataset provides a robust starting point.

    Data Sources

    We gratefully acknowledge and credit the original authors of the source datasets used in this compilation:
    1. 30VNFoods: the baseline dataset containing 30 popular dishes. (Original Link)
    2. Vietnamese-foods-extended: an extension pack adding diversity to existing classes. (Original Link)
    3. 100 Vietnamese Food: a broad dataset covering 100 distinct dishes (~200 images/class). (Original Link)
    More sources may be added over time.

    Methodology & Processing

    1. Label Unification (Standardization)

    A major challenge when combining datasets is inconsistent naming conventions (e.g., "Bánh Mì", "banh_mi", "Banh-Mi"). We built a mapping table to normalize all labels into a single standard format:
    • Format: lowercase, no accents, kebab-case (hyphen-separated).
    • Example: "Bánh bèo", "Banh Beo" → banh-beo
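
    A minimal normalization sketch (the authors' actual mapping table is not included here; the đ/Đ replacement is needed because those letters do not decompose under NFD):

    import re
    import unicodedata

    def normalize_label(name: str) -> str:
        # Decompose accented characters, then drop the combining marks.
        ascii_name = (unicodedata.normalize("NFD", name.replace("đ", "d").replace("Đ", "D"))
                      .encode("ascii", "ignore").decode("ascii"))
        # Collapse runs of non-alphanumerics into single hyphens (kebab-case).
        return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")

    assert normalize_label("Bánh bèo") == "banh-beo"
    assert normalize_label("Banh Beo") == "banh-beo"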

    2. Organization

    The merged data is structured in a standard format compatible with popular Deep Learning libraries like PyTorch (torchvision.datasets.ImageFolder) and TensorFlow/Keras.

    Dataset Structure

    The dataset is organized into a directory tree where each folder represents a class label:

    root/
    ├── train/
    │  ├── banh-mi/
    │  │  ├── image_01.jpg
    │  │  └── ...
    │  ├── pho/
    │  └── ...
    ├── test/
    │  └── ...
    └── val/
      └── ...
    
  11. 🧬 MultiCancer Dataset

    • kaggle.com
    zip
    Updated Jul 13, 2025
    Cite
    Ramachandra Udupa (2025). 🧬 MultiCancer Dataset [Dataset]. https://www.kaggle.com/datasets/ramachandraudupa/multicancer-dataset/code
    Explore at:
    Available download formats: zip (14289313573 bytes)
    Dataset updated
    Jul 13, 2025
    Authors
    Ramachandra Udupa
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🧬 MultiCancerNet Dataset

    MultiCancerNet is a diverse and carefully curated image dataset designed for multi-class cancer classification and general pathology research. It consists of high-quality images gathered from various trusted sources, encompassing a wide range of cancer types, precancerous conditions, and healthy tissue samples across different organs and systems.

    📁 File descriptions

    The dataset follows a standard PyTorch-style directory split, with clearly separated train/ and val/ folders for each class.

    Example directory structure (showing class names):

    ├── all_benign
    ├── all_early
    ├── all_pre
    ├── all_pro
    ├── brain_glioma_tumor
    ├── brain_meningioma_tumor
    ├── brain_normal
    ├── brain_pituitary_tumor
    ├── breast_benign
    ├── breast_malignant
    ├── cervix_dyk
    ├── cervix_koc
    ├── cervix_mep
    ├── cervix_pab
    ├── cervix_sfi
    ├── colon_aca
    ├── colon_bnt
    ├── kidney_cyst
    ├── kidney_normal
    ├── kidney_stone
    ├── kidney_tumor
    ├── lung_colon_aca
    ├── lung_colon_n
    ├── lung_lung_aca
    ├── lung_lung_scc
    ├── lymph_cll
    ├── lymph_fl
    ├── lymph_mcl
    ├── oral_normal
    ├── oral_scc
    ├── pancreatic_normal
    ├── pancreatic_tumor
    ├── Skin_Acne
    ├── Skin_Actinic Keratosis
    ├── Skin_Basal Cell Carcinoma
    ├── Skin_Chickenpox
    ├── Skin_Dermato Fibroma
    ├── Skin_Dyshidrotic Eczema
    ├── Skin_Melanoma
    ├── Skin_Nail Fungus
    ├── Skin_Nevus
    ├── Skin_Normal Skin
    ├── Skin_Pigmented Benign Keratosis
    ├── Skin_Ringworm
    ├── Skin_Seborrheic Keratosis
    ├── Skin_Squamous Cell Carcinoma
    └── Skin_Vascular Lesion
    

    Total: 48 classes representing cancers, benign conditions, and healthy tissues.

    🧠 Highlights

    • 🔬 Broad Coverage: Includes cancers from brain, breast, cervix, colon, kidney, lung, lymph nodes, oral cavity, pancreas, and skin.
    • 🧼 Neatly Organized: Data is arranged for direct loading with torchvision.datasets.ImageFolder or similar tools (see the sketch after this list).
    • ⚖️ Multi-label Ready: Contains both class-specific folders and meta categories like all_benign, all_early, etc.
    • 💡 No Data Leakage: Clear train/validation splits for fair benchmarking.
    • 🔎 Cleaned and Vetted: Images are selected from the finest available sources with consistent labeling.
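
    A minimal loading sketch for this layout (the root path below is hypothetical):

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("multicancer/train", transform=tf)
    val_set = datasets.ImageFolder("multicancer/val", transform=tf)

    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=64)

    print(len(train_set.classes))              # 48 classes, per the description above
    print(train_set.class_to_idx["oral_scc"])  # label index assigned to one class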

    📚 Use Cases

    • Multi-class classification
    • Cancer type identification
    • Early detection modeling
    • Transfer learning benchmarks
    • Domain generalization and robustness studies

    📦 Source Datasets

    The MultiCancerNet dataset was curated from multiple high-quality public sources, including:

    📌 Citation / Credit

    Dataset assembled by Ramachandra Udupa for research and educational purposes. If you use this dataset, please consider citing or acknowledging its creator.

  12. Coral Reefs Images

    • kaggle.com
    zip
    Updated Oct 11, 2025
    Cite
    Asfar Hossain Sitab (2025). Coral Reefs Images [Dataset]. https://www.kaggle.com/datasets/asfarhossainsitab/coral-reefs-images
    Explore at:
    Available download formats: zip (745680456 bytes)
    Dataset updated
    Oct 11, 2025
    Authors
    Asfar Hossain Sitab
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🪸 Coral Reef Health Image Dataset

    📘 Overview

    This dataset contains high-quality images of coral reefs, categorized into two classes: Healthy and Bleached. It is designed for machine learning and deep learning applications focused on coral reef health monitoring, marine conservation, and environmental analysis.

    Researchers and data scientists can use this dataset to train classification models that automatically detect coral bleaching — an important environmental indicator of ocean health and climate change impact.

    🗂 Dataset Structure

    The dataset is divided into three main subsets to support a standard deep learning workflow:

    Coral Reef Images/
    ├── train/
    │  ├── Bleached/
    │  └── Healthy/
    ├── valid/
    │  ├── Bleached/
    │  └── Healthy/
    └── test/
      ├── Bleached/
      └── Healthy/
    
    Split      | Total Images | Bleached | Healthy | Purpose
    -----------|--------------|----------|---------|---------------------------------------
    Train      | 9,662        | 4,980    | 4,682   | Used for model training
    Validation | 463          | 240      | 223     | Used to fine-tune model hyperparameters
    Test       | 257          | 135      | 122     | Used for final model evaluation
    Total      | 10,382       | 5,355    | 5,027   |

    🖼️ Image Details

    • Total images: 10,382
    • Total size: ~755 MB
    • Average resolution: Varies (most are high-quality RGB images)
    • Format: .jpg / .png

    🌊 Classes

    1. Bleached — Corals showing whitening or discoloration caused by stress (e.g., rising sea temperature, pollution).
    2. Healthy — Vibrant, colorful corals with no visible signs of bleaching or stress.

    💡 Potential Use Cases

    • Binary image classification (Healthy vs. Bleached coral)
    • Environmental monitoring and conservation research
    • Transfer learning using CNN architectures (ResNet, EfficientNet, VGG, etc.)
    • Data augmentation and image preprocessing experiments

    ⚙️ Recommended Frameworks

    You can easily use this dataset with:

    • TensorFlow / Keras: ImageDataGenerator or tf.data.Dataset
    • PyTorch: torchvision.datasets.ImageFolder
    • FastAI, Hugging Face, or other vision libraries
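
    As one example, a minimal PyTorch transfer-learning setup for the binary task (paths and sizes are illustrative):

    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("Coral Reef Images/train", transform=tf)

    # Pretrained backbone with a new two-way classification head.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)  # Bleached vs. Healthy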

    📜 Citation

    If you use this dataset, please cite appropriately as:

    Coral Reef Images Dataset — Binary Classification for Coral Health Detection (Bleached vs. Healthy), 2025.

  13. Coconut Leaf Disease Dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    JIGAR BARAIYA (2025). Coconut Leaf Disease Dataset [Dataset]. https://www.kaggle.com/datasets/jigarbaraiya/coconut-leaf-disease-dataset
    Explore at:
    Available download formats: zip (29014079 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    JIGAR BARAIYA
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains real coconut leaf images categorized into 4 classes:
    1. Healthy
    2. Leaf Spot
    3. Yellowing
    4. Pest Damage

    Images were collected manually and organized into folders. This dataset is useful for:
    • Image classification
    • Mobile/edge model development
    • Agricultural AI research
    • CNN, TensorFlow, and PyTorch training

    Each class has its own folder. The dataset can be directly used with Keras, PyTorch, and TFLite training pipelines.

  14. rsna-mammography-768-vl-perlabel

    • kaggle.com
    zip
    Updated Feb 12, 2023
    Cite
    Yacine Bouaouni (2023). rsna-mammography-768-vl-perlabel [Dataset]. https://www.kaggle.com/datasets/jarvisai7/rsna-mammography-768-vl-perlabel
    Explore at:
    Available download formats: zip (8268291887 bytes)
    Dataset updated
    Feb 12, 2023
    Authors
    Yacine Bouaouni
    Description

    This dataset contains images in PNG format for the RSNA Screening Mammography Breast Cancer Detection competition. All images are 768×768 and are organized into two folders, one per label, so the labels can be inferred from the folder names (helpful when using keras image_dataset_from_directory, for example; see the sketch below). The DICOM files were processed to PNGs in this notebook by Radek Osmulski: https://www.kaggle.com/code/radek1/how-to-process-dicom-images-to-pngs?scriptVersionId=113529850. The contribution of this dataset is organizing the files so that labels can be inferred from the directory name.
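
    A minimal sketch of that label inference (the root folder name below is an assumption):

    import tensorflow as tf

    ds = tf.keras.utils.image_dataset_from_directory(
        "rsna-mammography-768",  # root with one sub-folder per label
        image_size=(768, 768),
        label_mode="binary",
    )
    print(ds.class_names)  # the two labels, inferred from the folder names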

    🎉 If this dataset helps you in your work, don't hesitate to upvote! 🎉

  15. Young and Old Images Dataset

    • kaggle.com
    zip
    Updated Jul 24, 2019
    Cite
    Abhishek Yanamandra (2019). Young and Old Images Dataset [Dataset]. https://www.kaggle.com/abhishekyana/young2old-dataset
    Explore at:
    Available download formats: zip (42953902 bytes)
    Dataset updated
    Jul 24, 2019
    Authors
    Abhishek Yanamandra
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    As there was no readily available dataset of paired young/old images, I decided to scrape the data from Google Images. This is the code I executed to obtain the images. After running the script with apt search keywords, the images are downloaded into their respective folders. Some keywords I used are "elderly photos", "teenage selfies solo", "youth selfie photo", etc., for around 20-25 keywords in total. Each keyword grabbed around 80 pictures, the majority of which were irrelevant to the task: group photos, blurred shots, vexels, and so on. So I manually handpicked the images that suit this task and ended up with roughly 300 images per class.

    Content

    After unzipping the ZIP file, there are 2 folders, A and B, where A is the collection of young images and B the collection of elderly images. I tried hard to keep this dataset simple to use, so that is it. Apologies if any of your images appear in this dataset; I scraped these images and don't completely own them. The images were scraped using this script, where keywords are to be given before running it. I trained a CycleGAN for a young-to-old image converter application; please feel free to check it out.

    Acknowledgements

    I would like to thank Aitor Ruano for his beautiful CycleGAN code, which helped me write mine easily.

    Inspiration

    Hosted on Kaggle with Love, Please feel free to use this dataset and come up with interesting projects.

    Best, AbhishekYana

