MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains 800 images (400 of apples and 400 of oranges) for image classification tasks. The images are organized into two folders: apple and orange, and it’s fully compatible with PyTorch’s torchvision.datasets.ImageFolder. If you ever need to have an image removed, just email me with the image details, and I’ll take care of it as soon as possible.
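A minimal PyTorch loading sketch (the folder name apple_orange below is a placeholder for wherever the extracted dataset lives; the apple/ and orange/ subfolders become labels 0 and 1 automatically):

from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    'apple_orange',  # placeholder path to the extracted dataset root
    transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]))
print(dataset.classes)  # ['apple', 'orange']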
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Download and Extract:
Download the dataset from Kaggle.
Extract the ZIP file if needed; images are organized into folders, where each folder name is the class label (like snake, lizard, frog, etc.).
Understand the Structure:
The dataset contains 9 major classes of reptiles and amphibians.
Each class folder contains multiple high-quality images belonging to that species or group.
Load the Dataset into Your Project:
If using PyTorch, use torchvision.datasets.ImageFolder to load images directly (see the sketch after this list).
If using TensorFlow, use tf.keras.utils.image_dataset_from_directory.
You can also manually read images using OpenCV or PIL if needed.
Preprocessing:
Resize images if needed (e.g., 224x224 for ResNet models).
Normalize pixel values (e.g., divide by 255) to prepare for training.
Splitting the Data:
Optionally split the dataset into train, validation, and test sets.
You can split randomly or based on a percentage (e.g., 80% training, 20% validation/testing).
Training Your Model:
You can use any CNN model like ResNet, MobileNet, EfficientNet, etc.
Fine-tune pre-trained models using transfer learning for faster results.
Use the class folders for automatic label generation.
Practical Tips:
Use batch processing and data augmentation (flip, rotate, zoom) during training.
Use GPU if available for faster training.
Keep your classes in a list if needed for mapping predictions back to names.
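A minimal end-to-end sketch of the steps above, assuming PyTorch and a placeholder dataset root named reptiles (one subfolder per class):

from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),   # e.g. ResNet-sized inputs
    transforms.ToTensor(),           # also scales pixel values to [0, 1]
])
full_set = datasets.ImageFolder('reptiles', transform=transform)  # placeholder path

n_train = int(0.8 * len(full_set))   # 80% training, 20% validation
train_set, val_set = random_split(full_set, [n_train, len(full_set) - n_train])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
class_names = full_set.classes       # for mapping predictions back to names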
WeedCrop Image Dataset
Data Description
The dataset includes 2,822 images, annotated in YOLO v5 PyTorch format.
- The train directory contains 2,469 images and their labels in YOLOv5 PyTorch format.
- The validation directory contains 235 images and their labels in YOLOv5 PyTorch format.
- The test directory contains 118 images and their labels in YOLOv5 PyTorch format.
Reference: https://www.kaggle.com/datasets/vinayakshanawad/weedcrop-image-dataset
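In YOLO v5 PyTorch format, each image has a matching .txt file with one line per object: a class index followed by the normalized box centre and size. A small parsing sketch (the label file name below is hypothetical):

def read_yolo_labels(path):
    # Each line: class_id x_center y_center width height (all coordinates normalized to 0-1)
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            boxes.append((int(cls), float(xc), float(yc), float(w), float(h)))
    return boxes

print(read_yolo_labels('train/labels/example_0001.txt'))  # hypothetical label file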
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Name: Cellpose model for Digital Phase Contrast images
Data type: Cellpose model, trained via transfer learning from ‘cyto’ model.
Training Dataset: Light microscopy (Digital Phase Contrast) and Manual annotations (10.5281/zenodo.5996883)
Training Procedure: The model was trained using Cellpose version 0.6.5 with GPU support (NVIDIA GeForce RTX 2080) using default settings as per the Cellpose documentation:
python -m cellpose --train --dir TRAINING/DATASET/PATH/train --test_dir TRAINING/DATASET/PATH/test --pretrained_model cyto --chan 0 --chan2 0
The model file (MODEL NAME) in this repository is the result of this training.
Prediction Procedure: Using this model, a label image can be obtained from new unseen images in a given folder with
python -m cellpose --dir NEW/DATASET/PATH --pretrained_model FULL_MODEL_PATH --chan 0 --chan2 0 --save_tif --no_npy
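Equivalently, a sketch of the same prediction step through the Cellpose Python API (the image path is a placeholder, FULL_MODEL_PATH mirrors the CLI placeholder above, and exact return values may differ slightly between Cellpose versions):

from cellpose import models, io

model = models.CellposeModel(gpu=True, pretrained_model='FULL_MODEL_PATH')
img = io.imread('NEW/DATASET/PATH/image.tif')
masks, flows, styles = model.eval(img, channels=[0, 0])  # label image in `masks`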
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
📁 Folder Categories:
* mountain
* street
* (and likely four other folders; the log mentioned 6 total)
* Each folder contains a series of images belonging to that scene class.
📸 Image Characteristics:
* Images are likely in .jpg or .png format.
* The number of images per category (based on logs):
  * mountain: 525 images
  * street: 501 images
  * Others: not shown, but presumably similar counts
🧾 Format:
Directory-based format suitable for use with tools like ImageFolder from PyTorch or flow_from_directory from Keras.
/dataset/
/mountain/
image_1.jpg
image_2.jpg
...
/street/
image_1.jpg
...
...
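A possible Keras loading sketch for this layout, using flow_from_directory with an on-the-fly validation split (the root path and image size are assumptions):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)
train_gen = datagen.flow_from_directory(
    'dataset', target_size=(150, 150), batch_size=32,
    class_mode='categorical', subset='training')
val_gen = datagen.flow_from_directory(
    'dataset', target_size=(150, 150), batch_size=32,
    class_mode='categorical', subset='validation')
print(train_gen.class_indices)  # maps folder names (mountain, street, ...) to label indices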
🧠 Possible Uses:
* Scene recognition
* Transfer learning with CNNs
* Training and testing classification models
* Educational or benchmarking tasks
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Summary
We provide imagery used to train LROCNet -- our Convolutional Neural Network classifier of orbital imagery of the moon. Images are divided into train, validation, and test zip files, which contain class specific sub-folders. We have three classes: "fresh crater", "old crater", and "none". Classes are described in detail in the attached labeling guide.
Directory Contents
We include the labeling guide and training, testing, and validation data. Training data was split to avoid upload timeouts.
Data Description
We use CDR (Calibrated Data Record) browse imagery (50% resolution) from the Lunar Reconnaissance Orbiter's Narrow Angle Cameras (NACs). Data we get from the NACs are 5-km swaths, at nominal orbit, so we perform a saliency detection step to find surface features of interest. A detector developed for Mars HiRISE (Wagstaff et al.) worked well for our purposes, after updating based on LROC NAC image resolution. We use this detector to create a set of image chipouts (small 227x277 cutouts) from the larger image, sampling the lunar globe.
Class Labeling
We select classes of interest based on what is visible at the NAC resolution, consulting with scientists and performing a literature review. Initially, we have 7 classes: "fresh crater", "old crater", "overlapping craters", "irregular mare patches", "rockfalls and landfalls", "of scientific interest", and "none".
Using the Zooniverse platform, we set up a labeling tool and labeled 5,000 images. We found that "fresh crater" images make up 11% of the data and "old crater" 18%, with the vast majority labeled "none". Due to the limited examples of the other classes, we reduce our initial class set to: "fresh crater" (with impact ejecta), "old crater", and "none".
We divide the images into train/validation/test sets making sure no image swaths span multiple sets.
Data Augmentation
Using PyTorch, we apply the following augmentation on the training set only: horizontal flip, vertical flip, rotation by 90/180/270 degrees, and brightness adjustment (0.5, 2). In addition, we use weighted sampling so that each class is weighted equally. The training set included here does not include augmentation since that was performed within PyTorch.
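A sketch of roughly equivalent torchvision augmentation and class-balanced sampling (the specific transform choices and the 'train' path are assumptions; the exact pipeline used is not included in this archive):

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    # pick one of the four 90-degree rotations at random
    transforms.RandomChoice([transforms.RandomRotation((k, k)) for k in (0, 90, 180, 270)]),
    transforms.ColorJitter(brightness=(0.5, 2.0)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder('train', transform=train_tf)

# weighted sampling so each class ("fresh crater", "old crater", "none") is drawn equally often
class_counts = torch.bincount(torch.tensor(train_set.targets))
sample_weights = (1.0 / class_counts.float())[train_set.targets]
train_loader = DataLoader(train_set, batch_size=64,
                          sampler=WeightedRandomSampler(sample_weights, len(sample_weights)))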
Acknowledgements
The author would like to thank the volunteers who provided annotations for this data set, as well as others who contributed to this work (as in the Contributor list). We would also like to thank the PDS Imaging Node for support of this work.
The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).
CL#22-4763
© 2022 California Institute of Technology. Government sponsorship acknowledged.
import cv2
original_image = cv2.imread('Original image/IMG-001.png')  # Read the original image
ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)  # Read the corresponding ground-truth image
When training models with deep learning frameworks such as TensorFlow or PyTorch, configure the dataset path in the framework's dataset-loading class, following that framework's data loading mechanism, so that the model correctly reads and processes the images and their annotation data.
References: [1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368
[2] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).
[3] Classification of Brain Hemorrhage Using Deep Learning from CT Scan Images - https://link.springer.com/chapter/10.1007/978-981-19-7528-8_15
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The dataset used in this study was from the preliminary competition dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We filtered the dataset, removing images that do not meet the requirements of our experiment. All data have been split for training and testing. The images are all 2560×1960 pixels. Before training, all defects need to be labeled using labelimg and saved as json files. Then, all json files are converted to txt files. Finally, the organized defect dataset is detected and classified.
Description of the data and file structure
This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection tasks. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions allow users to run the code in this repository on a Windows system with a CUDA GPU.
Files and variables
File: defeat_dataset.zip
Setup
Please follow the steps below to set up the project:
Download the project repository
Download the project repository defeat_dataset.zip from the location below. Unzip it and navigate to the project folder; it should contain a subfolder: quexian_dataset.
Download data
1. Download the data: defeat_dataset.zip.
2. Unzip the downloaded data and move the 'defeat_dataset' folder into the project's main folder.
3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.
Software
Set up the Python environment:
1. Download and install Anaconda.
2. Once Anaconda is installed, open the Anaconda Prompt. On Windows, click Start, search for Anaconda Prompt, and open it.
3. Create a new conda environment with Python 3.8. You can name it whatever you like, for example yolov8. Enter the following command: conda create -n yolov8 python=3.8
4. Activate the created environment. If the name is yolov8, enter: conda activate yolov8
Download and install Visual Studio Code.
Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU: conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
Install some necessary libraries:
- scikit-learn: conda install -c anaconda scikit-learn=0.24.1
- astropy: conda install astropy=4.2.1
- pandas: conda install -c anaconda pandas=1.2.4
- Matplotlib: conda install -c conda-forge matplotlib=3.5.3
- scipy: conda install scipy=1.10.1
Repeatability
For PyTorch, it is a well-known fact that there is no guarantee of fully reproducible results between PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even if the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible.
However, when it comes to updating the model on the GPU, the results of model training vary across different machines.
Access information
Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/
Data was derived from the following sources: https://tianchi.aliyun.com/dataset/140666
Data availability statement
The ten defect types used in this study come from the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition rematch, and the dataset download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official website provides 4,356 images, including single-defect images, multiple-defect images and defect-free images. We selected only the single-defect and multiple-defect images, 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom and blotch. Each image contains one or more defects, and the resolution of the defect images is 2560×1920.
By surveying the literature, we found that most experiments were done with these 10 defect types, so we chose three more defect types that differ clearly from these ten and have enough examples to be suitable for the experiments. The three newly added defect types come from the preliminary dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition, which can be downloaded from https://tianchi.aliyun.com/dataset/140666. There are 3,000 images in total, among which 109, 73 and 43 images are for the defects of bruise, camouflage and coating cracking respectively. Finally, the 10 defect types from the rematch and the 3 defect types selected from the preliminary round are fused into a new dataset, which is examined in this study.
In processing the dataset, we tried different division ratios, such as 8:2, 7:3 and 7:2:1. After testing, we found that the experimental results did not differ much across division ratios. Therefore, we divide the dataset according to the ratio of 7:2:1: the training set accounts for 70%, the validation set for 20%, and the testing set for 10%. The random number seed is set to 0 to ensure that the results are consistent every time the model is trained.
Finally, the mean Average Precision (mAP) was measured on the dataset a total of three times. Each time the results differed very little, but for the accuracy of the experimental results, we took the average of the highest and lowest results. The highest was 71.5% and the lowest 71.1%, giving an average detection accuracy of 71.3% for the final experiment.
All data and images utilized in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.
The settings for the other parameters are as follows: epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train.
The defeat_dataset (ZIP) is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare, DOI: 10.6084/m9.figshare.27922929. The results_images.zip in the system contains the experimental results graphs. The images_1.zip and images_2.zip in the system contain all the images needed to generate the manuscript.tex manuscript.
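With the ultralytics YOLOv8 package, the listed hyperparameters map onto a training call along these lines (a sketch only; the data YAML path and base weights are placeholders, and the paper's enhanced YOLOv8 model is not reproduced here):

from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # placeholder base weights
model.train(
    data='defeat_dataset/data.yaml',  # placeholder dataset config
    epochs=200, patience=50, batch=16, imgsz=640,
    optimizer='SGD', momentum=0.937, weight_decay=0.0005,
    close_mosaic=10, iou=0.7,
    box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0,
    pretrained=True, seed=0,
)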
CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 16 pest classes. Images were sourced from multiple public datasets on Kaggle and harmonized under consistent class names, using a combination of downsampling and upsampling (image augmentation via imgaug).
Each class is stored in a separate folder, suitable for use with:
- image_dataset_from_directory (TensorFlow/Keras)
- PyTorch ImageFolder
- FastAI dataloaders
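The exact augmenters used for upsampling are not specified; a plausible imgaug sketch of that step might look like this (the class folder and file name are hypothetical):

import imageio.v2 as imageio
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),                # horizontal flips
    iaa.Affine(rotate=(-20, 20)),   # small rotations
    iaa.Multiply((0.8, 1.2)),       # brightness jitter
])

image = imageio.imread('aphids/example.jpg')   # hypothetical class folder / image
augmented = seq(images=[image] * 4)            # four augmented copies of one image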
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
This dataset was made for the author's educational purposes and to complete a class project, and it may be expanded as the project continues.
This dataset is a comprehensive collection of Vietnamese food images, curated to address the challenges of Fine-Grained Visual Classification (FGVC) in the culinary domain. It serves as a unified resource by integrating and standardizing three major existing datasets on Kaggle.
Whether you are building a food delivery recommendation system, a calorie tracking app, or simply exploring computer vision with Vietnamese cuisine, this dataset provides a robust starting point.
We gratefully acknowledge and credit the original authors of the source datasets used in this compilation: 1. 30VNFoods: The baseline dataset containing 30 popular dishes. Original Link 2. Vietnamese-foods-extended: An extension pack adding diversity to existing classes. Original Link 3. 100 Vietnamese Food: A broad dataset covering 100 distinct dishes (~200 images/class). Original Link 4. Expect to gather more...
A major challenge with combining datasets is inconsistent naming conventions (e.g., "Bánh Mì", "banh_mi", "Banh-Mi"). We built a mapping table to normalize all labels into a single standard format:
* Format: Lowercase, no accents, kebab-case (hyphen separated).
* Example: "Bánh bèo", "Banh Beo" → banh-beo.
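One possible implementation of this mapping in Python (a sketch; the real mapping table may handle additional edge cases):

import re
import unicodedata

def normalize_label(name: str) -> str:
    # Strip Vietnamese diacritics via NFD decomposition, then drop combining marks.
    name = unicodedata.normalize('NFD', name)
    name = ''.join(c for c in name if unicodedata.category(c) != 'Mn')
    name = name.replace('đ', 'd').replace('Đ', 'D')  # đ has no combining form
    # Lowercase and turn spaces/underscores into hyphens (kebab-case).
    name = re.sub(r'[\s_]+', '-', name.strip().lower())
    return re.sub(r'-+', '-', name)

print(normalize_label('Bánh bèo'))  # -> banh-beo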
The merged data is structured in a standard format compatible with popular Deep Learning libraries like PyTorch (torchvision.datasets.ImageFolder) and TensorFlow/Keras.
The dataset is organized into a directory tree where each folder represents a class label:
root/
├── train/
│ ├── banh-mi/
│ │ ├── image_01.jpg
│ │ └── ...
│ ├── pho/
│ └── ...
├── test/
│ └── ...
└── val/
└── ...
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
MultiCancerNet is a diverse and carefully curated image dataset designed for multi-class cancer classification and general pathology research. It consists of high-quality images gathered from various trusted sources, encompassing a wide range of cancer types, precancerous conditions, and healthy tissue samples across different organs and systems.
The dataset follows a standard PyTorch-style directory split, with clearly separated train/ and val/ folders for each class.
Example directory structure (showing class names):
├── all_benign
├── all_early
├── all_pre
├── all_pro
├── brain_glioma_tumor
├── brain_meningioma_tumor
├── brain_normal
├── brain_pituitary_tumor
├── breast_benign
├── breast_malignant
├── cervix_dyk
├── cervix_koc
├── cervix_mep
├── cervix_pab
├── cervix_sfi
├── colon_aca
├── colon_bnt
├── kidney_cyst
├── kidney_normal
├── kidney_stone
├── kidney_tumor
├── lung_colon_aca
├── lung_colon_n
├── lung_lung_aca
├── lung_lung_scc
├── lymph_cll
├── lymph_fl
├── lymph_mcl
├── oral_normal
├── oral_scc
├── pancreatic_normal
├── pancreatic_tumor
├── Skin_Acne
├── Skin_Actinic Keratosis
├── Skin_Basal Cell Carcinoma
├── Skin_Chickenpox
├── Skin_Dermato Fibroma
├── Skin_Dyshidrotic Eczema
├── Skin_Melanoma
├── Skin_Nail Fungus
├── Skin_Nevus
├── Skin_Normal Skin
├── Skin_Pigmented Benign Keratosis
├── Skin_Ringworm
├── Skin_Seborrheic Keratosis
├── Skin_Squamous Cell Carcinoma
└── Skin_Vascular Lesion
Total: 48 classes representing cancers, benign conditions, and healthy tissues.
The directory layout works with torchvision.datasets.ImageFolder or similar tools, using the folder names (all_benign, all_early, etc.) as class labels. The MultiCancerNet dataset was curated from multiple high-quality public sources.
Dataset assembled by Ramachandra Udupa for research and educational purposes. If you use this dataset, please consider citing or acknowledging its creator.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains high-quality images of coral reefs, categorized into two classes: Healthy and Bleached. It is designed for machine learning and deep learning applications focused on coral reef health monitoring, marine conservation, and environmental analysis.
Researchers and data scientists can use this dataset to train classification models that automatically detect coral bleaching — an important environmental indicator of ocean health and climate change impact.
The dataset is divided into three main subsets to support a standard deep learning workflow:
Coral Reef Images/
├── train/
│ ├── Bleached/
│ └── Healthy/
├── valid/
│ ├── Bleached/
│ └── Healthy/
└── test/
├── Bleached/
└── Healthy/
| Split | Total Images | Bleached | Healthy | Purpose |
|---|---|---|---|---|
| Train | 9,662 | 4,980 | 4,682 | Used for model training |
| Validation | 463 | 240 | 223 | Used to fine-tune model hyperparameters |
| Test | 257 | 135 | 122 | Used for final model evaluation |
| Total | 10,382 | 5,355 | 5,027 | — |
Image format: .jpg / .png
You can easily use this dataset with:
- ImageDataGenerator or tf.data.Dataset (TensorFlow/Keras)
- torchvision.datasets.ImageFolder (PyTorch)
If you use this dataset, please cite appropriately as:
Coral Reef Images Dataset — Binary Classification for Coral Health Detection (Bleached vs. Healthy), 2025.
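For instance, the three splits above can be loaded directly with TensorFlow (image size and batch size here are assumptions):

import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    'Coral Reef Images/train', image_size=(224, 224), batch_size=32, label_mode='binary')
val_ds = tf.keras.utils.image_dataset_from_directory(
    'Coral Reef Images/valid', image_size=(224, 224), batch_size=32, label_mode='binary')
test_ds = tf.keras.utils.image_dataset_from_directory(
    'Coral Reef Images/test', image_size=(224, 224), batch_size=32, label_mode='binary')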
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset contains real coconut leaf images categorized into 4 classes: 1. Healthy 2. Leaf Spot 3. Yellowing 4. Pest Damage
Images were collected manually and organized into folders. This dataset is useful for: - Image classification - Mobile/edge model development - Agricultural AI research - CNN, TensorFlow, PyTorch training
Each class has its own folder. The dataset can be directly used with Keras, PyTorch, and TFLite training pipelines.
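A minimal sketch of a TFLite-oriented pipeline, assuming a placeholder root folder coconut_leaves containing the four class folders (the tiny model here is purely illustrative):

import tensorflow as tf

ds = tf.keras.utils.image_dataset_from_directory('coconut_leaves', image_size=(160, 160))

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4),  # four classes: Healthy, Leaf Spot, Yellowing, Pest Damage
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(ds, epochs=5)

# Convert the trained model to TFLite for mobile/edge deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open('coconut_leaf.tflite', 'wb') as f:
    f.write(converter.convert())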
This dataset contains images in PNG format for the RSNA Screening Mammography Breast Cancer Detection competition. All images are 768x768 and are organized in two folders (one for each label). This allows inferring the labels from the folders (which can be helpful when using Keras image_dataset_from_directory, for example).
- The processing of the DICOM files was done in this notebook by Radek Osmulski: https://www.kaggle.com/code/radek1/how-to-process-dicom-images-to-pngs?scriptVersionId=113529850
- The contribution made in this dataset is to organize the files in a way that enables label inference from the directory name.
🎉If this dataset helps you in your work don't hesitate to upvote 🎉
CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
As there is no readily available dataset of paired images for YoungnOld, I decided to scrape the data from Google Images. This is the code I executed to obtain the images. After running the script with suitable search keywords, the images are downloaded into their respective folders. Some keywords I used are "elderly photos", "teenage selfies solo", "youth selfie photo", etc., for around 20-25 keywords in total. Each keyword grabbed around 80 pictures, the majority of which are irrelevant to the task: group photos, blurred images, vexels, and so on. So I manually handpicked the images that suit this task and ended up with roughly 300 images for each class.
After unzipping the ZIP file, we have 2 folders, A and B, where A is the collection of young images and B holds the elderly images. I tried hard to keep this dataset simple to use, so that is it. Sorry if any of your images are in this dataset; I scraped these images and do not fully own them. The images were scraped using this script, where keywords are to be given before running it. I trained a CycleGAN for a Young-to-Old image converter application; please feel free to check it out.
I would like to thank Aitor Ruano for his beautiful CycleGAN code, which helped me code mine easily.
Hosted on Kaggle with Love, Please feel free to use this dataset and come up with interesting projects.
Best, AbhishekYana