License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
The dataset features 75 different classes of butterflies. It contains more than 1,000 labelled images, including the validation images. Each image belongs to exactly one butterfly category.
The labels for the images are saved in Training_set.csv.
Testing_set.csv lists the names of the images in the test folder. Predict a label for each of them and submit the predictions to Data Sprint 107 - Butterfly Image Classification.
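As a minimal sketch of how a label file like Training_set.csv might be read, the snippet below parses CSV text with Python's standard library and counts the images per class. The column names `filename` and `label`, the sample file names, and the sample class names are assumptions made for illustration; check the header of the real file before relying on them.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text, label_column="label"):
    """Count how many rows carry each label in CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Hypothetical sample in the assumed layout of Training_set.csv.
sample = (
    "filename,label\n"
    "Image_1.jpg,MONARCH\n"
    "Image_2.jpg,ADONIS\n"
    "Image_3.jpg,MONARCH\n"
)
counts = count_labels(sample)
print(counts.most_common())
```

For the real file, the same function could be fed the text returned by `open("Training_set.csv", newline="").read()`.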
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This is a very high quality dataset of playing card images. All images are 224 x 224 x 3 in JPG format. All images have been cropped so that only a single card is present and the card occupies well over 50% of the pixels. There are 7,624 training images, 265 test images, and 265 validation images. The train, test, and validation directories are each partitioned into 53 sub-directories, one for each of the 53 types of cards. The dataset also includes a CSV file that can be used to load the datasets.
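A directory-per-class layout like the one described can be indexed with a few lines of standard-library Python. The sketch below is only illustrative: the split folder name `train` and the class folder names are hypothetical stand-ins, and a temporary directory is used so the example runs without the real dataset.

```python
import tempfile
from pathlib import Path


def index_split(split_dir):
    """Return sorted (relative image path, class name) pairs for one split.

    Assumes one sub-directory per class with .jpg files inside it.
    """
    split_dir = Path(split_dir)
    pairs = []
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.glob("*.jpg")):
            relative = image_path.relative_to(split_dir).as_posix()
            pairs.append((relative, class_dir.name))
    return pairs


# Build a tiny stand-in tree; the folder names here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("ace of spades", "joker"):
        class_dir = Path(tmp) / "train" / class_name
        class_dir.mkdir(parents=True)
        (class_dir / "001.jpg").touch()
    demo_pairs = index_split(Path(tmp) / "train")

print(demo_pairs)
```

The class name is taken from the folder name, so no separate label file is needed for this layout.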
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains images and metadata for crop disease classification. Before training, split it into the three sets that machine learning and deep learning tasks require: train, validation, and test.
The images are located in the "images" folder, and the labels can be obtained from the metadata in the CSV file. The dataset class names are given in the class_names JSON file.
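One possible way to produce the three splits is a stratified split, which divides each class separately so that class proportions stay similar across train, validation, and test. The sketch below uses only the standard library. The 15% validation and test fractions, the seed, and the class names `healthy` and `rust` are assumptions for illustration, not values given by the dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Split item indices into train/val/test lists, class by class.

    `labels` is a list where labels[i] is the class of item i. The default
    fractions are illustrative, not prescribed by the dataset.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        n_test = round(len(indices) * test_fraction)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Hypothetical labels: 20 images for each of two classes.
labels = ["healthy"] * 20 + ["rust"] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, which matters when results from different training runs are compared.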
Good luck!
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Cat and Dog Classification dataset is a standard computer vision dataset that involves classifying photos as containing either a dog or a cat. It is provided as a subset of photos from a much larger dataset of approximately 25 thousand images.
The dataset contains 24,998 images, split into 12,499 Cat images and 12,499 Dog images. The training images are divided equally between cat and dog images, while the test images are not labeled. This allows users to evaluate their models on unseen data.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Over 1,000 images of cats and dogs scraped from Google Images. The task is to build a model that classifies an image as a cat or a dog as accurately as possible.
Image sizes range from roughly 100x100 pixels to 2000x1000 pixels.
Image format is jpeg.
Duplicates have been removed.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a comprehensive collection of scientific images curated for the advancement of image classification algorithms in the scientific domain. It comprises a diverse set of images across six distinct classes, providing a unique challenge for machine learning enthusiasts and researchers. The base source of the data is derived from the Biofors dataset, with additional images incorporated to enhance variety and complexity. All images are either in .JPG or .PNG formats.
The dataset is organized into six primary classes, each representing a different aspect of scientific imaging:
Blot-Gel: Images of various blotting techniques and gel electrophoresis results used in molecular biology.
FACS (Fluorescence-Activated Cell Sorting): Flow cytometry images showcasing cell populations based on fluorescent labeling.
Histopathology: High-resolution images of tissue sections stained to reveal cellular structures and patterns indicative of pathological states.
Macroscopy: Images captured without magnification, highlighting the gross features and details of biological specimens.
Microscopy: A collection of microscopic images that reveal the intricate details of cells and microorganisms.
Non-scientific: A control group of images unrelated to scientific inquiry, included to test the robustness of classification models. It consists mainly of images from the ImageNet dataset.
This dataset is ideal for developing and benchmarking image classification models that can be applied to:
Image Falsification and Fabrication Detection: This dataset serves as a foundation for developing forensic tools to combat image falsification and fabrication in scientific publications. With the Biofors dataset as a base, participants have the opportunity to create models that can detect unethical manipulations, thereby safeguarding the credibility of scientific research. The challenge lies in identifying subtle alterations that may indicate misconduct, such as duplicated, spliced, or artificially enhanced images. Success in this area has far-reaching implications, potentially preventing the spread of misinformation and preserving the integrity of scientific literature.
Automated Analysis of Scientific Experiments: The dataset facilitates the development of models for automated analysis in scientific experiments, which can significantly accelerate the pace of discovery. Automated research workflows, integrating computation, laboratory automation, and AI tools, are transforming how experiments are designed, conducted, and analyzed.
Diagnostic Tools in Medicine: In the medical field, diagnostic tools are essential for achieving diagnostic excellence, which involves making correct and timely diagnoses while maximizing patient experience and managing uncertainty. AI in healthcare is revolutionizing diagnostics, from analyzing medical images to identifying disease patterns and predicting patient outcomes.
[1] https://ieeexplore.ieee.org/document/9710731
[2] https://github.com/vimal-isi-edu/BioFors
[3] https://link.springer.com/chapter/10.1007/978-3-031-53085-2_26
[5] https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke (Histopathology images)
[6] https://www.kaggle.com/datasets/chopinforest/esophageal-endoscopy-images (Macroscopy)
[7] https://www.kaggle.com/datasets/safurahajiheidari/kidney-stone-images (Macroscopy)
[8] https://www.kaggle.com/datasets/alifrahman/covid19-chest-xray-image-dataset (Macroscopy)
[9] https://www.kaggle.com/datasets/vitaliykinakh/stable-imagenet1k (Non-scientific images)
[10] https://www.kaggle.com/datasets/nodoubttome/skin-cancer9-classesisic (Macroscopy)
[11] https://www.kaggle.com/datasets/sunedition/graphs-dataset (Non-scientific images)
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Overview: This dataset is designed for vehicle classification tasks and contains a total of 5,600 images distributed across seven categories. Each category represents a different type of vehicle.
Structure:
Image Format: All images are in JPEG format with the .jpg extension.
Size: 5,600 images in total.
Usage: Ideal for building and testing image classification models to distinguish between different types of vehicles.
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🎯Citation in bibtex>>>
@inproceedings{ahmed2021dcnn, title={DCNN-based vegetable image classification using transfer learning: A comparative study}, author={Ahmed, M Israk and Mamun, Shahriyar Mahmud and Asif, Asif Uz Zaman}, booktitle={2021 5th International Conference on Computer, Communication and Signal Processing (ICCCSP)}, pages={235--243}, year={2021}, organization={IEEE} }
The initial experiment was done with 15 types of common vegetables that are found throughout the world. The vegetables chosen for the experiment are bean, bitter gourd, bottle gourd, brinjal, broccoli, cabbage, capsicum, carrot, cauliflower, cucumber, papaya, potato, pumpkin, radish, and tomato. A total of 21,000 images from 15 classes are used, where each class contains 1,400 images of size 224×224 in *.jpg format. The dataset is split 70% for training, 15% for validation, and 15% for testing.
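The split sizes implied by these figures can be worked out directly. The short sketch below does the arithmetic. The per-class numbers assume the 70/15/15 split is applied uniformly within each class, which the description does not state explicitly.

```python
TOTAL_IMAGES = 21_000
NUM_CLASSES = 15
SPLIT_PERCENT = {"train": 70, "validation": 15, "test": 15}

# 21,000 images over 15 classes gives 1,400 images per class.
per_class = TOTAL_IMAGES // NUM_CLASSES

# Overall number of images in each split.
split_sizes = {
    name: TOTAL_IMAGES * pct // 100 for name, pct in SPLIT_PERCENT.items()
}

# Images per class in each split, assuming a uniform per-class split.
per_class_split = {
    name: per_class * pct // 100 for name, pct in SPLIT_PERCENT.items()
}

print(split_sizes)
print(per_class_split)
```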
This dataset contains three folders:
The images in this dataset were collected by us from vegetable farm and market for a project.
We would like to give thanks to the people who helped us regarding data collection.
From vegetable production to delivery, several common steps, such as picking and sorting vegetables, are performed manually. We therefore decided to address this problem with a deep neural architecture by developing a model that can detect and classify vegetables. Such a model can be deployed on different types of devices and can also solve other problems related to the identification of vegetables, such as labelling vegetables automatically without any human work.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a curated collection of images featuring four distinct big cat species: lions, tigers, leopards, and cheetahs. The images were sourced using the DuckDuckGo search engine and are organized into separate directories for each animal. This dataset is ideal for machine learning and computer vision projects focused on image classification and species recognition. With this dataset, you can train and validate your models to accurately differentiate between these majestic big cats.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Introducing the Fashion Apparel Image Classification Dataset for Convolutional Neural Networks (CNN), a carefully curated collection of clothing images specifically designed for CNN-based image classification tasks. This dataset features 5,413 high-quality images of various clothing items in two primary colors: black and blue. The images are categorized into 10 distinct classes:
Each category contains between 299 and 871 images, so every class has a substantial number of examples for model training and testing, although the classes are not equal in size. The dataset showcases a wide variety of clothing styles, designs, and textures, making it an ideal resource for developing and refining CNN models for fashion apparel image classification.
This Fashion Apparel Image Classification Dataset for CNN is perfect for researchers, developers, and students working on computer vision, image processing, and deep learning projects in the fashion and apparel domain. Use it to train and test your CNN models for object detection, image segmentation, and clothing classification tasks. Explore this dataset and elevate your fashion apparel image classification projects to new heights.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
An image classification dataset of waste items across 9 major material types, collected within an authentic landfill environment.
Dataset Information
For what purpose was the dataset created? RealWaste was created as part of an honors thesis researching how convolutional neural networks perform on authentic waste material when trained on objects in pure and unadulterated forms, compared with training on real waste items.
What do the instances in this dataset represent? Color images of waste items captured at the point of reception in a landfill environment. Images are released in 524x524 resolution, in line with the accompanying research paper. For full-resolution images, please contact the corresponding author.
Additional Information
The labels applied to the images represent the material type present; however, the labelling could be refined further given the moderate dataset size (for example, by splitting the plastic class into transparent and opaque components). Under the proposed labels, image counts are as follows:
- Cardboard: 461
- Food Organics: 411
- Glass: 420
- Metal: 790
- Miscellaneous Trash: 495
- Paper: 500
- Plastic: 921
- Textile Trash: 318
- Vegetation: 436
Has Missing Values?
No
Introductory Paper
RealWaste: A Novel Real-Life Data Set for Landfill Waste Classification Using Deep Learning By Sam Single, Saeid Iranmanesh, Raad Raad. 2023 Published in Information
Class Labels
Cardboard, Food Organics, Glass, Metal, Miscellaneous Trash, Paper, Plastic, Textile Trash, and Vegetation
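The image counts listed under Additional Information are uneven (318 to 921 per class), so a classifier trained on them may benefit from class weighting. The sketch below computes inverse-frequency weights from those counts. This weighting is a common heuristic chosen here for illustration, not something taken from the RealWaste paper.

```python
counts = {
    "Cardboard": 461, "Food Organics": 411, "Glass": 420, "Metal": 790,
    "Miscellaneous Trash": 495, "Paper": 500, "Plastic": 921,
    "Textile Trash": 318, "Vegetation": 436,
}

total = sum(counts.values())
num_classes = len(counts)

# "Balanced" weighting: total / (num_classes * class_count). A class with the
# average number of images gets weight 1.0; rarer classes get larger weights.
weights = {name: total / (num_classes * n) for name, n in counts.items()}

for name, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {counts[name]:4d} images  weight {weight:.2f}")
```

With these counts, the rarest class (Textile Trash) receives the largest weight and the most common class (Plastic) the smallest.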
This is image data of Natural Scenes around the world.
This Data contains around 25k images of size 150x150 distributed under 6 categories. {'buildings' -> 0, 'forest' -> 1, 'glacier' -> 2, 'mountain' -> 3, 'sea' -> 4, 'street' -> 5 }
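The category mapping above can be written as a lookup table and inverted to turn a model's numeric output back into a class name. In the sketch below, the `decode_prediction` helper and the example scores are hypothetical. Only the six names and their indices come from the description.

```python
CLASS_TO_INDEX = {
    "buildings": 0, "forest": 1, "glacier": 2,
    "mountain": 3, "sea": 4, "street": 5,
}
INDEX_TO_CLASS = {index: name for name, index in CLASS_TO_INDEX.items()}


def decode_prediction(scores):
    """Return the class name whose score is highest.

    `scores` is a sequence of six numbers in index order, for example the
    output of a hypothetical classifier.
    """
    if len(scores) != len(CLASS_TO_INDEX):
        raise ValueError("expected one score per class")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return INDEX_TO_CLASS[best_index]


# Made-up scores where index 2 is the largest.
print(decode_prediction([0.05, 0.10, 0.60, 0.15, 0.05, 0.05]))
```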
The Train, Test and Prediction data are provided in separate zip files. There are around 14k images in Train, 3k in Test and 7k in Prediction. This data was initially published on https://datahack.analyticsvidhya.com by Intel to host an image classification challenge.
Thanks to https://datahack.analyticsvidhya.com for the challenge and Intel for the Data
Photo by Jan Böttinger on Unsplash
The goal is to build a powerful neural network that can classify these images with greater accuracy.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
The main folder food11 contains two sub-folders: 1. train 2. test
Both the train and test folders contain 11 folders, named apple_pie, cheesecake, chicken_curry, french_fries, fried_rice, hamburger, hot_dog, ice_cream, omelette, pizza, and sushi.
Each folder inside the train set (total = 9,900) contains 900 images, and each folder inside the test set (total = 1,100) contains 100 images.
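Because the per-class counts are stated exactly, a download can be sanity-checked against them. The sketch below encodes the stated figures and reports any class whose observed count differs. The `find_count_mismatches` helper is hypothetical, and how the observed counts are gathered (for example, by listing each folder) is left to the reader.

```python
CLASSES = [
    "apple_pie", "cheesecake", "chicken_curry", "french_fries", "fried_rice",
    "hamburger", "hot_dog", "ice_cream", "omelette", "pizza", "sushi",
]
EXPECTED_PER_CLASS = {"train": 900, "test": 100}


def find_count_mismatches(observed, expected_per_class):
    """Return {class: observed count} for classes that differ from expected.

    `observed` maps class name to the number of files found for one split;
    classes missing from `observed` are treated as having zero files.
    """
    return {
        name: observed.get(name, 0)
        for name in CLASSES
        if observed.get(name, 0) != expected_per_class
    }


# Totals implied by the per-class figures: 11 classes in each split.
expected_totals = {
    split: n * len(CLASSES) for split, n in EXPECTED_PER_CLASS.items()
}
print(expected_totals)
```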
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains images of 30 different types of crops, in separate folders.
Task: Classify all types of agricultural crop images (rice, sugarcane, maize, lemon, banana, coconut, jute, etc.) with better accuracy.
Inspiration: The question to be answered is how to classify the crops of each type.
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
A dataset of 10 types of tree nuts: 1,163 training, 50 test, and 50 validation files, all 224 x 224 x 3 in JPG format. It also includes a trained TensorFlow model, nuts)100.0.hs, that achieved an F1 score of 100%. A CSV file, tree nuts.csv, is also provided.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains images of concrete, some of which show cracks. The data was collected from various METU campus buildings. The dataset is divided into negative and positive crack classes for image classification. Each class has 20,000 images, for a total of 40,000 images of 227 x 227 pixels with RGB channels. The dataset was generated from 458 high-resolution images (4032 x 3024 pixels) with the method proposed by Zhang et al. (2016). The high-resolution images vary in surface finish and illumination conditions. No data augmentation, such as random rotation or flipping, has been applied.
Özgenel, Çağlar Fırat (2019), “Concrete Crack Images for Classification”, Mendeley Data, V2, doi: 10.17632/5y9wdsg2zt.2
License: CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Gerry
Released under CC0: Public Domain
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
I wanted to collect real, fresh outdoor images in five classes as part of the Misk Foundation Data Science Immersive project. With the MS Bing API I collected and cleaned up to 1,500 images for all classes. I also collected data from four Kaggle datasets, which are credited below.
Check my GitHub repo for my work https://github.com/ammar-faifi/Weather_Status_Predictor_From_Images
Check the report here https://ammar-faifi.github.io/Weather_Status_Predictor_From_Images/
Online predictor here https://dsi-weather-predictor.herokuapp.com
| Class | Folder | Images Count |
|---|---|---|
| Sunny | sunny | 6702 |
| Cloudy | cloudy | 6274 |
| Foggy | foggy | 1261 |
| Rainy | rainy | 1927 |
| Snowy | snowy | 1875 |
| Total | Nan | 18039 |
1 - Manually from Bing API
2 - https://www.kaggle.com/datasets/jagadeesh23/weather-classification
3 - https://www.kaggle.com/datasets/polavr/twoclass-weather-classification
4 - https://www.kaggle.com/datasets/jehanbhathena/weather-dataset
5 - https://www.kaggle.com/datasets/pratik2901/multiclass-weather-dataset
This dataset was created by David King_Rutgers
The "Logo-2K+" dataset, published in the paper "Logo-2K+: Discriminative Region Navigation and Augmentation Network for Scalable Logo Classification", is a collection of 167,140 images of logos belonging to 2,341 sub-classes across 10 root-categories. The images were crawled from the Google and Baidu search engines.
Before making the dataset available for public use, I have carefully cleaned the dataset to ensure that it can be loaded and used without any errors. I have removed all folders with special characters and spaces in their names, and only kept alphanumeric characters and underscores. This makes the dataset more accessible and easier to use for researchers and developers working on logo classification.
The cleaned dataset is provided in three parts:
"Logo-2K+.rar" contains the original 167,140 logo images, grouped into 10 root-categories and 2,341 sub-classes.
a. "Logo-2K+classes.txt" provides labels for all sub-classes.
b. "train_images_root.txt" lists the paths of training images starting with the root-category.
c. "test_images_root.txt" lists the paths of testing images starting with the root-category.
d. "train_images.txt" lists the relative paths of training images starting with the sub-class.
e. "test_images.txt" lists the relative paths of testing images starting with the sub-class.
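Since the path lists start with either the root-category or the sub-class, labels can be recovered from the path components. The sketch below assumes each line is a forward-slash-separated relative path such as `root/sub_class/file.jpg`. That layout, the `labels_from_path` helper, and the example names are assumptions for illustration and should be checked against the actual text files.

```python
from pathlib import PurePosixPath


def labels_from_path(line, starts_with_root):
    """Extract (root_category, sub_class) from one line of a path list.

    With starts_with_root=True the first path component is taken as the
    root-category and the second as the sub-class. Otherwise the first
    component is the sub-class and the root-category is unknown (None).
    """
    parts = PurePosixPath(line.strip()).parts
    if starts_with_root:
        return parts[0], parts[1]
    return None, parts[0]


# Hypothetical lines in the assumed "folder/.../file.jpg" layout.
print(labels_from_path("Food/Example_Brand/1.jpg\n", starts_with_root=True))
print(labels_from_path("Example_Brand/1.jpg\n", starts_with_root=False))
```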
The "Logo-2K+" dataset is a valuable resource for researchers and developers working on logo classification, as it contains a large and diverse set of logo images with well-defined sub-class labels. The provided training and testing images, along with the label files, can be used to train and evaluate logo classification models.
A statistical comparison of the 10 root categories of Logo-2K+ is shown in the following figure.
[Figure: statistical comparison of the 10 root categories of Logo-2K+]