Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is a Keras image data generator ready version of the Plant Seedlings Dataset of the Aarhus University Department of Engineering Signal Processing Group.
This dataset was previously used in a Kaggle competition but has been re-uploaded here to make working with the data in Keras easier.
The images presented show weed and crop seedlings. Your task is to classify the type of plant by an image of its seedling. The images have already been segmented, so that each image shows only one plant.
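Since the folders are already laid out for the Keras ImageDataGenerator, loading could look roughly like the sketch below; the root folder name and the 80/20 validation split are assumptions, not part of the dataset description:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical root folder with one sub-directory per seedling class.
DATA_DIR = 'plant_seedlings'

# Rescale pixel values and carve out an assumed 20% validation split.
datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_gen = datagen.flow_from_directory(DATA_DIR, target_size=(224, 224),
                                        batch_size=32, class_mode='categorical',
                                        subset='training')
val_gen = datagen.flow_from_directory(DATA_DIR, target_size=(224, 224),
                                      batch_size=32, class_mode='categorical',
                                      subset='validation')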
Big thanks to the Aarhus University Department of Engineering Signal Processing Group for publishing the dataset.
BIRDS 20 SPECIES - IMAGE CLASSIFICATION. A data set of 20 bird species: 3,208 training images, 100 test images (5 images per species) and 100 validation images (5 images per species). This is a very high-quality dataset: there is only one bird in each image, and the bird typically takes up at least 50% of the pixels in the image. As a result, even a moderately complex model will achieve training and test accuracies in the mid-90% range. Note: all images are original and not created by augmentation.
All images are 224 x 224 x 3 color images in jpg format. The data set includes a train set, test set and validation set. Each set contains 20 sub-directories, one for each bird species. This structure is convenient if you use Keras ImageDataGenerator.flow_from_directory to create the train, test and valid data generators. The data set also includes a file, birds.csv, with 5 columns. The filepaths column contains the relative file path to an image file. The labels column contains the bird species class name associated with the image file. The scientific label column contains the Latin scientific name for the image. The data set column denotes which dataset (train, test or valid) the filepath resides in. The class_id column contains the class index value associated with the image file's class.
NOTE: the test and validation images in the data set were hand selected to be the "best" images, so your model will probably get a higher accuracy score using those sets than it would with test and validation sets you create yourself. However, the latter case gives a more accurate picture of model performance on unseen images.
Images were gathered from internet searches by species name. Once the image files for a species were downloaded, they were checked for duplicates using a Python duplicate-image detector program I developed. All duplicates detected were deleted in order to prevent there being images common to the training, test and validation sets. After that, the images were cropped so that the bird in most cases occupies at least 50% of the pixels in the image, and then resized to 224 x 224 x 3 in jpg format. The cropping ensures that, when processed by a CNN, there is adequate information in the images to create a highly accurate classifier. Even a moderately robust model should achieve training, validation and test accuracies in the high-90% range. Because of the large size of the dataset, if you train a model I recommend using an image size of 150 x 150 x 3 in order to reduce training time.
All files are numbered sequentially starting from one for each species, so test images are named 1.jpg to 5.jpg, and similarly for validation images. Training images are also numbered sequentially with zero padding, for example 001.jpg, 002.jpg, ..., 010.jpg, 011.jpg, ..., 099.jpg, 100.jpg, etc. The zero padding preserves the file order when used with Python file functions and Keras flow_from_directory.
The training set is not balanced, having a varying number of files per species, but each species has at least 130 training image files. One significant shortcoming of the data set is the ratio of male images to female images: about 80% of the images are of the male and 20% of the female. Males are typically far more diversely colored, while the females of a species are typically bland, so male and female images of the same species may look entirely different. Almost all test and validation images are taken from the male of the species, so the classifier may not perform as well on images of female birds.
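As a rough sketch of the directory-based loading mentioned above, the generators could be created like this; the folder names 'train', 'valid' and 'test' follow the data set column values described for birds.csv, but the exact paths are assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: train/, valid/ and test/, each with one sub-directory per species.
datagen = ImageDataGenerator(rescale=1./255)

# 150 x 150 follows the recommendation above for cutting training time.
train_gen = datagen.flow_from_directory('train', target_size=(150, 150),
                                        batch_size=32, class_mode='categorical')
valid_gen = datagen.flow_from_directory('valid', target_size=(150, 150),
                                        batch_size=32, class_mode='categorical')
test_gen = datagen.flow_from_directory('test', target_size=(150, 150),
                                       batch_size=32, class_mode='categorical',
                                       shuffle=False)  # keep file order for evaluation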
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This data set is a collection of 2,000 bike and car images. While collecting these images, care was taken to include all types of bikes and cars, because of the high intra-class variety of both: there are many different kinds of cars and bikes, which makes the task a little harder since the model also has to handle that variety. But if your model can learn the basic structure of a car and a bike, it will be able to distinguish between the two classes.
The data is not preprocessed. This is intentional, so that you can apply the augmentations you want to use. Almost all of the 2,000 images are unique, so after applying some data augmentation you can increase the size of the data set.
The data is not distributed into training and validation subsets, but you can easily do so by using an image data generator from Keras, as in the sketch below. The preprocessing steps are available in my notebook associated with this data set. You can practice your computer vision skills using this data set; it is a binary classification task.
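A minimal sketch of such a split, assuming the images live in a folder named 'bike_car_data' with one sub-directory per class (both names are hypothetical):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical layout: bike_car_data/bike/*.jpg and bike_car_data/car/*.jpg
DATA_DIR = 'bike_car_data'

# Light augmentation plus an assumed 80/20 train/validation split done by the generator.
datagen = ImageDataGenerator(rescale=1./255,
                             horizontal_flip=True,
                             rotation_range=15,
                             validation_split=0.2)

train_gen = datagen.flow_from_directory(DATA_DIR, target_size=(224, 224),
                                        batch_size=32, class_mode='binary',
                                        subset='training')
val_gen = datagen.flow_from_directory(DATA_DIR, target_size=(224, 224),
                                      batch_size=32, class_mode='binary',
                                      subset='validation')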
Community Data License Agreement - Permissive 1.0: https://cdla.io/permissive-1-0/
This dataset contains a comprehensive collection of waste images designed for training machine learning models to classify different types of waste materials, with a strong focus on electronic waste (e-waste) and mixed materials. The dataset includes 7 electronic device categories alongside traditional recyclable materials, making it ideal for modern waste management challenges where electronic devices constitute a significant portion of waste streams. The dataset has been carefully curated and balanced to ensure optimal performance for multi-category waste classification tasks using deep learning approaches.
The dataset includes 17 distinct waste categories covering various types of materials commonly found in waste management scenarios:
balanced_waste_images/
├── category_1/
│   ├── image_001.jpg
│   ├── image_002.jpg
│   └── ... (400 images)
├── category_2/
│   ├── image_001.jpg
│   └── ... (400 images)
└── ... (17 categories total)
Note: Dataset is not pre-split. Users need to create train/validation/test splits as needed.
Since the dataset is not pre-split, you'll need to create train/validation/test splits:
import splitfolders

# Split dataset: 80% train, 10% val, 10% test
splitfolders.ratio(
    input='balanced_waste_images',
    output='split_data',
    seed=42,
    ratio=(.8, .1, .1),
    group_prefix=None,
    move=False
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data generators with preprocessing
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'split_data/train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
val_generator = val_datagen.flow_from_directory(
    'split_data/val/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
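To illustrate how these generators might feed a model, here is a minimal sketch of a small transfer-learning classifier for the 17 categories; the backbone, head and hyperparameters are illustrative assumptions, not part of the dataset:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Illustrative model only: frozen MobileNetV2 backbone with a 17-way softmax head.
base = MobileNetV2(input_shape=(224, 224, 3), include_top=False,
                   weights='imagenet', pooling='avg')
base.trainable = False  # train only the classification head in this sketch

model = models.Sequential([
    base,
    layers.Dropout(0.2),
    layers.Dense(17, activation='softmax')  # 17 waste categories
])

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Fit from the generators defined above (epoch count is arbitrary here).
model.fit(train_generator, validation_data=val_generator, epochs=10)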
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is an augmented version of the original QR-code dataset (https://www.kaggle.com/datasets/coledie/qr-codes). The purpose of this set is to simulate real-life data by adding noise, random cropping, shear, and rotation, which should help create more robust object-detection and generative models. In addition, the code below will also help you create your very own augmented dataset.
import os
import cv2
import numpy as np
from tqdm import tqdm

datadir = 'qr_dataset'  # you'll have to change datadir accordingly
array = []        # 128 x 128 images
array_small = []  # 32 x 32 images

def create_training_data():
    for img in tqdm(os.listdir(datadir)):  # iterate over each image in the dataset folder
        try:
            img_array = cv2.imread(os.path.join(datadir, img), cv2.IMREAD_COLOR)  # read the image as an array
            array.append(cv2.resize(img_array, (128, 128)))  # resize to normalize data size
            array_small.append(cv2.resize(img_array, (32, 32), interpolation=cv2.INTER_AREA))
        except Exception:  # in the interest of keeping the output clean, skip unreadable files
            pass

create_training_data()
X = np.array(array)  # stack the resized images into one (N, 128, 128, 3) array

# augmenting the data
from keras.preprocessing.image import ImageDataGenerator  # this generator will save files in a physical format

datagen = ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    brightness_range=(0.5, 1.5))

os.makedirs('Augmented-images', exist_ok=True)  # save_to_dir expects the folder to exist

for a in X:  # iterate over every image and augment it
    a = a.reshape((1,) + a.shape)  # datagen.flow expects a batch dimension
    i = 0
    for batch in datagen.flow(a, batch_size=1, save_to_dir='Augmented-images',
                              save_prefix='dr', save_format='jpeg'):
        i += 1
        if i >= 10:  # keep 10 augmented variants per source image
            break