License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
The original dataset is from https://www.kaggle.com/datasets/andyczhao/covidx-cxr2
The data is separated based on the .txt file (see link) into positive and negative.
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

# input_dir holds the original negative images; output_dir receives the augmented copies.
datagen = ImageDataGenerator(
    rescale=1./255,               # normalize pixel values
    rotation_range=20,            # rotation (per reference)
    zoom_range=0.2,               # zoom (per reference)
    width_shift_range=0.2,        # horizontal shift
    height_shift_range=0.2,       # vertical shift
    shear_range=0.2,              # add shear transformation
    brightness_range=(0.7, 1.3),  # wider brightness adjustment (reference used 0.3)
    horizontal_flip=True,
    fill_mode='nearest'
)

# Counts
current_count = len(os.listdir(input_dir))
target_count = 57199
required_augmented_count = target_count - current_count
print(f"Original negatives: {current_count}")
print(f"Required augmented images: {required_augmented_count}")

# Augment until the required number of images has been generated
augmented_count = 0
max_augmentations_per_image = 10  # tried 5 and 10; this dataset was generated with 10

for img_file in os.listdir(input_dir):
    img_path = os.path.join(input_dir, img_file)
    img = load_img(img_path, target_size=(480, 480))  # 480 x 480, following the reference
    img_array = img_to_array(img)
    img_array = img_array.reshape((1,) + img_array.shape)

    # Generate multiple augmentations per image
    i = 0
    for batch in datagen.flow(
        img_array,
        batch_size=1,
        save_to_dir=output_dir,
        save_prefix='aug',
        save_format='jpeg'
    ):
        i += 1
        augmented_count += 1
        if i >= max_augmentations_per_image:
            break
        if augmented_count >= required_augmented_count:
            break
    if augmented_count >= required_augmented_count:
        break
I tried different values of max_augmentations_per_image, and also leaving the parameter unset; both approaches generated augmented data (around 9,000 images) ...
positive_balanced:
```python
import os
import random

random.seed(42)
target_count = 20579
all_positive_images = os.listdir(positive_dir)
selected_positive_images = random.sample(all_positive_images, target_count)
```
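To materialize the balanced positive set, the selected files can then be copied into a separate folder. A minimal sketch; the positive_balanced_dir path is a hypothetical name, not part of the original description:

```python
import os
import shutil

positive_balanced_dir = "positive_balanced"  # hypothetical output folder
os.makedirs(positive_balanced_dir, exist_ok=True)

# Copy each sampled positive image into the balanced folder.
for fname in selected_positive_images:
    shutil.copy(os.path.join(positive_dir, fname),
                os.path.join(positive_balanced_dir, fname))
```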
This data comes entirely from the TensorFlow - Help Protect the Great Barrier Reef competition and should not be used outside of the competition! I do not own these images and to the extent possible want to ensure this complies with the terms of the competition - I believe it does. All users/viewers of this dataset should adhere to the terms & conditions of the competition.
I wanted an easily accessible repository of the cots images and not cots images to help with data augmentation and possibly improving the models in other ways. In the spirit of the competition I thought it made the most sense to make this available to the other competitors.
This notebook was used to pre-process / create this dataset: Cropped Crown of Thorns Dataset Builder. It walks through the steps in a readable way.
About the dataset:
* This dataset contains an equal number (11,898) of COTS and not-COTS .jpg images.
* These images come from cropping out the bounding-box regions from each video frame in the competition.
* Use this for data augmentation.
* Alternatively, if you're just getting started, try building binary classifiers for COTS vs. not-COTS if you want to build up the skill to create more complicated object detection models.
This comes directly from the [TensorFlow - Help Protect the Great Barrier Reef](https://www.kaggle.com/c/tensorflow-great-barrier-reef) competition. Alternative citations include:
Liu, J., Kusy, B., Marchant, R., Do, B., Merz, T., Crosswell, J., ... & Malpani, M. (2021). The CSIRO Crown-of-Thorn Starfish Detection Dataset. arXiv preprint arXiv:2111.14311.
See Notebook used to build this dataset here: Cropped Crown of Thorns Dataset Builder
This dataset contains images of five rice varieties: Arborio, Basmati, Ipsala, Jasmine, and Karacadag. The images are organized into separate folders for each class, making it suitable for supervised image classification tasks.
Number of classes: 5
Class names: Arborio, Basmati, Ipsala, Jasmine, Karacadag
Image size: 128x128 (resized for modeling)
Total images per class: 15,000
Dataset split:
Training: 70%
Validation: 15%
Test: 15%
The dataset can be used to train convolutional neural networks (CNNs) for rice variety classification. It supports applications in agriculture, food quality control, and AI-powered crop monitoring. Data augmentation techniques have been applied during model training to improve robustness.
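As one possible starting point, a minimal Keras sketch for a 5-class classifier on the 128x128 images; the directory path, augmentation choices, and layer sizes are assumptions, not part of the dataset description:

```python
import tensorflow as tf

# Load the class-per-folder training images (path assumed).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "rice_dataset/train", image_size=(128, 128), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1./255, input_shape=(128, 128, 3)),
    # Simple augmentation layers, active only during training.
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # Arborio, Basmati, Ipsala, Jasmine, Karacadag
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=10)
```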
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
There is no story behind this dataset; I just felt that I should also have a dataset.
The dataset contains top view of dice digits which can be used as an alternative to the MNIST dataset for digit recognition, a benchmark dataset for classification.
There are currently only 120 original images; attempts to augment the data have already been made through the TensorFlow data augmentation pipeline, which increased the dataset to about 1,600 images (with random rotations, crops, and other operations).
For the small dataset that we have here, the images were made from just two dice. The images of the dice are resized to be similar to that of the MNIST dataset for testing results on the already present models.
The images currently in the dataset are named as follows: {number}_{color of the dice**}_{transform angle}_{transformation direction*}
My aim is that the dataset should be big enough so as to not cause overfitting. The dataset should also be diverse enough so that the model for which it is used is accurate.
Although augmentation is a way to increase the dataset size, original images are preferred for their variability across the many variables that I might have neglected in my analysis.
*if the direction is necessary, it is mentioned
** Although the images are converted to grayscale, the color of the dice might be a feature that is required for some other analysis.
No one in particular comes to mind for acknowledgements, because each and every picture in this small dataset was manually edited by me, although I would like to help
The question that I have is whether this dataset can be used for image classification? My take on this problem: GitHub Implementation
License: unknown, https://choosealicense.com/licenses/unknown/
ImageNet-Sketch data set consists of 50000 images, 50 images for each of the 1000 ImageNet classes. We construct the data set with Google Image queries "sketch of _", where _ is the standard class name. We only search within the "black and white" color scheme. We initially query 100 images for every class, and then manually clean the pulled images by deleting the irrelevant images and images that are for similar but different classes. For some classes, there are less than 50 images after manually cleaning, and then we augment the data set by flipping and rotating the images.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset originates from a research project investigating microgesture-based text editing in virtual reality (VR). The dataset was collected as part of an evaluation of the MicroGEXT system, which enables precise and efficient text editing using small, subtle hand movements. The research aims to explore lightweight, ergonomic alternatives to traditional mid-air gesture interactions.
Data Collection Methods
• Hardware: The dataset was collected using the Meta Quest Pro VR headset, utilizing its XR Hand Tracking package to capture hand skeleton data at 72 Hz.
• Participants: 10 participants were recruited for gesture elicitation and evaluation.
• Procedure:
Technical & Non-Technical Information for Reusability
• The dataset is suitable for:
  • Gesture recognition research (static/dynamic gestures, sub-state segmentation).
  • Human-computer interaction (HCI) studies focusing on XR input methods.
  • Machine learning applications, including deep learning-based gesture classification.
• Reuse Considerations:
  • Compatible with Unity's XR Hand Tracking package and Python-based deep learning frameworks (e.g., PyTorch, TensorFlow).
  • Includes data augmentation scripts for expanding training datasets.
  • The Null class helps mitigate false activations in real-time applications.
This dataset was created to support machine learning research in clothing classification, particularly for smart wardrobe and laundry applications. Inspired by the digital wardrobe concept popularized in media such as Clueless (1995), the dataset contains three primary categories of clothing items:
- Tops: t-shirts, button-up shirts, sweaters, hoodies, and other upper garments.
- Bottoms: jeans, shorts, formal pants, long trousers, and other lower garments.
- Socks: long socks and short socks photographed in pairs and individually.
All images were self-collected using an iPhone camera in HEIC format and later converted to JPG/PNG. Backgrounds were removed manually using Canva and programmatically using Rembg with the U²-Net model. Augmentation techniques (rotation, flipping, cropping, brightness and contrast adjustments) were applied to increase dataset diversity.
- Raw images: 521 (200 tops, 200 bottoms, 121 socks)
- Final images after augmentation: ~1,900 (balanced across all classes)
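The programmatic background-removal step can be reproduced with the rembg package. A minimal sketch; the input and output paths are placeholders, not files from this dataset:

```python
from PIL import Image
from rembg import remove  # uses a U²-Net model under the hood

# Load a clothing photo, strip the background, and save the result with transparency.
input_image = Image.open("shirt_001.jpg")   # placeholder path
output_image = remove(input_image)
output_image.save("shirt_001_nobg.png")     # PNG keeps the alpha channel
```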
This dataset can be used for experiments in:
- Image classification
- Data augmentation pipelines
- Transfer learning (e.g., Teachable Machine, TensorFlow, PyTorch)
- Applied computer vision in smart wardrobe and smart home systems
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project focuses on developing an intelligent system capable of detecting and classifying diseases in plant leaves using image processing and deep learning techniques. Leveraging Convolutional Neural Networks (CNNs) and transfer learning, the system analyzes leaf images to identify signs of infection with high accuracy. It supports smart agriculture by enabling early disease detection, reducing crop loss, and providing actionable insights to farmers. The project uses datasets such as PlantVillage and integrates frameworks like TensorFlow, Keras, and PyTorch. The model can be deployed as a web or mobile application, offering a real-time solution for plant health monitoring in agricultural environments.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The WastePro dataset is a comprehensive, custom-curated collection designed for broad-spectrum waste classification using deep learning. It contains high-quality images of solid waste items spanning a wide range of categories, such as organic, plastic, metal, glass, e-waste, paper, cardboard, textiles, rubber, and more. Each image is labeled according to its waste type, enabling robust supervised learning for multi-class classification tasks.
Key Features:
- Diverse Categories: WastePro covers 9 distinct waste classes, ensuring representation of both common (organic, plastic, metal, glass) and less common (e-waste, textiles, rubber) waste types. This diversity supports the development of models capable of real-world, context-rich waste recognition.
- Image Quality & Structure: Images are RGB and standardized in size (commonly 224x224 pixels) to facilitate compatibility with modern convolutional neural networks. The dataset is organized in a directory structure suitable for direct loading with TensorFlow and Keras utilities.
- Data Augmentation Ready: The dataset supports augmentation techniques such as flipping, rotation, zoom, and contrast adjustments, which are essential for increasing model robustness and generalization to unseen waste images.
- Real-World Context: Images are collected from multiple sources and environments, including municipal solid waste streams, recycling centers, and public datasets. This ensures that models trained on WastePro are applicable to practical waste management scenarios.

Applications: WastePro is ideal for training and benchmarking deep learning models for automated waste sorting, recycling facility automation, smart bins, and environmental monitoring. Its comprehensive coverage and high-quality labeling make it a strong foundation for advancing research and deployment in intelligent waste management systems.
WastePro sets a new standard for waste classification datasets by combining curated intelligence, broad category coverage, and deployment-ready design.
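A minimal loading-and-augmentation sketch along the lines described above; the folder name, split fraction, and augmentation strengths are assumptions rather than part of the dataset:

```python
import tensorflow as tf

# Load the class-per-folder directory structure directly with Keras utilities.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "WastePro", image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="training", seed=42)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "WastePro", image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="validation", seed=42)

# Augmentations named in the card: flipping, rotation, zoom, contrast.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.RandomContrast(0.2),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```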
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this data-set, 39 different classes of plant leaf and background images are available. The data-set contains 61,486 images. We used six different augmentation techniques for increasing the data-set size. The techniques are image flipping, gamma correction, noise injection, PCA color augmentation, rotation, and scaling.
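Two of the techniques named above (gamma correction and noise injection) are easy to sketch in plain NumPy; the parameter values are illustrative, not those used to build the dataset:

```python
import numpy as np

def gamma_correction(img, gamma=1.5):
    """img: float array scaled to [0, 1]; gamma > 1 darkens, gamma < 1 brightens."""
    return np.clip(img ** gamma, 0.0, 1.0)

def noise_injection(img, sigma=0.02):
    """Add zero-mean Gaussian noise to a float image in [0, 1]."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)
```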
A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
Need a Pytorch-specific alternative with GPU support? Check out torch-audiomentations!
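A minimal usage sketch following the library's documented quick-start; the waveform below is a random placeholder rather than real audio:

```python
import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift

# Chain a few common waveform augmentations, each applied with 50% probability.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

samples = np.random.uniform(low=-0.5, high=0.5, size=32000).astype(np.float32)  # placeholder mono audio
augmented = augment(samples=samples, sample_rate=16000)
```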
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
The vision behind creating this dataset is to have a data set for classifying animal species. A lot of animal species can be included in this data set, which is why it gets revised regularly. This will help to create a machine-learning model that can accurately classify animal species.
This is an Animal Classification data set made for the multi-class image recognition task. The dataset contains 15 classes; these classes are:
The data is split into 6 directories:
Interesting Data: As the name suggests, this folder contains 5 interesting images per class. These are called interesting images because it will be fascinating to see which class the model assigns to these shots. Based on the model's prediction, we can gauge the model's understanding of that class.
Testing Data: This folder is filled with a random number of images per class. As the name indicates, this folder is purposely made to hold testing images, that is, images on which the model will be tested after training.
TFRecords Data: This folder contains the data in TensorFlow records format. All the images in the TFRecords files have already been resized to 256 x 256 pixels and normalized (see the loading sketch after this list).
Train Augmented: This time, an additional train-augmented directory is added to the data set. As per the name, this directory contains augmented images per class: 5 augmented images per original image, for a total of 10,000 augmented images per class. This is done to increase the data set size because, as the total number of classes grows, the model complexity increases, and thus we require more data to train the model. The best way to get more data is data augmentation. It is highly recommended to shuffle the data before/after loading it.
Training Images: Each class is filled with 2,000 images for training purposes. This is the data used for training the model. All the images are resized to 256 by 256 pixels and normalized to have an input pixel range of 0 to 1.
Validation Images: This folder contains 100 or 200 images per class, intentionally created for validation purposes. Images from this directory will be used during training to validate the model's performance.
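A hedged sketch of reading that TFRecords split with tf.data; the file pattern and feature keys ("image", "label") are assumptions made for illustration, so inspect the actual files for the real schema before relying on them:

```python
import tensorflow as tf

# Assumed schema: a JPEG-encoded image and an integer class label per example.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)  # already 256 x 256 per the card
    image = tf.cast(image, tf.float32) / 255.0               # skip if the files store normalized pixels
    return image, example["label"]

files = tf.io.gfile.glob("TFRecords Data/train*.tfrec")  # assumed file pattern
ds = (tf.data.TFRecordDataset(files)
      .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
      .shuffle(2048)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))
```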
DeepNets
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Instead of having individual patient folders, 2 folders, namely 0 (non-IDC) and 1 (IDC), have been created to contain the images for easy loading into memory or into TensorFlow dataset implementations.
Obtained from https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images
Just a piece of advice: use data augmentation or a similar technique, since the data is imbalanced.
Context: Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.
Content The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch's file name is of the format u_xX_yY_classC.png, for example 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the x-coordinate of where this patch was cropped from, Y is the y-coordinate of where this patch was cropped from, and C indicates the class, where 0 is non-IDC and 1 is IDC.
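A small sketch for pulling the patient ID, coordinates, and class out of those file names; the helper name is ours, not part of the dataset:

```python
import re

PATCH_RE = re.compile(r"^(?P<patient>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<cls>[01])\.png$")

def parse_patch_name(fname):
    """Return (patient_id, x, y, label) parsed from a patch file name."""
    m = PATCH_RE.match(fname)
    if m is None:
        raise ValueError(f"Unexpected file name: {fname}")
    return m["patient"], int(m["x"]), int(m["y"]), int(m["cls"])

print(parse_patch_name("10253_idx5_x1351_y1101_class0.png"))
# ('10253_idx5', 1351, 1101, 0)
```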
Acknowledgements The original files are located here: http://gleason.case.edu/webdata/jpi-dl-tutorial/IDC_regular_ps50_idx5.zip Citation: https://www.ncbi.nlm.nih.gov/pubmed/27563488 and http://spie.org/Publications/Proceedings/Paper/10.1117/12.2043872
Inspiration Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is the most common form of breast cancer. Accurately identifying and categorizing breast cancer subtypes is an important clinical task, and automated methods can be used to save time and reduce error.
A big thank you to my GitHub Sponsors for their support!
In addition to the sponsors at the link above, I've received hardware and/or cloud resources from * Nvidia (https://www.nvidia.com/en-us/) * TFRC (https://www.tensorflow.org/tfrc)
I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.
Recent updates:
* timm bits branch: .data updates, a bit more consistency, unit tests for all!
* efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res).
* ViT SAM B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
* jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426.
* gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here.
* vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1).
* vit_deit_* renamed to just deit_*.
* gmixer_24_224 MLP w/ GLU, 78.1 top-1 w/ 25M params.
* eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.
| Class ID | Meaning | Description |
|---|---|---|
| 0 | Yes | Affirmative gesture |
| 1 | No | Negative gesture |
| 2 | I Love You | Expression of affection |
| 3 | Hello | Greeting gesture |
| 4 | Thank You | Gratitude expression |
data_final/
├── train/
│   ├── 0/   # Yes (~150 images)
│   ├── 1/   # No (~150 images)
│   ├── 2/   # I Love You (~150 images)
│   ├── 3/   # Hello (~150 images)
│   └── 4/   # Thank You (~150 images)
├── val/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   ├── 3/
│   └── 4/
└── test/
    ├── 0/
    ├── 1/
    ├── 2/
    ├── 3/
    └── 4/
This dataset is suitable for:
Sign Language Recognition Models
Computer Vision Research
Educational Projects
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

train_gen = datagen.flow_from_directory(
    'data_final/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

val_gen = datagen.flow_from_directory(
    'data_final/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
Using transfer learning with MobileNetV2/EfficientNetB0: - Expected Accuracy: 90-97% - Training Time: 20-40 minutes (GPU) - Model Size: ~15 MB
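A minimal transfer-learning sketch along those lines, reusing the Keras generators above; the head layers, optimizer, and epoch count are illustrative assumptions:

```python
import tensorflow as tf

# Frozen MobileNetV2 backbone with a small classification head for the 5 phrases.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # matches class_mode='categorical' above
              metrics=["accuracy"])
# model.fit(train_gen, validation_data=val_gen, epochs=20)
```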
For better generalization, use these augmentation techniques:
```python
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    zoom_range=0.2,
    horizontal_flip=True,
    brightness_range=[0.7, 1.3]
)
```
If you use this dataset in your research or project, please cite:
@dataset{sign_language_5phrases_2025,
  title={Sign Language Recognition Dataset - 5 Essential Phrases},
  author={[Your Name]},
  year={2025},
  publisher={Kaggle},
  url={[Dataset URL]}
}
This dataset is released under [Choose one]:
- CC BY 4.0 (Attribution) - Recommended
- CC BY-SA 4.0 (Attribution-ShareAlike)
- CC0 1.0 (Public Domain)
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
Version 2 extends version 1 of the fast food classification data set and introduces some new classes with new images. These new classes are:
* Baked Potato
* Crispy Chicken
* Fries
* Taco
* Taquito
The data set is divided into 4 parts: the TensorFlow Records, Training Data, Validation Data, and Testing Data. The TensorFlow Records directory is further divided into 3 parts: Train, Valid, and Test. These images are resized to 256 by 256 pixels. No other augmentation is applied. While loading the TFRecord files, you can apply any augmentation you want.
Train : Contains 15,000 training images, with each class having 1,500 images.
Valid : Contains 3,500 validation images, with each class having 400 images.
Test : Contains 1,500 test images, with each class having 100 or 200 images.
Unlike the TensorFlow records data, the Training Data, Validation Data and Testing Data directories contain the images directly. These are raw images, so any kind of augmentation, and especially resizing, can be applied to them.
Training Data : This directory contains 10 subdirectories, each representing a class. Each class has 1,500 training images.
Validation Data : This directory also contains 10 subdirectories, each representing a class. Each class has 400 images for monitoring the model's performance.
Testing Data : This directory also contains 10 subdirectories, each representing a class. Each class has 100 or 200 images for evaluating the model's performance.
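For the raw-image splits, a minimal loading sketch with on-the-fly resizing and light augmentation; the directory path and augmentation settings are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    zoom_range=0.2,
    horizontal_flip=True,
)
train_gen = train_datagen.flow_from_directory(
    "Fast Food Classification V2/Train",  # assumed path; adjust to the actual folder layout
    target_size=(256, 256),               # raw images are resized on the fly
    batch_size=32,
    class_mode="categorical",
)
```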
This is a Fast Food Classification data set containing images of 5 different types of fast food. Each directory represents a class, and each class represents a food type. The classes are:
* Burger
* Donut
* Hot Dog
* Pizza
* Sandwich
The data set is divided into 3 parts: the TensorFlow records, the Training data set and the Validation data set.
* The TensorFlow records directory is further divided into 2 parts, the training images and the validation images. These images are resized to 256 by 256 pixels. No other augmentation is applied. While loading the TFRecord files, you can apply any augmentation you want.
* Training Images : Contains 7,500 training images, with each class having 1,500 images.
* Validation Images : Contains 2,500 validation images, with each class having 500 images.
License: CC0 1.0, https://creativecommons.org/publicdomain/zero/1.0/
FER2013 (Facial Expression Recognition 2013) dataset is a widely used dataset for training and evaluating facial expression recognition models. Here are key details about the FER2013 dataset:
Overview:
FER2013 is a dataset designed for facial expression recognition tasks, particularly the classification of facial expressions into seven different emotion categories. The dataset was introduced for the Emotion Recognition in the Wild (EmotiW) Challenge in 2013.
Emotion Categories:
The dataset consists of images labeled with seven emotion categories: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
Image Size:
Each image in the FER2013 dataset is grayscale and has a resolution of 48x48 pixels.
Number of Images:
The dataset contains a total of 35,887 labeled images, with approximately 5,000 images per emotion category.
Partitioning:
FER2013 is often divided into training, validation, and test sets. The original split has 28,709 images for training, 3,589 images for validation, and 3,589 images for testing.
Usage in Research:
FER2013 has been widely used in research for benchmarking and training facial expression recognition models, particularly deep learning models. It provides a standard dataset for evaluating the performance of models on real-world facial expressions.
Challenges:
The FER2013 dataset is known for its relatively simple and posed facial expressions. In real-world scenarios, facial expressions can be more complex and spontaneous, and there are datasets addressing these challenges.
Challenges and Criticisms:
Some criticisms of the dataset include its relatively small size, limited diversity in facial expressions, and the fact that some expressions (e.g., "Disgust") are challenging to recognize accurately.
This pre-trained model implements a Convolutional Neural Network (CNN) for emotion detection using the TensorFlow and Keras frameworks. The model architecture includes convolutional layers, batch normalization, and dropout for effective feature extraction and classification. The training process utilizes an ImageDataGenerator for data augmentation, enhancing the model's ability to generalize to various facial expressions.
Key Steps:
Model Training: The CNN model is trained on an emotion dataset using an ImageDataGenerator for dynamic data augmentation. Training is performed over a specified number of epochs with a reduced batch size for efficient learning.
Model Checkpoint: ModelCheckpoint is employed to save the best-performing model during training, ensuring that the most accurate model is retained.
Save Model and Memory Cleanup: The trained model is saved in both HDF5 and JSON formats. Memory is efficiently managed by deallocating resources, clearing the Keras session, and performing garbage collection.
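A hedged sketch of the checkpointing, saving, and cleanup steps described above; the tiny model, monitored metric, and file names are illustrative assumptions:

```python
import gc
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import ModelCheckpoint

# Stand-in model for the 48x48 grayscale FER2013 inputs and 7 emotion classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Keep only the best-performing weights seen during training.
checkpoint = ModelCheckpoint("best_emotion_model.h5", monitor="val_accuracy", save_best_only=True)
# model.fit(train_generator, validation_data=val_generator, epochs=50, callbacks=[checkpoint])

# Save the architecture as JSON and the full model as HDF5.
with open("emotion_model.json", "w") as f:
    f.write(model.to_json())
model.save("emotion_model.h5")

# Free memory: clear the Keras session and run garbage collection.
K.clear_session()
gc.collect()
```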
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Welcome to the Vehicle Detection Image Dataset! This dataset is meticulously curated for object detection and tracking tasks, with a specific focus on vehicle detection. It serves as a valuable resource for researchers, developers, and enthusiasts seeking to advance the capabilities of computer vision systems.
The primary aim of this dataset is to facilitate precise object detection tasks, particularly in identifying and tracking vehicles within images. Whether you are engaged in academic research, developing commercial applications, or exploring the frontiers of computer vision, this dataset provides a solid foundation for your projects.
Both versions of the dataset undergo essential preprocessing steps, including resizing and orientation adjustments. Additionally, the Apply_Grayscale version undergoes augmentation to introduce grayscale variations, thereby enriching the dataset and improving model robustness.
To ensure compatibility with a wide range of object detection frameworks and tools, each version of the dataset is available in multiple formats:
These formats facilitate seamless integration into various machine learning frameworks and libraries, empowering users to leverage their preferred development environments.
In addition to image datasets, we also provide a video for real-time object detection evaluation. This video allows users to test the performance of their models in real-world scenarios, providing invaluable insights into the effectiveness of their detection algorithms.
To begin exploring the Vehicle Detection Image Dataset, simply download the version and format that best suits your project requirements. Whether you are an experienced practitioner or just embarking on your journey in computer vision, this dataset offers a valuable resource for advancing your understanding and capabilities in object detection and tracking tasks.
If you utilize this dataset in your work, we kindly request that you cite the following:
Parisa Karimi Darabi. (2024). Vehicle Detection Image Dataset: Suitable for Object Detection and tracking Tasks. Retrieved from https://www.kaggle.com/datasets/pkdarabi/vehicle-detection-image-dataset/
I welcome feedback and contributions from the Kaggle community to continually enhance the quality and usability of this dataset. Please feel free to reach out if you have suggestions, questions, or additional data and annotations to contribute. Together, we can drive innovation and progress in computer vision.
License: Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
The dataset consists of images of famous (or not-so-famous) landmarks. The collection is organized into a two-level hierarchy: the first level is the category of the landmark, and the second level is the individual landmark. There are 6 categories:
1. Gothic
2. Modern
3. Mughal
4. Neoclassical
5. Pagodas
6. Pyramids
For each category, there are 5 landmarks, for a total of 30 landmarks. Each landmark has 14 images.
The landmarks dataset is too small to train convolutional neural networks (CNNs) from scratch. The resulting network will overfit the data. Instead, use transfer learning by reusing part of a pre-trained CNN. In transfer learning, instead of training the neural network starting from random weights, the weights for the lower parts of the network are taken from a pre-trained network. Only the higher parts of the network will have to be learned. Chapter 14 of Géron discusses how to apply pre-trained models for transfer learning.
For this group project, the only allowed pre-trained networks are EfficientNetB0 and VGG16, which are smaller CNNs. The objective of this restriction is to avoid penalizing groups that do not have access to powerful machines and/or machines with GPUs. Groups are allowed to use Google Colab with GPUs to train the models, but be aware of resource usage limitations.
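A hedged sketch of that setup with one of the allowed backbones (VGG16); the 30-way output matches the landmark count above, while the input size, head layers, and optimizer are illustrative assumptions:

```python
import tensorflow as tf

# Reuse VGG16's pretrained convolutional base and learn only a new head.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the lower, pretrained layers fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(30, activation="softmax"),  # 30 individual landmarks
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```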
Data augmentation is another way to overcome the problem of small datasets. Keras/TensorFlow provides various image manipulation functions (https://www.tensorflow.org/api_docs/python/tf/image) that can be used to generate additional images. Refer to Lecture 9 slides and Chapter 14 of Géron.
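For example, a small tf.image sketch along those lines; the specific transforms and parameters are illustrative:

```python
import tensorflow as tf

def augment(image, label):
    # Random flip, brightness, and contrast, then a random crop back to the model's input size.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.resize(image, (250, 250))
    image = tf.image.random_crop(image, size=(224, 224, 3))
    return image, label

# train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
```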
Yet another way to overcome the small dataset problem is experimenting with various ways of combining the models for the two tasks. It is possible to train two distinct models, one for category classification and one for landmark classification. But would landmark classification benefit from knowing the output of category classification? Or vice versa?
License: MIT, https://opensource.org/licenses/MIT
License information was derived automatically
Using a unique dataset (the FGVL Dataset) collected from Sultana seedless grape vineyards in the Aegean Region of Turkey, an instance segmentation model has been developed to classify frost-damaged leaves and grape clusters at the pixel level. The dataset includes 418 frost-damaged grapes, 510 frost-damaged leaves, 395 healthy grapes, and 698 healthy leaves, collected after a severe frost event in April 2025 at a vineyard in Manisa. The images were captured in high resolution under natural lighting conditions and manually labeled by experts.
Participants must use the FGVL Dataset to develop deep learning models for instance segmentation of frost-damaged and healthy grape leaves and clusters.
You are free to use any image processing or deep learning framework (e.g., YOLOv11, PyTorch, TensorFlow) and apply data augmentation, model tuning, and evaluation techniques.
Submissions will be evaluated based on mAP@50 and mAP@50-95 metrics on the test set.
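As a hedged starting point using the Ultralytics framework mentioned above; the checkpoint name, dataset config file, and training settings are assumptions:

```python
from ultralytics import YOLO

# Train a segmentation model on the FGVL classes (frost-damaged / healthy
# grapes and leaves); "fgvl.yaml" is a hypothetical dataset config file.
model = YOLO("yolo11n-seg.pt")
model.train(data="fgvl.yaml", epochs=100, imgsz=640)

# Validation reports box/mask mAP@50 and mAP@50-95, matching the evaluation metrics.
metrics = model.val()
```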