License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.
Perfect for:
- Object detection model training
- Computer vision research
- Retro gaming AI projects
- YOLO algorithm benchmarking
- Educational purposes
| Metric | Value |
|---|---|
| Total Images | 1,004 |
| Dataset Size | 12 MB |
| Image Format | PNG |
| Annotation Format | YOLO (.txt) |
| Classes | 4 |
| Train/Val Split | 711/260 (73%/27%) |
| Class ID | Class Name | Count | Description |
|---|---|---|---|
| 0 | dog | 252 | The hunting dog in various poses (jumping, laughing, sniffing, etc.) |
| 1 | duck_dead | 256 | Dead ducks (both black and red variants) |
| 2 | duck_shot | 248 | Ducks in the moment of being shot |
| 3 | duck_flying | 248 | Flying ducks in all directions (left, right, diagonal) |
yolo_dataset_augmented/
├── images/
│   ├── train/   # 711 training images
│   └── val/     # 260 validation images
├── labels/
│   ├── train/   # 711 YOLO annotation files
│   └── val/     # 260 YOLO annotation files
├── classes.txt                     # Class names mapping
├── dataset.yaml                    # YOLO configuration file
└── augmented_dataset_stats.json    # Detailed statistics
The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:
{
'rotation_range': (-15, 15), # Small rotations for game sprites
'brightness_range': (0.7, 1.3), # Brightness variations
'contrast_range': (0.8, 1.2), # Contrast adjustments
'saturation_range': (0.8, 1.2), # Color saturation
'noise_intensity': 0.02, # Gaussian noise
'horizontal_flip_prob': 0.5, # 50% chance horizontal flip
'scaling_range': (0.8, 1.2), # Scale variations
}
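The exact augmentation script is not included with the dataset; the parameters above are reported as a plain configuration dictionary. As a rough, hedged sketch, a comparable bounding-box-aware pipeline could be assembled with Albumentations (names and parameter values below are illustrative, not from the dataset):

```python
# Minimal sketch, not the original augmentation code: approximate the listed
# parameters with Albumentations while keeping YOLO-format boxes consistent.
import albumentations as A

augment = A.Compose(
    [
        A.Rotate(limit=15, p=0.5),                        # rotation_range (-15, 15)
        A.ColorJitter(brightness=0.3, contrast=0.2,
                      saturation=0.2, hue=0.0, p=0.8),    # brightness/contrast/saturation
        A.GaussNoise(var_limit=(5.0, 15.0), p=0.3),       # mild Gaussian noise
        A.HorizontalFlip(p=0.5),                          # horizontal_flip_prob 0.5
        A.RandomScale(scale_limit=0.2, p=0.5),            # scaling_range (0.8, 1.2)
    ],
    bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']),
)

# Usage: augmented = augment(image=img, bboxes=yolo_boxes, class_labels=class_ids)
```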
from ultralytics import YOLO
# Load and train
model = YOLO('yolov8n.pt') # Load pretrained model
results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
# Validate
metrics = model.val()
# Predict
results = model('path/to/test/image.png')
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os
class DuckHuntDataset(Dataset):
    def __init__(self, images_dir, labels_dir, transform=None):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.transform = transform
        self.images = sorted(os.listdir(images_dir))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.images_dir, self.images[idx])
        label_path = os.path.join(self.labels_dir,
                                  self.images[idx].replace('.png', '.txt'))
        image = Image.open(img_path)
        # Load YOLO annotations (one "class_id cx cy w h" line per object)
        with open(label_path, 'r') as f:
            labels = f.readlines()
        if self.transform:
            image = self.transform(image)
        return image, labels

# Usage (note: variable-length label lists need a custom collate_fn for batching)
dataset = DuckHuntDataset('images/train', 'labels/train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Each .txt file contains one line per object:
class_id center_x center_y width height
Example annotation:
0 0.492 0.403 0.212 0.315
Where values are normalized (0-1) relative to image dimensions.
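To make the normalized format concrete, here is a small self-contained helper (not part of the dataset) that parses one annotation file and converts the boxes to pixel coordinates:

```python
def load_yolo_labels(label_path, img_width, img_height):
    """Parse a YOLO .txt file and return (class_id, x1, y1, x2, y2) boxes in pixels."""
    boxes = []
    with open(label_path, 'r') as f:
        for line in f:
            cls, cx, cy, w, h = line.split()
            cx, cy = float(cx) * img_width, float(cy) * img_height
            w, h = float(w) * img_width, float(h) * img_height
            boxes.append((int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Example: the annotation "0 0.492 0.403 0.212 0.315" on a 256x240 NES frame
# becomes roughly (0, 98.8, 58.9, 153.1, 134.5).
```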
This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The SalmonScan dataset is a collection of images of salmon fish, including healthy fish and infected fish. The dataset consists of two classes of images:
Fresh salmon and infected salmon
This dataset is ideal for various computer vision tasks in machine learning and deep learning applications. Whether you are a researcher, developer, or student, the SalmonScan dataset offers a rich and diverse data source to support your projects and experiments.
So, dive in and explore the fascinating world of salmon health and disease!
The SalmonScan dataset consists of approximately 1,208 images of salmon fish, classified into two classes:
Each class contains a representative and diverse collection of images, capturing a range of different perspectives, scales, and lighting conditions. The images have been carefully curated to ensure that they are of high quality and suitable for use in a variety of computer vision tasks.
Data Preprocessing
The input images were preprocessed to enhance their quality and suitability for further analysis. The following steps were taken:
Resizing: All images were resized to a uniform size of 600 pixels in width and 250 pixels in height to ensure compatibility with the learning algorithm.
Image Augmentation: To compensate for the small number of images, various image augmentation techniques were applied to the input images (a code sketch illustrating these operations follows the list):
- Horizontal Flip: images were horizontally flipped to create additional samples.
- Vertical Flip: images were vertically flipped to create additional samples.
- Rotation: images were rotated to create additional samples.
- Cropping: a portion of each image was randomly cropped to create additional samples.
- Gaussian Noise: Gaussian noise was added to the images to create additional samples.
- Shearing: the images were sheared to create additional samples.
- Contrast Adjustment (Gamma): gamma correction was applied to adjust image contrast.
- Contrast Adjustment (Sigmoid): a sigmoid function was applied to adjust image contrast.
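The original preprocessing code is not included; the following hedged sketch shows how the listed operations could be reproduced with scikit-image and NumPy (all parameter values are illustrative assumptions):

```python
# Minimal sketch of the listed augmentation operations, not the authors' pipeline.
import numpy as np
from skimage import exposure, transform, util

def augment_salmon(img):
    """Return a list of augmented variants of one image (float images in [0, 1])."""
    resized = transform.resize(img, (250, 600))               # 600 px wide x 250 px high
    return [
        np.fliplr(resized),                                   # horizontal flip
        np.flipud(resized),                                   # vertical flip
        transform.rotate(resized, angle=15),                  # rotation
        resized[20:230, 50:550],                              # crop (fixed here for brevity)
        util.random_noise(resized, mode='gaussian'),          # Gaussian noise
        transform.warp(resized,
                       transform.AffineTransform(shear=0.2)), # shearing
        exposure.adjust_gamma(resized, gamma=0.8),            # contrast (gamma)
        exposure.adjust_sigmoid(resized),                     # contrast (sigmoid)
    ]
```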
Usage
To use the salmon scan dataset in your ML and DL projects, follow these steps:
License: CDLA-Permissive-1.0 (https://cdla.io/permissive-1-0/)
This dataset contains a comprehensive collection of waste images designed for training machine learning models to classify different types of waste materials, with a strong focus on electronic waste (e-waste) and mixed materials. The dataset includes 7 electronic device categories alongside traditional recyclable materials, making it ideal for modern waste management challenges where electronic devices constitute a significant portion of waste streams. The dataset has been carefully curated and balanced to ensure optimal performance for multi-category waste classification tasks using deep learning approaches.
The dataset includes 17 distinct waste categories covering various types of materials commonly found in waste management scenarios:
balanced_waste_images/
├── category_1/
│   ├── image_001.jpg
│   ├── image_002.jpg
│   └── ... (400 images)
├── category_2/
│   ├── image_001.jpg
│   └── ... (400 images)
└── ... (17 categories total)
Note: Dataset is not pre-split. Users need to create train/validation/test splits as needed.
Since the dataset is not pre-split, you'll need to create train/validation/test splits:
import splitfolders
# Split dataset: 80% train, 10% val, 10% test
splitfolders.ratio(
input='balanced_waste_images',
output='split_data',
seed=42,
ratio=(.8, .1, .1),
group_prefix=None,
move=False
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Data generators with preprocessing
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'split_data/train/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
val_generator = val_datagen.flow_from_directory(
    'split_data/val/',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)
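With the generators in place, a hedged sketch of a 17-class transfer-learning model follows; MobileNetV2 and all hyperparameters here are illustrative choices, not prescribed by the dataset:

```python
# Minimal sketch of a 17-class classifier on top of a frozen pretrained backbone.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base.trainable = False  # freeze the pretrained backbone initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(17, activation='softmax'),  # 17 waste categories
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_generator, validation_data=val_generator, epochs=10)
```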
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
We present a Bayesian method for multivariate changepoint detection that allows for simultaneous inference on the location of a changepoint and the coefficients of a logistic regression model for distinguishing pre-changepoint data from post-changepoint data. In contrast to many methods for multivariate changepoint detection, the proposed method is applicable to data of mixed type and avoids strict assumptions regarding the distribution of the data and the nature of the change. The regression coefficients provide an interpretable description of a potentially complex change. For posterior inference, the model admits a simple Gibbs sampling algorithm based on Pólya-gamma data augmentation. We establish conditions under which the proposed method is guaranteed to recover the true underlying changepoint. As a testing ground for our method, we consider the problem of detecting topological changes in time series of images. We demonstrate that our proposed method BCLR, combined with a topological feature embedding, performs well on both simulated and real image data. The method also successfully recovers the location and nature of changes in more traditional changepoint tasks. An implementation of our method is available in the Python package bclr.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset provides a collection of images and extracted landmark features for 48 fundamental static signs in Bangla Sign Language (BSL), including 38 alphabets and 10 digits (0-9). It was created to support research in isolated sign language recognition (SLR) for BSL and provide a benchmark resource for the research community. In total, the dataset comprises 14,566 raw images, 14,566 mirrored images, and 29,132 processed feature samples.
Data Contents:
The dataset is organized into two main folders:
01_Images: Contains 29,132 images in .jpg format (14,566 raw + 14,566 mirrored).
- Raw_Images: Contains 14,566 original images collected from participants.
- Mirrored_Images: Contains 14,566 horizontally flipped versions of the raw images for data augmentation purposes.
- Privacy Note: Facial regions in all images within this folder have been anonymized (blurred) to protect participant privacy, as formal informed consent for sharing identifiable images was not obtained prior to collection.

02_Processed_Features_NPY: Contains 29,132 126-dimensional hand landmark features saved as NumPy arrays in .npy format. Features were extracted using MediaPipe Holistic (capturing 21 landmarks each for the left and right hands, resulting in 63 + 63 = 126 features per image). These feature files are pre-split into train (23,293 samples), val (2,911 samples), and test (2,928 samples) subdirectories (approximately 80%/10%/10%) for standardized model evaluation and benchmarking.
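The 126-dimensional layout (63 values per hand) can be reproduced with MediaPipe Holistic; the sketch below is a hedged illustration of that extraction step, not the dataset's original script:

```python
# Minimal sketch: build a 126-dim vector (21 landmarks x 3 coords x 2 hands)
# for one image with MediaPipe Holistic. Missing hands are zero-filled.
import cv2
import numpy as np
import mediapipe as mp

def extract_features(image_path):
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    with mp.solutions.holistic.Holistic(static_image_mode=True) as holistic:
        results = holistic.process(image)

    def hand_vector(hand_landmarks):
        if hand_landmarks is None:
            return np.zeros(63, dtype=np.float32)
        return np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark],
                        dtype=np.float32).flatten()

    return np.concatenate([hand_vector(results.left_hand_landmarks),
                           hand_vector(results.right_hand_landmarks)])  # shape (126,)
```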
Data Collection: Images were collected from 5 volunteers using a Macbook Air M3 camera. Data collection took place indoors under room lighting conditions against a white background. Images were captured manually using a Python script to ensure clarity.
Potential Use: Researchers can utilize the anonymized raw and mirrored images (01_Images) to develop or test novel feature extraction techniques or multimodal recognition systems. Alternatively, the pre-processed and split .npy feature files (02_Processed_Features_NPY) can be directly used to efficiently train and evaluate machine learning models for static BSL recognition, facilitating reproducible research and benchmarking.
Further Details: Please refer to the README.md file included within the dataset for detailed class mapping (e.g., L1='ঠ', D0='০'), comprehensive file statistics per class, specifics on the data processing pipeline, and citation guidelines.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.
| Class ID | Meaning | Description |
|---|---|---|
| 0 | Yes | Affirmative gesture |
| 1 | No | Negative gesture |
| 2 | I Love You | Expression of affection |
| 3 | Hello | Greeting gesture |
| 4 | Thank You | Gratitude expression |
data_final/
├── train/
│   ├── 0/   # Yes (~150 images)
│   ├── 1/   # No (~150 images)
│   ├── 2/   # I Love You (~150 images)
│   ├── 3/   # Hello (~150 images)
│   └── 4/   # Thank You (~150 images)
├── val/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   ├── 3/
│   └── 4/
└── test/
    ├── 0/
    ├── 1/
    ├── 2/
    ├── 3/
    └── 4/
This dataset is suitable for:
Sign Language Recognition Models
Computer Vision Research
Educational Projects
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)
train_gen = datagen.flow_from_directory(
'data_final/train',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
val_gen = datagen.flow_from_directory(
'data_final/val',
target_size=(224, 224),
batch_size=32,
class_mode='categorical'
)
from torchvision import datasets, transforms
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
Using transfer learning with MobileNetV2/EfficientNetB0:
- Expected Accuracy: 90-97%
- Training Time: 20-40 minutes (GPU)
- Model Size: ~15 MB
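Building on the PyTorch loaders above, here is a hedged sketch of such a transfer-learning setup (MobileNetV2 head swapped for 5 classes; hyperparameters are illustrative and the torchvision >= 0.13 weights API is assumed):

```python
# Minimal sketch, not a prescribed training recipe.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.last_channel, 5)  # 5 sign classes
model = model.to(device)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```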
For better generalization, use these augmentation techniques:
```python
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=25,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    zoom_range=0.2,
    horizontal_flip=True,
    brightness_range=[0.7, 1.3]
)
```
If you use this dataset in your research or project, please cite:
@dataset{sign_language_5phrases_2025,
title={Sign Language Recognition Dataset - 5 Essential Phrases},
author={[Your Name]},
year={2025},
publisher={Kaggle},
url={[Dataset URL]}
}
This dataset is released under [Choose one]: - CC BY 4.0 (Attribution) - Recommended - CC BY-SA 4.0 (Attribution-ShareAlike) - CC0 1.0 (Public Domain)
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
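BIRD ships with its own PyTorch-compatible loading code (not reproduced here). As a rough, hedged illustration of what online augmentation with room impulse responses means, the sketch below convolves dry speech with a randomly chosen multichannel RIR; the array names and data layout are assumptions, not the official BIRD API:

```python
# Minimal sketch: online reverberation augmentation with multichannel RIRs.
import random
import numpy as np
import torch
from scipy.signal import fftconvolve
from torch.utils.data import Dataset

class ReverbAugmentDataset(Dataset):
    def __init__(self, dry_signals, rirs):
        self.dry_signals = dry_signals   # list of 1-D arrays (dry speech)
        self.rirs = rirs                 # list of (n_channels, rir_len) arrays

    def __len__(self):
        return len(self.dry_signals)

    def __getitem__(self, idx):
        dry = self.dry_signals[idx]
        rir = random.choice(self.rirs)
        # Convolve the dry signal with each channel's impulse response.
        wet = np.stack([fftconvolve(dry, rir[ch])[:len(dry)]
                        for ch in range(rir.shape[0])])
        return torch.from_numpy(wet.astype(np.float32))
```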
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
BioEncoder: a metric learning toolkit for comparative organismal biology
Abstract - In the realm of biological image analysis, deep learning (DL) has become a core toolkit, e.g., for segmentation and classification. However, conventional DL methods are challenged by large biodiversity datasets characterized by unbalanced classes and hard-to-distinguish phenotypic differences between them. Here we present BioEncoder, a user-friendly toolkit for metric learning, which overcomes these challenges by focussing on learning relationships between individual data points rather than on the separability of classes. BioEncoder is released as a Python package, created for ease of use and flexibility across diverse datasets. It features taxon-agnostic data loaders, custom augmentation options, and simple hyperparameter adjustments through text-based configuration files. The toolkit's significance lies in its potential to unlock new research avenues in biological image analysis while democratizing access to advanced deep metric learning techniques. BioEncoder focuses on the urgent need for toolkits bridging the gap between complex DL pipelines and practical applications in biological research.
Dataset - This data repository includes two things: a snapshot of the BioEncoder package (BioEncoder-main.zip, version 1.0.0, downloaded from https://github.com/agporto/BioEncoder on 2024-07-19 at 17:20), and the damselfly dataset used for the case study presented in the paper (bioencoder_data.zip). The dataset archive also encompasses the configuration files and the final model checkpoints from the case study, as well as a script to reproduce the results and figures presented in the paper.
How to use - Get started by consulting the GitHub repository for information on how to install BioEncoder, then download the data archive and run the script. Some parts of the script can be executed using the model checkpoints; for other parts the training routine needs to be run.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains a collection of car and bike images scraped from the web using the Bing Image Crawler (icrawler library in Python). It was created for educational and research purposes, especially for projects involving computer vision, deep learning, and image classification.
Each image was retrieved from publicly available Bing search results and organized into two folders:
cars/ - contains images of different types and models of cars
bikes/ - contains images of various motorcycles and scooters
Usage
This dataset is suitable for:
Training and testing CNNs or transfer learning models (e.g., ResNet, VGG, EfficientNet)
Practicing image preprocessing and augmentation techniques
Developing vehicle recognition or classification systems
Data Collection
Images were automatically collected using:
from icrawler.builtin import BingImageCrawler
with filters={'type': 'photo'} to ensure only photographic content.
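A hedged sketch of that collection step is shown below; the keyword strings, folder names, and image counts are illustrative assumptions, not a record of the exact crawl:

```python
# Minimal sketch of the crawling step with icrawler's Bing crawler.
from icrawler.builtin import BingImageCrawler

for keyword, folder in [('car', 'cars'), ('motorcycle', 'bikes')]:
    crawler = BingImageCrawler(storage={'root_dir': folder})
    crawler.crawl(keyword=keyword, max_num=500, filters={'type': 'photo'})
```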
License
All images are shared under the CC0: Public Domain license. They are intended solely for non-commercial, academic, and research use.
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
This dataset is divided into two main directories, train and test, each of which contains two subdirectories, breast_malignant and breast_benign, following this structure:
```
train/
├── breast_malignant/   # 4,000 images
└── breast_benign/      # 4,000 images
test/
├── breast_malignant/   # 1,000 images
└── breast_benign/      # 1,000 images
```

# Dataset details:
|Path| Subclass|Description|
|-----|-----------|--------------|
|breast_benign| Benign| Non-cancerous breast tissues|
|breast_malignant| Malignant| Cancerous breast tissues|
*Source: Collected from the Breast Cancer dataset by Anas Elmasry on Kaggle.*
# Data augmentation:
The data was augmented by the original author of the dataset using Keras' `ImageDataGenerator` *[1]*. The augmentations include:
- Rotation: Up to 10 degrees.
- Width & Height Shift: Up to 10% of the total image size.
- Shearing & Zooming: 10% variation.
- Horizontal Flip: Randomly flips images for additional diversity.
- Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.
The parameters used for augmentation:
```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    brightness_range=[0.2, 1.2]
)
```
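As a usage note (not part of the original pipeline), such a generator is typically pointed at the train directory shown above, for example:

```python
# Hedged usage sketch: stream the training images; the directory name 'train'
# follows the structure above, other parameters are illustrative.
train_generator = datagen.flow_from_directory(
    'train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'   # breast_benign vs. breast_malignant
)
```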
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Accurate species identification is a prerequisite to assess the medical relevance of a mosquito specimen. In monitoring or surveillance programs, mosquitoes are typically identified based on morphological characters, which can be supported by molecular biological assays. Both methods require intensive experience of the observers and well-equipped laboratories. The use of convolutional neural networks (CNNs) to identify species based on images may be a cost-effective and reliable alternative. In this proof-of-concept study, we developed a CNN to identify seven Aedes species by wing images only. While previous studies used images of the whole mosquito body, the nearly two-dimensional wings may facilitate standardized image capture and thereby reduce the complexity of the CNN implementation. Mosquitoes were sampled from different sites in Germany. Their wings were mounted and photographed with a professional stereomicroscope. The data set consisted of 1,155 wing images from seven Aedes species, including the exotic species Aedes albopictus and six native Aedes species, as well as 554 wings from different non-Aedes mosquitoes. The wing images were used to train a CNN to differentiate between Aedes and non-Aedes mosquitoes and to classify the seven Aedes species. The training was conducted separately for grayscale and RGB images. Image processing, data augmentation, training, validation and testing were conducted in Python using the deep-learning framework PyTorch. For both input types, i.e. grayscale and RGB images, our best-performing CNN configuration achieved an accuracy of 100% in discriminating Aedes from non-Aedes mosquito species. The accuracy in predicting the Aedes species reached 93% for grayscale images and 96% for RGB images. Aedes albopictus could be identified with an accuracy of 100%. In conclusion, wing images are sufficient to identify mosquito species by CNN-based image classification. Thus, wing images can represent a useful complement for CNN-based image classification, e.g. for damaged mosquito specimens. Larger training data sets with further mosquito species and a greater variety of images are required to improve and test broad applicability.

Methods: The study was based on 1,155 wing photos from female Aedes specimens, including 165 Ae. albopictus, 165 Ae. cinereus, 165 Ae. communis, 165 Ae. punctor, 165 Ae. rusticus, 165 Ae. sticticus and 165 Ae. vexans. As an unknown class we integrated a further 554 wing photos from common non-Aedes mosquito species in Germany, including 61 Anopheles claviger (Meigen, 1804), 196 Anopheles maculipennis s.l., 11 Anopheles plumbeus Stephens, 1828, 214 Culex pipiens s.s./Cx. torrentium and 72 Coquillettidia richiardii (Ficalbi, 1889). The field-sampled mosquitoes were directly killed and stored at -20 °C until further preparation. All specimens were identified by morphology. After the morphological species identification, the right wing of each specimen was removed and mounted with euparal (Carl Roth, Karlsruhe, Germany) on microscopic slides. Subsequently, the mounted wings were photographed with a stereomicroscope (Leica M205 C, Leica Microsystems, Wetzlar, Germany) under 20× magnification using standardized illumination and exposure time (279 ms).
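The training code itself is not part of this record. A minimal hedged sketch of the grayscale versus RGB input pipelines in PyTorch is shown below; the folder layout 'wings/train' and the image size are assumptions for illustration:

```python
# Minimal sketch: two torchvision input pipelines, one RGB and one grayscale.
from torchvision import datasets, transforms

rgb_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
gray_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),  # keep 3 channels for pretrained CNNs
    transforms.ToTensor(),
])

rgb_train = datasets.ImageFolder('wings/train', transform=rgb_tf)
gray_train = datasets.ImageFolder('wings/train', transform=gray_tf)
```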
1. Framework overview. This paper proposes a pipeline to construct high-quality datasets for text mining in materials science. First, we utilize a traceable automatic literature-acquisition scheme to ensure the traceability of the textual data. Then, a data processing method driven by downstream tasks is applied to generate high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from the materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to augment the data quantity.

2. Dataset information. The experimental datasets used in this paper include the Matscholar dataset publicly published by Weston et al. (DOI: 10.1021/acs.jcim.9b00470) and the NASICON entity recognition dataset constructed by ourselves. Herein, we mainly introduce the details of the NASICON entity recognition dataset.

2.1 Data collection and preprocessing. First, 55 materials science articles related to the NASICON system, which contain a wealth of structure-activity relationship information, are collected through Crystallographic Information Files (CIF). Note that materials science literature is mostly stored in portable document format (PDF), with content arranged in columns and mixed with tables, images, and formulas, which significantly compromises the readability of the text sequence. To tackle this issue, we employ the text parser PDFMiner (a Python toolkit) to standardize, segment, and parse the original documents, thereby converting PDF literature into plain text. In this process, the entire textual information of each article, encompassing title, authors, abstract, keywords, institution, publisher, and publication year, is retained and stored as a unified TXT document. Subsequently, we apply rules based on Python regular expressions to remove redundant information, such as garbled characters and line breaks caused by figures, tables, and formulas. This results in a cleaner text corpus, enhancing its readability and enabling more efficient data analysis. Note that special symbols may also appear as garbled characters, but we refrain from directly deleting them, as they may contain valuable information such as chemical units. Therefore, we converted all such symbols to a special token
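As an illustration of the parsing and cleaning steps described above (not the authors' exact code), pdfminer.six and Python regular expressions can be combined like this; the file name and the specific cleanup rules are assumptions:

```python
# Hedged sketch of PDF-to-text conversion and regex cleanup.
import re
from pdfminer.high_level import extract_text

text = extract_text('nasicon_paper.pdf')   # hypothetical file name

# Collapse hyphenation at line breaks and normalize whitespace.
text = re.sub(r'-\s*\n\s*', '', text)
text = re.sub(r'\s+', ' ', text)

with open('nasicon_paper.txt', 'w', encoding='utf-8') as f:
    f.write(text)
```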
License: CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
The Kidney Cancer Dataset is a subdataset of Multi Cancer Dataset
The data was split into two main directories, train and test, following this structure:
train/
├── kidney_normal/   (4,000 images)
└── kidney_tumor/    (4,000 images)
test/
├── kidney_normal/   (1,000 images)
└── kidney_tumor/    (1,000 images)
You can cite this dataset as: https://www.kaggle.com/datasets/djaidwalid/kidney-cancer-dataset/data
| Cancer Type | Classes | Images |
|---|---|---|
| Acute Lymphoblastic Leukemia | 4 | 20,000 |
| Brain Cancer | 3 | 15,000 |
| Breast Cancer | 2 | 10,000 |
| Cervical Cancer | 5 | 25,000 |
| Kidney Cancer | 2 | 10,000 |
| Lung and Colon Cancer | 5 | 25,000 |
| Lymphoma | 3 | 15,000 |
| Oral Cancer | 2 | 10,000 |
For this dataset, I selected only the Kidney Cancer directory.
The data was augmented by the original author of the dataset using Keras' ImageDataGenerator [1]. The augmentations include:
- Rotation: Up to 10 degrees.
- Width & Height Shift: Up to 10% of the total image size.
- Shearing & Zooming: 10% variation.
- Horizontal Flip: Randomly flips images for additional diversity.
- Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.
The parameters used for augmentation:

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest',
    brightness_range=[0.2, 1.2]
)
```
# ***References:***
*[1]* Obuli Sai Naren. (2022). Multi Cancer Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3415848
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
PROGRAM SUMMARY No. of lines in distributed program, including test data, etc.: 481 No. of bytes in distributed program, including test data, etc.: 14540.8 Distribution format: .py, .csv Programming language: Python Computer: Any workstation or laptop computer running TensorFlow, Google Colab, Anaconda, Jupyter, pandas, NumPy, Microsoft Azure and Alteryx. Operating system: Windows and Mac OS, Linux.
Nature of problem: Navier-Stokes equations are solved numerically in ANSYS Fluent using the Reynolds stress model for turbulence. The simulated values of friction factor are validated with theoretical and experimental data obtained from the literature. Artificial neural networks are then used for a prediction-based augmentation of friction factor. The capabilities of the neural networks are discussed with regard to computational cost and domain limitations.
Solution method: The simulation data is obtained through Reynolds stress modelling of fluid flow through a pipe. This data is augmented using an artificial neural network model that predicts both within and outside the data domain.
Restrictions: The code used in this research is limited to smooth pipe bends, in which friction factor is analysed using a steady state incompressible fluid flow.
Runtime: The artificial neural network produces results within a span of 20 seconds for three-dimensional geometry, using the allocated free computational resources of Google Colaboratory cloud-based computing system.
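The distributed .py file is not reproduced here. As a minimal hedged sketch of a TensorFlow regression network of the kind described in the solution method, with feature and file names as illustrative assumptions:

```python
# Hedged sketch: a small dense network mapping flow/geometry features to a
# predicted friction factor. Column names and architecture are illustrative,
# not the distributed code.
import pandas as pd
import tensorflow as tf

data = pd.read_csv('friction_factor.csv')           # hypothetical CSV name
X = data[['reynolds_number', 'bend_radius_ratio']].values
y = data['friction_factor'].values

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, batch_size=32, validation_split=0.2)
```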
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset originates from a research project investigating microgesture-based text editing in virtual reality (VR). The dataset was collected as part of an evaluation of the MicroGEXT system, which enables precise and efficient text editing using small, subtle hand movements. The research aims to explore lightweight, ergonomic alternatives to traditional mid-air gesture interactions.
Data Collection Methods
- Hardware: The dataset was collected using the Meta Quest Pro VR headset, utilizing its XR Hand Tracking package to capture hand skeleton data at 72 Hz.
- Participants: 10 participants were recruited for gesture elicitation and evaluation.
- Procedure:

Technical & Non-Technical Information for Reusability
- The dataset is suitable for:
  - Gesture recognition research (static/dynamic gestures, sub-state segmentation).
  - Human-computer interaction (HCI) studies focusing on XR input methods.
  - Machine learning applications, including deep learning-based gesture classification.
- Reuse Considerations:
  - Compatible with Unity's XR Hand Tracking package and Python-based deep learning frameworks (e.g., PyTorch, TensorFlow).
  - Includes data augmentation scripts for expanding training datasets.
  - The Null class helps mitigate false activations in real-time applications.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
About Dataset
Overview: The Sindhi Handwritten Alphabet Dataset is a comprehensive collection of handwritten Sindhi alphabet images, developed to support research in handwriting recognition, OCR, and computer vision for regional scripts. This dataset emphasizes diversity, authenticity, and real-world handwriting variations, making it highly suitable for AI-based Sindhi character recognition systems.
Dataset Summary
Diversity & Realism: The dataset captures handwriting from contributors across multiple generations and genders, reflecting a wide range of writing styles and characteristics.
Generations Covered: Gen X, Millennials, Gen Z, Gen Alpha
Writing Styles: Cursive, Bold, Thin, Uneven and natural strokes
This diversity ensures models trained on this dataset can generalize well to unseen handwriting and different personal writing habits.
Usage & Applications
This dataset is ideal for:
- Optical Character Recognition (OCR) for Sindhi script
- Handwritten Character Classification
- Handwriting Style Analysis across generations
- AI-based Sindhi Language Digitization & Preservation
- Computer Vision Research for regional and low-resource languages
Model Development A ResNet-50 based deep learning model was trained on this dataset with four additional layers and strong augmentation strategies.
Model Performance:
- Training Accuracy: 97%
- Validation Accuracy: 98%
- Testing Accuracy: 92%
These results demonstrate the dataset's effectiveness for developing high-performing Sindhi alphabet recognition models.
Development Team: Shayan Ali Shaikh, Muhammad Hamza. Under the supervision of Dr. Attaullah Sahito.
Data Collection: Handwriting samples were collected from students in 45 schools across Sindh, covering Classes 3ā7. This ensures authentic, naturally varied handwriting samples representing real-world conditions.
Technical Notes
Tools Used: OpenCV, RoboFlow, Python (for preprocessing and manipulation)
Augmentation Techniques: Rotation, noise addition, brightness/contrast variation, and blurring for robustness
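A hedged sketch of the listed augmentation operations with OpenCV follows; all parameter values are illustrative, not the project's actual settings:

```python
# Minimal sketch of the listed augmentation operations.
import cv2
import numpy as np

def augment_character(img):
    h, w = img.shape[:2]
    rotated = cv2.warpAffine(img, cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0), (w, h))
    noisy = np.clip(img + np.random.normal(0, 10, img.shape), 0, 255).astype(np.uint8)
    adjusted = cv2.convertScaleAbs(img, alpha=1.2, beta=20)   # brightness/contrast
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    return [rotated, noisy, adjusted, blurred]
```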
License & Attribution
This dataset is released under the Creative Commons CC BY 4.0 License, allowing free use, sharing, and modification with proper attribution.
Contributors: Shayan Ali Shaikh Muhammad Hamza
Acknowledgment
This dataset was developed to digitize and preserve the Sindhi language through AI. We encourage students, researchers, and developers to use this dataset to advance Sindhi handwriting recognition and OCR technologies. Note: this is the first dataset of its kind written by students from primary to secondary level, on real pages with a pen. There is no computerized manipulation.
License: ODC Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
The dataset consists of images of famous (or not-so-famous) landmarks. The collection is organized into a two-level hierarchy structure. The first level is the categories for the landmarks, and the second level is the individual landmarks. There are 6 categories, and categories are: 1. Gothic 2. Modern 3. Mughal 4. Neoclassical 5. Pagodas 6. Pyramids
For each category, there are 5 landmarks, for a total of 30 landmarks. Each landmark has 14 images.
The landmarks dataset is too small to train convolutional neural networks (CNNs) from scratch. The resulting network will overfit the data. Instead, use transfer learning by reusing part of a pre-trained CNN. In transfer learning, instead of training the neural network starting from random weights, the weights for the lower parts of the network are taken from a pre-trained network. Only the higher parts of the network will have to be learned. Chapter 14 of Géron discusses how to apply pre-trained models for transfer learning.
For this group project, the only allowed pre-trained networks are EfficientNetB0 and VGG16, which are smaller CNNs. The objective of this restriction is to avoid penalizing groups that do not have access to powerful machines and/or machines with GPUs. Groups are allowed to use Google Colab with GPUs to train the models, but be aware of resource usage limitations.
Data augmentation is another way to overcome the problem of small datasets. Keras/TensorFlow provides various image manipulation functions (https://www.tensorflow.org/api_docs/python/tf/image) that can be used to generate additional images. Refer to Lecture 9 slides and Chapter 14 of Géron.
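For example, a hedged sketch of a tf.image-based augmentation function (the specific factors are illustrative, and `train_ds` is assumed to be a tf.data.Dataset of (image, label) pairs):

```python
# Minimal sketch: random augmentations with tf.image.
import tensorflow as tf

def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

# train_ds = train_ds.map(augment)
```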
Yet another way to overcome the small dataset problem is experimenting with various ways of combining the models for the two tasks. It is possible to train two distinct models, one for category classification and one for landmark classification. But would landmark classification benefit from knowing the output of category classification? Or vice versa? One possible shared-backbone architecture is sketched below.
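The following is a sketch under the stated EfficientNetB0 restriction, not a prescribed solution: a single shared backbone with a 6-way category head and a 30-way landmark head, with labels assumed to be available for both outputs.

```python
# Hedged sketch: one EfficientNetB0 backbone with two classification heads.
import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3))
base.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
category_out = tf.keras.layers.Dense(6, activation='softmax', name='category')(x)
landmark_out = tf.keras.layers.Dense(30, activation='softmax', name='landmark')(x)

model = tf.keras.Model(inputs=base.input, outputs=[category_out, landmark_out])
model.compile(optimizer='adam',
              loss={'category': 'sparse_categorical_crossentropy',
                    'landmark': 'sparse_categorical_crossentropy'},
              metrics=['accuracy'])
```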
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 1,023 multimedia items created by a virtual influencer focused on cultural heritage, featuring a diverse range of content including images, videos, and texts representing historical landmarks, traditional attire, cultural festivals, and architectural symbols. The data was collected through content generation and real-world cultural representations, and preprocessed with techniques such as resizing, normalization, and data augmentation to ensure consistency and diversity. The dataset is extracted and flattened into CSV file format. The data originates from Shanghai, China.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
SpaceNet, attained via a novel double-stage augmentation framework, FLARE (https://arxiv.org/pdf/2405.13267), is a hierarchically structured and high-quality astronomical image dataset designed for fine-grained and macro classification tasks. Comprising approximately 12,900 samples, SpaceNet integrates lower-resolution (LR) to higher-resolution (HR) conversion with standard augmentations and a diffusion approach for synthetic sample generation. This dataset enables superior generalization on various recognition tasks such as classification.
Total Samples: Approximately 12,900 images.
Fine-Grained Class Distribution:
- Asteroid: 283 files
- Black Hole: 656 files
- Comet: 416 files
- Constellation: 1,552 files
- Galaxy: 3,984 files
- Nebula: 1,192 files
- Planet: 1,472 files
- Star: 3,269 files
SpaceNet is suitable for:
If you use SpaceNet in your research, please cite it as follows:
```bibtex
@misc{alamimam2024flare,
  title={FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging},
  author={Mohammed Talha Alam and Raza Imam and Mohsen Guizani and Fakhri Karray},
  year={2024},
  eprint={2405.13267},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
A diverse dataset is crucial for training deep learning models, especially in the context of currency note recognition. Factors such as diverse backgrounds, lighting, orientation, and blur can significantly impact model outcomes. While high-quality scans of different currencies are accessible on collectors' websites, these often lack the variety seen in real-world scenarios. Additionally, publicly available datasets, which primarily feature old Thai currency notes, are limited, containing up to only 1000 images.
Recognizing the scarcity of comprehensive datasets for new Thai currency notes, we curated a collection of 3,600 images spanning five denominations:
- 20 baht
- 50 baht
- 100 baht
- 500 baht
- 1000 baht
These images depict the notes in various orientations and settings, including different backgrounds and lighting conditions, such as illuminated and dark environments.
We used two iPhone models to capture this diversity:
- iPhone 13 Pro Max (12-megapixel, f/1.8 rear camera)
- iPhone 12 (12-megapixel, f/1.6 rear camera)
Unique scenarios were also included, such as half-folded notes against contrasting backgrounds. For consistency, the iPhone 12 captured 4032×3024 resolution shots of the 50 and 1000 baht notes, while the iPhone 13 Pro Max was used for the same resolution images of the other denominations. Our data collection team followed clear guidelines to ensure various image captures.
Each denomination class included 720 images. Specifically, the 20 baht note images were captured in various orientations and settings, such as front views with dark, white, and cluttered backgrounds and front views rotated 180 degrees with the same background variations. The same approach was applied to the 50, 100, 500, and 1000 baht notes. Additionally, images of the back of each note, both normal and rotated 180 degrees, and half-folded top and bottom states, were captured under the same diverse background conditions.
The collected images were meticulously examined during data preparation to address inconsistencies in labeling and variations. Images were organized into folders according to the denominations of the new Thai currency. Given that most images were originally captured with an iPhone in HEIC format, they were converted to JPEG using the 'pyheif' Python module.
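That conversion step can be reproduced roughly as follows (a hedged sketch using pyheif and Pillow; the file names are illustrative):

```python
# Minimal sketch: HEIC-to-JPEG conversion with pyheif and Pillow.
import pyheif
from PIL import Image

heif = pyheif.read('IMG_0001.HEIC')               # hypothetical file name
img = Image.frombytes(heif.mode, heif.size, heif.data,
                      'raw', heif.mode, heif.stride)
img.save('IMG_0001.jpg', format='JPEG')
```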
The data was divided into training and validation subsets with a 70%:30% ratio. There are a total of 2520 images for training and 1080 for validation.
Here's a brief overview of each objective, research question, and type of analysis you can try to perform: