25 datasets found
  1. Data from: Duck Hunt

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Cite
    Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt
    Explore at:
    Available download formats: zip (7379197 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    Hugo Zanini
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Duck Hunt Object Detection Dataset

    This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

    Perfect for: - Object detection model training - Computer vision research - Retro gaming AI projects - YOLO algorithm benchmarking - Educational purposes

    Dataset Statistics

    | Metric | Value |
    |---|---|
    | Total Images | 1,004 |
    | Dataset Size | 12 MB |
    | Image Format | PNG |
    | Annotation Format | YOLO (.txt) |
    | Classes | 4 |
    | Train/Val Split | 711/260 (73%/27%) |

    Class Distribution

    | Class ID | Class Name | Count | Description |
    |---|---|---|---|
    | 0 | dog | 252 | The hunting dog in various poses (jumping, laughing, sniffing, etc.) |
    | 1 | duck_dead | 256 | Dead ducks (both black and red variants) |
    | 2 | duck_shot | 248 | Ducks in the moment of being shot |
    | 3 | duck_flying | 248 | Flying ducks in all directions (left, right, diagonal) |

    Dataset Structure

    yolo_dataset_augmented/
    ├── images/
    │   ├── train/      # 711 training images
    │   └── val/        # 260 validation images
    ├── labels/
    │   ├── train/      # 711 YOLO annotation files
    │   └── val/        # 260 YOLO annotation files
    ├── classes.txt     # Class names mapping
    ├── dataset.yaml    # YOLO configuration file
    └── augmented_dataset_stats.json  # Detailed statistics

    Data Augmentation Details

    The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

    Augmentation Techniques Applied:

    • Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
    • Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
    • Quality Variations: Gaussian noise, slight blur for robustness
    • Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

    Augmentation Parameters:

    {
      'rotation_range': (-15, 15),    # Small rotations for game sprites
      'brightness_range': (0.7, 1.3),  # Brightness variations
      'contrast_range': (0.8, 1.2),   # Contrast adjustments
      'saturation_range': (0.8, 1.2),  # Color saturation
      'noise_intensity': 0.02,      # Gaussian noise
      'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
      'scaling_range': (0.8, 1.2),    # Scale variations
    }
    

    Usage Examples

    Loading with YOLOv8 (Ultralytics)

    from ultralytics import YOLO
    
    # Load and train
    model = YOLO('yolov8n.pt') # Load pretrained model
    results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
    
    # Validate
    metrics = model.val()
    
    # Predict
    results = model('path/to/test/image.png')
    

    Loading with PyTorch

    import torch
    from torch.utils.data import Dataset, DataLoader
    from PIL import Image
    import os
    
    class DuckHuntDataset(Dataset):
      def __init__(self, images_dir, labels_dir, transform=None):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.transform = transform
        self.images = os.listdir(images_dir)
      
      def __len__(self):
        return len(self.images)
      
      def __getitem__(self, idx):
        img_path = os.path.join(self.images_dir, self.images[idx])
        label_path = os.path.join(self.labels_dir, 
                     self.images[idx].replace('.png', '.txt'))
        
        image = Image.open(img_path)
        # Load YOLO annotations
        with open(label_path, 'r') as f:
          labels = f.readlines()
        
        if self.transform:
          image = self.transform(image)
          
        return image, labels
    
    # Usage
    dataset = DuckHuntDataset('images/train', 'labels/train')
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    

    YOLO Annotation Format

    Each .txt file contains one line per object: class_id center_x center_y width height

    Example annotation: 0 0.492 0.403 0.212 0.315, where values are normalized (0-1) relative to image dimensions.
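
    The snippet below is a minimal parsing sketch, not part of the original dataset tooling; the label path and image size in the usage comment are illustrative.

    def load_yolo_labels(label_path, img_width, img_height):
      """Parse a YOLO .txt file into (class_id, x1, y1, x2, y2) pixel boxes."""
      boxes = []
      with open(label_path) as f:
        for line in f:
          class_id, cx, cy, w, h = line.split()
          cx, w = float(cx) * img_width, float(w) * img_width
          cy, h = float(cy) * img_height, float(h) * img_height
          boxes.append((int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
      return boxes

    # boxes = load_yolo_labels('labels/train/example.txt', img_width=256, img_height=240)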

    Technical Specifications

    • Image Dimensions: Variable (original sprite sizes preserved)
    • Color Channels: RGB (3 channels)
    • Annotation Precision: Float32 (normalized coordinates)
    • File Naming: Descriptive names indicating class and augmentation type
    • Quality: High-resolution pixel art sprites

    Dataset Context

    This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

    • The Dog: Your hunting companion who retrieves ducks and ...
  2. Data from: SalmonScan: A Novel Image Dataset for Machine Learning and Deep Learning Analysis in Fish Disease Detection in Aquaculture

    • data.mendeley.com
    Updated Mar 18, 2024
    + more versions
    Cite
    Md Shoaib Ahmed (2024). SalmonScan: A Novel Image Dataset for Machine Learning and Deep Learning Analysis in Fish Disease Detection in Aquaculture [Dataset]. http://doi.org/10.17632/x3fz2nfm4w.2
    Explore at:
    Dataset updated
    Mar 18, 2024
    Authors
    Md Shoaib Ahmed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SalmonScan dataset is a collection of images of salmon fish, including healthy fish and infected fish. The dataset consists of two classes of images:

    Fresh salmon 🐟 Infected Salmon 🐠

    This dataset is ideal for various computer vision tasks in machine learning and deep learning applications. Whether you are a researcher, developer, or student, the SalmonScan dataset offers a rich and diverse data source to support your projects and experiments.

    So, dive in and explore the fascinating world of salmon health and disease!

    The SalmonScan dataset consists of approximately 1,208 images of salmon fish, classified into two classes:

    • Fresh salmon (healthy fish with no visible signs of disease), 456 images
    • Infected Salmon containing disease, 752 images

    Each class contains a representative and diverse collection of images, capturing a range of different perspectives, scales, and lighting conditions. The images have been carefully curated to ensure that they are of high quality and suitable for use in a variety of computer vision tasks.

    Data Preprocessing

    The input images were preprocessed to enhance their quality and suitability for further analysis. The following steps were taken:

    Resizing: All the images were resized to a uniform size of 600 pixels in width and 250 pixels in height to ensure compatibility with the learning algorithm.

    Image Augmentation: To overcome the small number of images, various image augmentation techniques were applied to the input images. These included (a minimal code sketch follows the list):

    • Horizontal Flip: The images were horizontally flipped to create additional samples.
    • Vertical Flip: The images were vertically flipped to create additional samples.
    • Rotation: The images were rotated to create additional samples.
    • Cropping: A portion of the image was randomly cropped to create additional samples.
    • Gaussian Noise: Gaussian noise was added to the images to create additional samples.
    • Shearing: The images were sheared to create additional samples.
    • Contrast Adjustment (Gamma): Gamma correction was applied to the images to adjust their contrast.
    • Contrast Adjustment (Sigmoid): The sigmoid function was applied to the images to adjust their contrast.
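
    One way to approximate these augmentations with torchvision is sketched below; this is not the authors' original pipeline, and the parameter values are illustrative assumptions.

    import torch
    from torchvision import transforms

    augment = transforms.Compose([
      transforms.RandomHorizontalFlip(p=0.5),                       # horizontal flip
      transforms.RandomVerticalFlip(p=0.5),                         # vertical flip
      transforms.RandomAffine(degrees=15, shear=10),                # rotation and shearing
      transforms.RandomResizedCrop((250, 600), scale=(0.8, 1.0)),   # random crop, resized to 250x600 (height x width)
      transforms.ColorJitter(contrast=0.3),                         # rough stand-in for gamma/sigmoid contrast adjustment
      transforms.ToTensor(),
      transforms.Lambda(lambda x: x + 0.02 * torch.randn_like(x)),  # Gaussian noise
    ])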

    Usage

    To use the SalmonScan dataset in your ML and DL projects, follow these steps (a minimal loading sketch follows the list):

    • Clone or download the salmon scan dataset repository from GitHub.
    • Unzip the file to access the two folders (FreshFish and InfectedFish).
    • Load the images into your preferred programming environment, such as Python.
    • Use standard libraries such as numpy or pandas to convert the images into arrays, which can be input into a machine learning or deep learning model.
    • Split the dataset into training, validation, and test sets as per your requirement.
    • Preprocess the data as needed, such as resizing and normalizing the images.
    • Train your ML/DL model using the preprocessed training data.
    • Evaluate the model on the test set and make predictions on new, unseen data.
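
    The sketch below illustrates the loading and preprocessing steps; the root folder name is an assumption, while the class folders (FreshFish, InfectedFish) and the 600x250 resize follow the description above.

    import os
    import numpy as np
    from PIL import Image

    def load_salmonscan(root='SalmonScan'):  # root path is an assumption
      """Load images into arrays with labels 0 = fresh, 1 = infected."""
      images, labels = [], []
      for label, folder in enumerate(['FreshFish', 'InfectedFish']):
        folder_path = os.path.join(root, folder)
        for name in os.listdir(folder_path):
          img = Image.open(os.path.join(folder_path, name)).convert('RGB')
          img = img.resize((600, 250))              # width x height
          images.append(np.asarray(img) / 255.0)    # normalize to [0, 1]
          labels.append(label)
      return np.stack(images), np.array(labels)

    # X, y = load_salmonscan()
    # Split into train/validation/test sets, e.g. with sklearn.model_selection.train_test_split.
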
  3. Waste Classfication Dataset

    • kaggle.com
    Updated Jun 15, 2025
    Cite
    Kaan Çerkez (2025). Waste Classfication Dataset [Dataset]. https://www.kaggle.com/datasets/kaanerkez/waste-classfication-dataset
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kaan Çerkez
    License

    CDLA Permissive 1.0 (https://cdla.io/permissive-1-0/)

    Description

    Balanced Waste Classification Dataset - E-Waste & Mixed Materials

    Dataset Overview

    This dataset contains a comprehensive collection of waste images designed for training machine learning models to classify different types of waste materials, with a strong focus on electronic waste (e-waste) and mixed materials. The dataset includes 7 electronic device categories alongside traditional recyclable materials, making it ideal for modern waste management challenges where electronic devices constitute a significant portion of waste streams. The dataset has been carefully curated and balanced to ensure optimal performance for multi-category waste classification tasks using deep learning approaches.

    Dataset Statistics

    • Total Classes: 17 different waste categories
    • Images per Class: 400 (balanced)
    • Total Images: 6,800
    • Image Format: RGB (3 channels)
    • Recommended Input Size: 224×224 pixels
    • Data Structure: Single balanced dataset (not pre-split)

    Waste Categories

    The dataset includes 17 distinct waste categories covering various types of materials commonly found in waste management scenarios:

    1. Battery - Various types of batteries
    2. Cardboard - Cardboard packaging and boxes
    3. Glass - Glass containers and bottles
    4. Keyboard - Computer keyboards and input devices
    5. Metal - Metal cans and metallic waste
    6. Microwave - Microwave ovens and similar appliances
    7. Mobile - Mobile phones and smartphones
    8. Mouse - Computer mice and peripherals
    9. Organic - Biodegradable organic waste
    10. Paper - Paper products and documents
    11. PCB - Printed Circuit Boards (electronic components)
    12. Plastic - Plastic containers and packaging
    13. Player - Media players and entertainment devices
    14. Printer - Printers and printing equipment
    15. Television - TV sets and display devices
    16. Trash - General mixed waste
    17. Washing Machine - Washing machines and large appliances

    Data Processing Pipeline

    1. Data Balancing

    • Undersampling: Applied to classes with >400 images
    • Data Augmentation: Applied to classes with <400 images
    • Target: Exactly 400 images per class for balanced training

    2. Data Augmentation Techniques

    • Rotation: ±20 degrees
    • Width/Height Shift: ±20%
    • Shear Range: 20%
    • Zoom Range: 20%
    • Horizontal Flip: Enabled
    • Fill Mode: Nearest neighbor
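
    A Keras ImageDataGenerator configured with the settings above might look like the following sketch; it illustrates the listed parameters and is not necessarily the exact script used to build the dataset.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    augment_datagen = ImageDataGenerator(
      rotation_range=20,       # rotation: +/- 20 degrees
      width_shift_range=0.2,   # width shift: +/- 20%
      height_shift_range=0.2,  # height shift: +/- 20%
      shear_range=0.2,         # shear: 20%
      zoom_range=0.2,          # zoom: 20%
      horizontal_flip=True,
      fill_mode='nearest'
    )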

    3. Quality Assurance

    • Consistent image dimensions
    • Proper file format validation
    • Balanced class distribution
    • Clean data structure

    Recommended Use Cases

    Primary Applications

    • E-Waste Classification: Specialized in electronic devices (Mobile, Keyboard, Mouse, PCB, etc.)
    • Mixed Waste Sorting: Traditional recyclables (Paper, Plastic, Glass, Metal, Cardboard)
    • Smart Recycling Systems: Automated waste sorting for both organic and electronic materials
    • Environmental Monitoring: Multi-category waste identification
    • Appliance Recycling: Large appliance classification (Microwave, TV, Washing Machine)

    Special Features

    • Electronic Waste Focus: Strong representation of e-waste categories (7 out of 17 classes)
    • Diverse Material Types: From organic waste to complex electronic devices
    • Real-world Categories: Practical classification for actual waste management scenarios
    • Appliance Recognition: Specialized in identifying large household appliances

    Model Architectures

    • Convolutional Neural Networks (CNN)
    • Transfer Learning with MobileNetV2, ResNet, EfficientNet
    • Vision Transformers (ViT)
    • Custom architectures for waste classification

    Dataset Structure

    balanced_waste_images/
    ├── category_1/
    │   ├── image_001.jpg
    │   ├── image_002.jpg
    │   └── ... (400 images)
    ├── category_2/
    │   ├── image_001.jpg
    │   └── ... (400 images)
    └── ... (17 categories total)

    Note: Dataset is not pre-split. Users need to create train/validation/test splits as needed.

    Getting Started

    Step 1: Data Splitting

    Since the dataset is not pre-split, you'll need to create train/validation/test splits:

    import splitfolders
    
    # Split dataset: 80% train, 10% val, 10% test
    splitfolders.ratio(
      input='balanced_waste_images', 
      output='split_data',
      seed=42, 
      ratio=(.8, .1, .1),
      group_prefix=None,
      move=False
    )
    

    Step 2: Data Loading & Preprocessing

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    
    # Data generators with preprocessing
    train_datagen = ImageDataGenerator(rescale=1./255)
    val_datagen = ImageDataGenerator(rescale=1./255)
    
    train_generator = train_datagen.flow_from_directory(
      'split_data/train/',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    
    val_generator = val_datagen.flow_from_directory(
      'split_data/val/',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    
  4. Data from: Bayesian Changepoint Detection via Logistic Regression and the Topological Analysis of Image Series

    • tandf.figshare.com
    • iro.uiowa.edu
    application/csv
    Updated Jul 17, 2025
    Cite
    Andrew M. Thomas; Michael Jauch; David S. Matteson (2025). Bayesian Changepoint Detection via Logistic Regression and the Topological Analysis of Image Series [Dataset]. http://doi.org/10.6084/m9.figshare.29257379.v2
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Jul 17, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Andrew M. Thomas; Michael Jauch; David S. Matteson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a Bayesian method for multivariate changepoint detection that allows for simultaneous inference on the location of a changepoint and the coefficients of a logistic regression model for distinguishing pre-changepoint data from post-changepoint data. In contrast to many methods for multivariate changepoint detection, the proposed method is applicable to data of mixed type and avoids strict assumptions regarding the distribution of the data and the nature of the change. The regression coefficients provide an interpretable description of a potentially complex change. For posterior inference, the model admits a simple Gibbs sampling algorithm based on Pólya-gamma data augmentation. We establish conditions under which the proposed method is guaranteed to recover the true underlying changepoint. As a testing ground for our method, we consider the problem of detecting topological changes in time series of images. We demonstrate that our proposed method BCLR, combined with a topological feature embedding, performs well on both simulated and real image data. The method also successfully recovers the location and nature of changes in more traditional changepoint tasks. An implementation of our method is available in the Python package bclr.

  5. BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition

    • data.mendeley.com
    Updated Oct 30, 2025
    Cite
    Nahid Khan (2025). BSL-Static-48: A Dataset of Anonymized Images and MediaPipe Hand Landmarks for BSL Recognition [Dataset]. http://doi.org/10.17632/ms5phkw8sr.1
    Explore at:
    Dataset updated
    Oct 30, 2025
    Authors
    Nahid Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides a collection of images and extracted landmark features for 48 fundamental static signs in Bangla Sign Language (BSL), including 38 alphabets and 10 digits (0-9). It was created to support research in isolated sign language recognition (SLR) for BSL and provide a benchmark resource for the research community. In total, the dataset comprises 14,566 raw images, 14,566 mirrored images, and 29,132 processed feature samples.

    Data Contents: The dataset is organized into two main folders:

    01_Images: Contains 29,132 images in .jpg format (14,566 raw + 14,566 mirrored).
    • Raw_Images: Contains 14,566 original images collected from participants.
    • Mirrored_Images: Contains 14,566 horizontally flipped versions of the raw images for data augmentation purposes.
    • Privacy Note: Facial regions in all images within this folder have been anonymized (blurred) to protect participant privacy, as formal informed consent for sharing identifiable images was not obtained prior to collection.

    02_Processed_Features_NPY: Contains 29,132 126-dimensional hand landmark features saved as NumPy arrays in .npy format. Features were extracted using MediaPipe Holistic (capturing 21 landmarks each for the left and right hands, resulting in 63 + 63 = 126 features per image). These feature files are pre-split into train (23,293 samples), val (2,911 samples), and test (2,928 samples) subdirectories (approximately 80%/10%/10%) for standardized model evaluation and benchmarking.
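
    A minimal sketch for loading the pre-split feature files is shown below; the exact layout inside the train/val/test subdirectories (here assumed to be one folder per class containing .npy files) should be checked against the dataset's README.md.

    import os
    import glob
    import numpy as np

    def load_split(split_dir):
      """Load all 126-dimensional landmark vectors from one split (train/val/test)."""
      features, labels = [], []
      for class_name in sorted(os.listdir(split_dir)):              # assumption: one subfolder per class
        for path in glob.glob(os.path.join(split_dir, class_name, '*.npy')):
          features.append(np.load(path))                            # shape (126,)
          labels.append(class_name)
      return np.stack(features), np.array(labels)

    # X_train, y_train = load_split('02_Processed_Features_NPY/train')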

    Data Collection: Images were collected from 5 volunteers using a Macbook Air M3 camera. Data collection took place indoors under room lighting conditions against a white background. Images were captured manually using a Python script to ensure clarity.

    Potential Use: Researchers can utilize the anonymized raw and mirrored images (01_Images) to develop or test novel feature extraction techniques or multimodal recognition systems. Alternatively, the pre-processed and split .npy feature files (02_Processed_Features_NPY) can be directly used to efficiently train and evaluate machine learning models for static BSL recognition, facilitating reproducible research and benchmarking.

    Further Details: Please refer to the README.md file included within the dataset for detailed class mapping (e.g., L1='অ', D0='০'), comprehensive file statistics per class, specifics on the data processing pipeline, and citation guidelines.

  6. Sign Language Dataset - 5 Essential Phrases

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Mohamed Hamdey (2025). Sign Language Dataset - 5 Essential Phrases [Dataset]. https://www.kaggle.com/datasets/mohamedhamdey/5-basic-signes
    Explore at:
    Available download formats: zip (22115208 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Mohamed Hamdey
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Sign Language Recognition Dataset - 5 Essential Phrases

    Overview

    This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.

    Dataset Statistics

    • Total Images: ~1,000 images
    • Number of Classes: 5
    • Image Format: JPG
    • Image Size: 224×224 pixels (standardized)
    • Split: 75% Train / 15% Validation / 10% Test

    Classes

    | Class ID | Meaning | Description |
    |---|---|---|
    | 0 | Yes | Affirmative gesture |
    | 1 | No | Negative gesture |
    | 2 | I Love You | Expression of affection |
    | 3 | Hello | Greeting gesture |
    | 4 | Thank You | Gratitude expression |

    Dataset Structure

    data_final/
    ├── train/
    │   ├── 0/  # Yes (~150 images)
    │   ├── 1/  # No (~150 images)
    │   ├── 2/  # I Love You (~150 images)
    │   ├── 3/  # Hello (~150 images)
    │   └── 4/  # Thank You (~150 images)
    ├── val/
    │   ├── 0/
    │   ├── 1/
    │   ├── 2/
    │   ├── 3/
    │   └── 4/
    └── test/
        ├── 0/
        ├── 1/
        ├── 2/
        ├── 3/
        └── 4/

    Data Collection & Preprocessing

    Collection Process:

    • Images collected using webcam in controlled environment
    • Hand gestures detected using MediaPipe hand tracking
    • Multiple angles, positions, and lighting conditions
    • Various hand positions and distances from camera

    Preprocessing:

    • Hand region detection using MediaPipe
    • Automatic cropping to hand bounding box
    • Resized to 224Ɨ224 pixels
    • Padding added around hand region
    • Quality control and manual cleaning performed
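
    Taken together, the detection-and-cropping steps above can be sketched roughly as follows; the padding, output size, and single-hand assumption are illustrative rather than the exact settings used for this dataset.

    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

    def crop_hand(image_path, pad=20, size=224):
      """Detect the hand with MediaPipe, crop around it, and resize to size x size."""
      img = cv2.imread(image_path)
      h, w = img.shape[:2]
      results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
      if not results.multi_hand_landmarks:
        return None                                   # no hand detected
      lm = results.multi_hand_landmarks[0].landmark
      xs = [int(p.x * w) for p in lm]
      ys = [int(p.y * h) for p in lm]
      x1, y1 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
      x2, y2 = min(max(xs) + pad, w), min(max(ys) + pad, h)
      return cv2.resize(img[y1:y2, x1:x2], (size, size))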

    Image Characteristics

    • Resolution: 224×224 pixels
    • Color: RGB
    • Background: Various (natural backgrounds)
    • Lighting: Mixed (natural and artificial)
    • Hand Orientation: Multiple angles
    • Distance: Varied (close, medium, far)

    Use Cases

    This dataset is suitable for:

    1. Sign Language Recognition Models

      • Real-time gesture recognition
      • Sign-to-speech applications
      • Accessibility tools
    2. Computer Vision Research

      • Hand gesture classification
      • Transfer learning experiments
      • Mobile ML applications
    3. Educational Projects

      • Learning deep learning basics
      • Building gesture recognition systems
      • Prototyping accessibility solutions

    Quick Start

    Load Data with TensorFlow:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    
    datagen = ImageDataGenerator(rescale=1./255)
    
    train_gen = datagen.flow_from_directory(
      'data_final/train',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    
    val_gen = datagen.flow_from_directory(
      'data_final/val',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    

    Load Data with PyTorch:

    from torchvision import datasets, transforms
    
    transform = transforms.Compose([
      transforms.Resize((224, 224)),
      transforms.ToTensor(),
      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
    val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
    

    Baseline Performance

    Using transfer learning with MobileNetV2/EfficientNetB0:
    • Expected Accuracy: 90-97%
    • Training Time: 20-40 minutes (GPU)
    • Model Size: ~15 MB

    Recommended Augmentation

    For better generalization, use these augmentation techniques:

    train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=25,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.15,
      zoom_range=0.2,
      horizontal_flip=True,
      brightness_range=[0.7, 1.3]
    )

    Limitations

    • Limited vocabulary: Only 5 signs (not comprehensive)
    • Single person: Images from one individual (limited diversity)
    • Static gestures: No motion-based signs
    • Controlled environment: May need adaptation for real-world scenarios
    • Hand dominance: Mix of left and right hands

    Future Improvements

    • Expand to 20+ common signs
    • Include multiple signers (diverse skin tones, ages, genders)
    • Add motion-based gestures (video data)
    • Regional sign language variations
    • More challenging backgrounds

    Citation

    If you use this dataset in your research or project, please cite:

    @dataset{sign_language_5phrases_2025,
      title={Sign Language Recognition Dataset - 5 Essential Phrases},
      author={[Your Name]},
      year={2025},
      publisher={Kaggle},
      url={[Dataset URL]}
    }

    License

    This dataset is released under [Choose one]:
    • CC BY 4.0 (Attribution) - Recommended
    • CC BY-SA 4.0 (Attribution-ShareAlike)
    • CC0 1.0 (Public Domain)

    Acknowledgments

    • MediaPipe by Google for hand tracking
    • TensorFlow/Keras for deep learning fr...
  7. BIRD: Big Impulse Response Dataset

    • data.niaid.nih.gov
    • kaggle.com
    Updated Oct 29, 2020
    Cite
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François (2020). BIRD: Big Impulse Response Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4139415
    Explore at:
    Dataset updated
    Oct 29, 2020
    Dataset provided by
    Mila - Université de Montréal
    Université de Sherbrooke
    Authors
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
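
    The dataset's own Python loader is the authoritative reference; the sketch below only illustrates the general idea of online reverberation augmentation (convolving a clean signal with a randomly chosen room impulse response) in a PyTorch-style Dataset, with file paths and channel handling as assumptions.

    import glob
    import random
    import numpy as np
    import soundfile as sf
    from scipy.signal import fftconvolve
    from torch.utils.data import Dataset

    class ReverbAugmentedSpeech(Dataset):
      """Pairs clean speech files with randomly chosen room impulse responses."""
      def __init__(self, speech_files, rir_files):
        self.speech_files = speech_files
        self.rir_files = rir_files

      def __len__(self):
        return len(self.speech_files)

      def __getitem__(self, idx):
        speech, _ = sf.read(self.speech_files[idx])
        rir, _ = sf.read(random.choice(self.rir_files))
        if rir.ndim > 1:                     # multichannel RIR: keep a single channel here
          rir = rir[:, 0]
        wet = fftconvolve(speech, rir, mode='full')[:len(speech)]
        return wet.astype(np.float32)

    # Paths are illustrative:
    # dataset = ReverbAugmentedSpeech(glob.glob('speech/*.wav'), glob.glob('bird_rirs/*.wav'))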

  8. Data from: BioEncoder: a metric learning toolkit for comparative organismal...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 26, 2024
    Cite
    Moritz David Lürig; Moritz David Lürig; Emanuela Di Martino; Emanuela Di Martino; Arthur Porto; Arthur Porto (2024). Data from: BioEncoder: a metric learning toolkit for comparative organismal biology [Dataset]. http://doi.org/10.5281/zenodo.13017212
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Moritz David Lürig; Moritz David Lürig; Emanuela Di Martino; Emanuela Di Martino; Arthur Porto; Arthur Porto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Mar 28, 2024
    Description

    BioEncoder: a metric learning toolkit for comparative organismal biology

    Abstract - In the realm of biological image analysis, deep learning (DL) has become a core toolkit, e.g., for segmentation and classification. However, conventional DL methods are challenged by large biodiversity datasets characterized by unbalanced classes and hard-to-distinguish phenotypic differences between them. Here we present BioEncoder, a user-friendly toolkit for metric learning, which overcomes these challenges by focussing on learning relationships between individual data points rather than on the separability of classes. BioEncoder is released as a Python package, created for ease of use and flexibility across diverse datasets. It features taxon-agnostic data loaders, custom augmentation options, and simple hyperparameter adjustments through text-based configuration files. The toolkit's significance lies in its potential to unlock new research avenues in biological image analysis while democratizing access to advanced deep metric learning techniques. BioEncoder focuses on the urgent need for toolkits bridging the gap between complex DL pipelines and practical applications in biological research.

    Dataset - This data repository includes two things: a snapshot of the BioEncoder package (BioEncoder-main.zip, version 1.0.0, downloaded from https://github.com/agporto/BioEncoder on 2024-07-19 at 17:20), and the damselfly dataset used for the case study presented in the paper (bioencoder_data.zip). The dataset archive also encompasses the configuration files and the final model checkpoints from the case study, as well as a script to reproduce the results and figures presented in the paper.

    How to use - Get started by consulting the GitHub repository for information on how to install BioEncoder, then download the data archive and run the script. Some parts of the script can be executed using the model checkpoints; for other parts, the training routine needs to be run.

  9. Cars and Bikes Images

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    Suresh Maheshwari (2025). Cars and Bikes Images [Dataset]. https://www.kaggle.com/datasets/sureshmaheshwari021/cars-and-bikes-images
    Explore at:
    Available download formats: zip (48592401 bytes)
    Dataset updated
    Nov 6, 2025
    Authors
    Suresh Maheshwari
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains a collection of car and bike images scraped from the web using the Bing Image Crawler (icrawler library in Python). It was created for educational and research purposes, especially for projects involving computer vision, deep learning, and image classification.

    Each image was retrieved from publicly available Bing search results and organized into two folders:

    cars/ — contains images of different types and models of cars

    bikes/ — contains images of various motorcycles and scooters

    Usage

    This dataset is suitable for:

    Training and testing CNNs or transfer learning models (e.g., ResNet, VGG, EfficientNet)

    Practicing image preprocessing and augmentation techniques

    Developing vehicle recognition or classification systems

    Data Collection

    Images were automatically collected using:

    from icrawler.builtin import BingImageCrawler

    with filters={'type': 'photo'} to ensure only photographic content.
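
    A crawl along these lines might look like the sketch below; the keywords, output folders, and max_num values are illustrative, not the exact settings used to build this dataset.

    from icrawler.builtin import BingImageCrawler

    for keyword, folder in [('car', 'cars'), ('motorcycle', 'bikes')]:
      crawler = BingImageCrawler(storage={'root_dir': folder})
      crawler.crawl(keyword=keyword, filters={'type': 'photo'}, max_num=500)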

    License

    All images are shared under the CC0: Public Domain license. They are intended solely for non-commercial, academic, and research use.

  10. Breast Cancer Dataset

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Djaid Walid (2025). Breast Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/djaidwalid/breast-cancer-dataset
    Explore at:
    Available download formats: zip (2527354850 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    Djaid Walid
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset structure:

    This dataset is divided into two main directories, train and test, each divided into two other directories, breast_malignant and breast_benign, following this structure:

    \train
     |
     |_\breast_malignant
       |
       |_4000 images
     |
     |_\breast_benign
       |
       |_4000 images

    \test
     |
     |_\breast_malignant
       |
       |_1000 images
     |
     |_\breast_benign
       |
       |_1000 images

    Dataset details:

    | Path | Subclass | Description |
    |-----|-----------|--------------|
    | breast_benign | Benign | Non-cancerous breast tissues |
    | breast_malignant | Malignant | Cancerous breast tissues |

    Source: Collected from the Breast Cancer dataset by Anas Elmasry on Kaggle.

    Data augmentation:

    The data was augmented by the original author of the dataset using Keras's ImageDataGenerator [1]. The augmentations include:
    - Rotation: Up to 10 degrees.
    - Width & Height Shift: Up to 10% of the total image size.
    - Shearing & Zooming: 10% variation.
    - Horizontal Flip: Randomly flips images for additional diversity.
    - Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.

    The parameters used for augmentation:

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
      rotation_range=10,
      width_shift_range=0.1,
      height_shift_range=0.1,
      shear_range=0.1,
      zoom_range=0.1,
      horizontal_flip=True,
      fill_mode='nearest',
      brightness_range=[0.2, 1.2]
    )
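
    Given the train/test layout above, the directories can be loaded with Keras generators roughly as in the sketch below; the image size and batch size are assumptions.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
      'train',                   # folder containing breast_benign/ and breast_malignant/
      target_size=(224, 224),    # assumed input size
      batch_size=32,
      class_mode='binary'
    )

    test_generator = test_datagen.flow_from_directory(
      'test',
      target_size=(224, 224),
      batch_size=32,
      class_mode='binary',
      shuffle=False
    )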
    
    
  11. Data from: A convolutional neural network to identify mosquito species (Diptera: Culicidae) of the genus Aedes by wing images

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Feb 18, 2024
    Cite
    Felix Sauer; Moritz Werny; Kristopher Nolte; Carmen Villacañas de Castro; Norbert Becker; Ellen Kiel; Renke Lühken (2024). A convolutional neural network to identify mosquito species (Diptera: Culicidae) of the genus Aedes by wing images [Dataset]. http://doi.org/10.5061/dryad.vx0k6djz9
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 18, 2024
    Dataset provided by
    Heidelberg University
    Carl von Ossietzky Universität Oldenburg
    Bernhard Nocht Institute for Tropical Medicine
    The Deep Bench
    Authors
    Felix Sauer; Moritz Werny; Kristopher Nolte; Carmen Villacañas de Castro; Norbert Becker; Ellen Kiel; Renke Lühken
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Accurate species identification is a prerequisite to assess the medical relevance of a mosquito specimen. In monitoring or surveillance programs, mosquitoes are typically identified based on morphological characters, which can be supported by molecular biological assays. Both methods require intensive experience of the observers and well-equipped laboratories. The use of convolutional neural networks (CNNs) to identify species based on images may be a cost-effective and reliable alternative. In this proof-of-concept study, we developed a CNN to identify seven Aedes species by wing images only. While previous studies used images of the whole mosquito body, the nearly two-dimensional wings may facilitate standardized image capture and thereby reduce the complexity of the CNN implementation. Mosquitoes were sampled from different sites in Germany. Their wings were mounted and photographed with a professional stereomicroscope. The data set consisted of 1,155 wing images from seven Aedes species, including the exotic species Aedes albopictus and six native Aedes species, as well as 554 wings from different non-Aedes mosquitoes. The wing images were used to train a CNN to differentiate between Aedes and non-Aedes mosquitoes and to classify the seven Aedes species. The training was conducted separately for grayscale and RGB images. Image processing, data augmentation, training, validation and testing were conducted in Python using the deep-learning framework PyTorch. For both input types, i.e. grayscale and RGB images, our best-performing CNN configuration achieved an accuracy of 100% to discriminate Aedes from non-Aedes mosquito species. The accuracy to predict the Aedes species reached 93% for grayscale images and 96% for RGB images. Aedes albopictus could be identified with an accuracy of 100%. In conclusion, wing images are sufficient to identify mosquito species by CNN-based image classification. Thus, wing images can represent a useful complement for CNN-based image classification, e.g. for damaged mosquito specimens. Larger training data sets with further mosquito species and a greater variety of images are required to improve and test broad applicability.

    Methods: The study was based on 1,155 wing photos from female Aedes specimens, including 165 Ae. albopictus, 165 Ae. cinereus, 165 Ae. communis, 165 Ae. punctor, 165 Ae. rusticus, 165 Ae. sticticus and 165 Ae. vexans. As the unknown class we integrated a further 554 wing photos from common non-Aedes mosquito species in Germany, including 61 Anopheles claviger (Meigen, 1804), 196 Anopheles maculipennis s.l., 11 Anopheles plumbeus Stephens, 1828, 214 Culex pipiens s.s./Cx. torrentium and 72 Coquillettidia richiardii (Ficalbi, 1889). The field-sampled mosquitoes were directly killed and stored at -20 °C until further preparation. All specimens were identified by morphology. After the morphological species identification, the right wing of each specimen was removed and mounted with euparal (Carl Roth, Karlsruhe, Germany) on microscopic slides. Subsequently, the mounted wings were photographed with a stereomicroscope (Leica M205 C, Leica Microsystems, Wetzlar, Germany) under 20× magnification using standardized illumination and exposure time (279 ms).

  12. NASICON-type solid electrolyte materials named entity recognition dataset

    • scidb.cn
    Updated Apr 27, 2023
    Cite
    Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi (2023). NASICON-type solid electrolyte materials named entity recognition dataset [Dataset]. http://doi.org/10.57760/sciencedb.j00213.00001
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 27, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Liu Yue; Liu Dahui; Yang Zhengwei; Shi Siqi
    Description

    1. Framework overview. This paper proposes a pipeline to construct high-quality datasets for text mining in materials science. Firstly, we utilize a traceable automatic acquisition scheme for literature to ensure the traceability of textual data. Then, a data processing method driven by downstream tasks is performed to generate high-quality pre-annotated corpora conditioned on the characteristics of materials texts. On this basis, we define a general annotation scheme derived from the materials science tetrahedron to complete high-quality annotation. Finally, a conditional data augmentation model incorporating materials domain knowledge (cDA-DK) is constructed to augment the data quantity.

    2. Dataset information. The experimental datasets used in this paper include the Matscholar dataset publicly published by Weston et al. (DOI: 10.1021/acs.jcim.9b00470), and the NASICON entity recognition dataset constructed by ourselves. Herein, we mainly introduce the details of the NASICON entity recognition dataset.

    2.1 Data collection and preprocessing. Firstly, 55 materials science publications related to the NASICON system are collected through the Crystallographic Information File (CIF), which contains a wealth of structure-activity relationship information. Note that materials science literature is mostly stored as portable document format (PDF), with content arranged in columns and mixed with tables, images, and formulas, which significantly compromises the readability of the text sequence. To tackle this issue, we employ the text parser PDFMiner (a Python toolkit) to standardize, segment, and parse the original documents, thereby converting PDF literature into plain text. In this process, the entire textual information of the literature, encompassing title, author, abstract, keywords, institution, publisher, and publication year, is retained and stored as a unified TXT document. Subsequently, we apply rules based on Python regular expressions to remove redundant information, such as garbled characters and line breaks caused by figures, tables, and formulas. This results in a cleaner text corpus, enhancing its readability and enabling more efficient data analysis. Note that special symbols may also appear as garbled characters, but we refrain from directly deleting them, as they may contain valuable information such as chemical units. Therefore, we converted all such symbols to a special token
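
    A rough sketch of the PDF-to-text and cleanup step described above might look like the following; the regular expressions are illustrative simplifications, not the authors' exact rules.

    import re
    from pdfminer.high_level import extract_text

    def pdf_to_clean_text(pdf_path):
      """Convert a PDF article to plain text and strip simple layout artifacts."""
      text = extract_text(pdf_path)
      text = re.sub(r'-\n', '', text)       # rejoin words hyphenated across line breaks
      text = re.sub(r'\n+', ' ', text)      # collapse line breaks caused by layout
      text = re.sub(r'\s{2,}', ' ', text)   # collapse repeated whitespace
      return text.strip()

    # with open('paper.txt', 'w', encoding='utf-8') as f:
    #   f.write(pdf_to_clean_text('paper.pdf'))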

  13. Kidney Cancer Dataset

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    Djaid Walid (2025). Kidney Cancer Dataset [Dataset]. https://www.kaggle.com/datasets/djaidwalid/kidney-cancer-dataset
    Explore at:
    Available download formats: zip (398877314 bytes)
    Dataset updated
    Nov 6, 2025
    Authors
    Djaid Walid
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Kidney Cancer Dataset is a subdataset of the Multi Cancer Dataset.

    Dataset structure:

    The data was split into two main directories, train and test, following this structure:

    \train
     |
     |_\kidney_normal
       |
       |_4000 images
     |
     |_\kidney_tumor
       |
       |_4000 images
    
    \test
     |
     |_\kidney_normal
       |
       |_1000 images
     |
     |_\kidney_tumor
       | 
       |_1000 images
    

    Citations:

    You can cite this dataset as: https://www.kaggle.com/datasets/djaidwalid/kidney-cancer-dataset/data

    Dataset Details: [1]

    The original dataset structure [1]:

    | Cancer Type | Classes | Images |
    |---|---|---|
    | Acute Lymphoblastic Leukemia | 4 | 20,000 |
    | Brain Cancer | 3 | 15,000 |
    | Breast Cancer | 2 | 10,000 |
    | Cervical Cancer | 5 | 25,000 |
    | Kidney Cancer | 2 | 10,000 |
    | Lung and Colon Cancer | 5 | 25,000 |
    | Lymphoma | 3 | 15,000 |
    | Oral Cancer | 2 | 10,000 |

    I selected for this model only the Kidney Cancer directory.

    Data augmentation

    The data was augmented by the original author of the dataset using Keras's ImageDataGenerator [1]. The augmentations include:
    - Rotation: Up to 10 degrees.
    - Width & Height Shift: Up to 10% of the total image size.
    - Shearing & Zooming: 10% variation.
    - Horizontal Flip: Randomly flips images for additional diversity.
    - Brightness Adjustment: Ranges from 0.2 to 1.2 for varying light conditions.

    The parameters used for augmentation:

    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
      rotation_range=10,
      width_shift_range=0.1,
      height_shift_range=0.1,
      shear_range=0.1,
      zoom_range=0.1,
      horizontal_flip=True,
      fill_mode='nearest',
      brightness_range=[0.2, 1.2]
    )

    References:

    [1] Obuli Sai Naren. (2022). Multi Cancer Dataset [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/3415848
    
  14. Neural Networks in Friction Factor Analysis of Smooth Pipe Bends

    • data.mendeley.com
    Updated Dec 19, 2022
    + more versions
    Cite
    Adarsh Vasa (2022). Neural Networks in Friction Factor Analysis of Smooth Pipe Bends [Dataset]. http://doi.org/10.17632/sjvbwh5ckg.1
    Explore at:
    Dataset updated
    Dec 19, 2022
    Authors
    Adarsh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROGRAM SUMMARY
    No. of lines in distributed program, including test data, etc.: 481
    No. of bytes in distributed program, including test data, etc.: 14540.8
    Distribution format: .py, .csv
    Programming language: Python
    Computer: Any workstation or laptop computer running TensorFlow, Google Colab, Anaconda, Jupyter, pandas, NumPy, Microsoft Azure and Alteryx.
    Operating system: Windows and Mac OS, Linux.

    Nature of problem: Navier-Stokes equations are solved numerically in ANSYS Fluent using the Reynolds stress model for turbulence. The simulated values of friction factor are validated with theoretical and experimental data obtained from the literature. Artificial neural networks are then used for a prediction-based augmentation of friction factor. The capabilities of the neural networks are discussed with regard to computational cost and domain limitations.

    Solution method: The simulation data is obtained through Reynolds stress modelling of fluid flow through a pipe. This data is augmented using the artificial neural network model, which predicts both within and outside the data domain.

    Restrictions: The code used in this research is limited to smooth pipe bends, in which friction factor is analysed using a steady state incompressible fluid flow.

    Runtime: The artificial neural network produces results within a span of 20 seconds for three-dimensional geometry, using the allocated free computational resources of Google Colaboratory cloud-based computing system.

  15. Data supporting "Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality"

    • repository.cam.ac.uk
    zip
    Updated Sep 22, 2025
    Cite
    Li, Xiang; He, Wei; Kristensson, Per Ola (2025). Data supporting "Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality" [Dataset]. http://doi.org/10.17863/CAM.115757
    Explore at:
    Available download formats: zip (37833332 bytes)
    Dataset updated
    Sep 22, 2025
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Li, Xiang; He, Wei; Kristensson, Per Ola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    1. Origin of the Dataset

    This dataset originates from a research project investigating microgesture-based text editing in virtual reality (VR). The dataset was collected as part of an evaluation of the MicroGEXT system, which enables precise and efficient text editing using small, subtle hand movements. The research aims to explore lightweight, ergonomic alternatives to traditional mid-air gesture interactions.

    2. Data Collection Methods
    • Hardware: The dataset was collected using the Meta Quest Pro VR headset, utilizing its XR Hand Tracking package to capture hand skeleton data at 72 Hz.
    • Participants: 10 participants were recruited for gesture elicitation and evaluation.
    • Procedure:

      1. Participants interacted with a VR text-editing application that mapped microgestures to common editing functions.
      2. Before data collection, participants viewed a demonstration video to understand each gesture.
      3. Each participant performed each gesture 20 times to ensure data consistency.
      4. Static gestures were clipped to 2 seconds, while dynamic gestures were recorded in 5-second clips to capture complete motion sequences.
      5. Swipe gestures were segmented into sub-states (0–3) for granular phase analysis, with each frame assigned a sub-state label.
    3. Technical & Non-Technical Information for Reusability
    • The dataset is suitable for:
      • Gesture recognition research (static/dynamic gestures, sub-state segmentation).
      • Human-computer interaction (HCI) studies focusing on XR input methods.
      • Machine learning applications, including deep learning-based gesture classification.
    • Reuse Considerations:
      • Compatible with Unity’s XR Hand Tracking package and Python-based deep learning frameworks (e.g., PyTorch, TensorFlow).
      • Includes data augmentation scripts for expanding training datasets.
      • The Null class helps mitigate false activations in real-time applications.

  16. DIGITAL SINDHI SCRIPT

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Shayan Ali Shaikh (2025). DIGITAL SINDHI SCRIPT [Dataset]. https://www.kaggle.com/datasets/shayanalishaikh/digital-sindhi-script
    Explore at:
    Available download formats: zip (29387394 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Shayan Ali Shaikh
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    About Dataset

    🧠 Overview

    The Sindhi Handwritten Alphabet Dataset is a comprehensive collection of handwritten Sindhi alphabet images, developed to support research in handwriting recognition, OCR, and computer vision for regional scripts. This dataset emphasizes diversity, authenticity, and real-world handwriting variations, making it highly suitable for AI-based Sindhi character recognition systems.

    🧾 Dataset Summary

    1. Total Sindhi Letters: 52
    2. Images per Letter: Approximately 460
    3. Total Images: Around 24,000
    4. Image Format: JPG
    5. Data Source: Handwritten samples collected from students in 45 schools across Sindh (Classes 3 to 7)
    6. Data Augmentation: Advanced augmentation techniques applied using OpenCV and other preprocessing tools
    7. Annotation Tools: RoboFlow used for labeling and data management

    Diversity & Realism

    The dataset captures handwriting from contributors across multiple generations and genders, reflecting a wide range of writing styles and characteristics.

    Generations Covered: Gen X, Millennials, Gen Z, Gen Alpha

    Writing Styles: Cursive, Bold, Thin, Uneven and natural strokes

    This diversity ensures models trained on this dataset can generalize well to unseen handwriting and different personal writing habits.

    Usage & Applications

    This dataset is ideal for:

    • Optical Character Recognition (OCR) for Sindhi script
    • Handwritten Character Classification
    • Handwriting Style Analysis across generations
    • AI-based Sindhi Language Digitization & Preservation
    • Computer Vision Research for regional and low-resource languages

    Model Development

    A ResNet-50 based deep learning model was trained on this dataset with four additional layers and strong augmentation strategies.

    Model Performance:
    • Training Accuracy: 97%
    • Validation Accuracy: 98%
    • Testing Accuracy: 92%

    These results demonstrate the dataset’s effectiveness for developing high-performing Sindhi alphabet recognition models.

    Development Team: Shayan Ali Shaikh, Muhammad Hamza. Under the supervision of Dr. Attaullah Sahito.

    Data Collection: Handwriting samples were collected from students in 45 schools across Sindh, covering Classes 3–7. This ensures authentic, naturally varied handwriting samples representing real-world conditions.

    🧩 Technical Notes

    Tools Used: OpenCV, RoboFlow, Python (for preprocessing and manipulation)

    Augmentation Techniques: Rotation, noise addition, brightness/contrast variation, and blurring for robustness

    License & Attribution

    This dataset is released under the Creative Commons CC BY 4.0 License, allowing free use, sharing, and modification with proper attribution.

    Contributors: Shayan Ali Shaikh, Muhammad Hamza

    Acknowledgment

    This dataset was developed to digitize and preserve the Sindhi language through AI. We encourage students, researchers, and developers to use this dataset to advance Sindhi handwriting recognition and OCR technologies. Note: this is the first dataset of its kind written by students from primary to secondary level, on real pages with a pen; there is no computerized manipulation.

  17. Landmarks Dataset

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Kayvan Shah (2023). Landmarks Dataset [Dataset]. https://www.kaggle.com/datasets/kayvanshah/landmarks-dataset
    Explore at:
    Available download formats: zip (191065672 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Kayvan Shah
    License

    Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    The dataset consists of images of famous (or not-so-famous) landmarks. The collection is organized into a two-level hierarchy: the first level is the categories for the landmarks, and the second level is the individual landmarks. There are 6 categories:
    1. Gothic
    2. Modern
    3. Mughal
    4. Neoclassical
    5. Pagodas
    6. Pyramids

    For each category, there are 5 landmarks, for a total of 30 landmarks. Each landmark has 14 images.

    Tasks:

    • This group project comprises two machine-learning tasks:
    • Category classification: predict the category names of images
    • Landmark classification: predict the landmark names of images

    The landmarks dataset is too small to train convolutional neural networks (CNNs) from scratch; the resulting network will overfit the data. Instead, use transfer learning by reusing part of a pre-trained CNN. In transfer learning, instead of training the neural network starting from random weights, the weights for the lower parts of the network are taken from a pre-trained network. Only the higher parts of the network have to be learned. Chapter 14 of Géron discusses how to apply pre-trained models for transfer learning.

    For this group project, the only allowed pre-trained networks are EfficientNetB0 and VGG16, which are smaller CNNs. The objective of this restriction is to avoid penalizing groups that do not have access to powerful machines and/or machines with GPUs. Groups are allowed to use Google Colab with GPUs to train the models, but be aware of resource usage limitations.
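
    A minimal transfer-learning sketch with EfficientNetB0 in Keras is shown below; the head size, optimizer, and image size are assumptions and would need tuning for the category and landmark tasks.

    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import EfficientNetB0

    base = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                      # freeze the pre-trained lower layers

    model = models.Sequential([
      base,
      layers.GlobalAveragePooling2D(),
      layers.Dropout(0.3),
      layers.Dense(6, activation='softmax')     # 6 landmark categories; use 30 outputs for the landmark task
    ])

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])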

    Data augmentation is another way to overcome the problem of small datasets. Keras/TensorFlow provides various image manipulation functions (https://www.tensorflow.org/api_docs/python/tf/image) that can be used to generate additional images. Refer to Lecture 9 slides and Chapter 14 of Géron.

    Yet another way to overcome the small-dataset problem is experimenting with various ways of combining the models for the two tasks. It is possible to train two distinct models, one for category classification and one for landmark classification. But would landmark classification benefit from knowing the output of category classification? Or vice versa?

    Code and Model Submission:

    • The details of the submission will be provided later. We are in the process of setting up a Vocareum site that will allow you to run your model against part of the holdout test images.
    • You are strongly encouraged to use Keras/TensorFlow.
  18. Virtual Influencer Cultural Content Dataset

    • kaggle.com
    zip
    Updated Jun 12, 2025
    Cite
    Python Developer (2025). Virtual Influencer Cultural Content Dataset [Dataset]. https://www.kaggle.com/programmer3/virtual-influencer-cultural-content-dataset
    Explore at:
    Available download formats: zip (18804 bytes)
    Dataset updated
    Jun 12, 2025
    Authors
    Python Developer
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains 1,023 multimedia items created by a virtual influencer focused on cultural heritage, featuring a diverse range of content including images, videos, and texts representing historical landmarks, traditional attire, cultural festivals, and architectural symbols. The data was collected through content generation and real-world cultural representations, and preprocessed with techniques such as resizing, normalization, and data augmentation to ensure consistency and diversity. The dataset is extracted and flattened into CSV format, and the content originates from Shanghai, China.

  19. SpaceNet: A Comprehensive Astronomical Dataset

    • kaggle.com
    zip
    Updated Aug 30, 2024
    Cite
    Raza Imam (2024). SpaceNet: A Comprehensive Astronomical Dataset [Dataset]. https://www.kaggle.com/datasets/razaimam45/spacenet-an-optimally-distributed-astronomy-data
    Explore at:
    zip(56552989870 bytes)Available download formats
    Dataset updated
    Aug 30, 2024
    Authors
    Raza Imam
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SpaceNet, created with a novel double-stage augmentation framework, FLARE (https://arxiv.org/pdf/2405.13267), is a hierarchically structured, high-quality astronomical image dataset designed for fine-grained and macro classification tasks. Comprising approximately 12,900 samples, SpaceNet combines lower-resolution (LR) to higher-resolution (HR) conversion with standard augmentations and a diffusion approach for synthetic sample generation. The dataset enables superior generalization on various recognition tasks such as classification.

    Dataset Structure

    • Fine-Grained Classes: 8 classes including planets, galaxies, asteroids, nebulae, comets, black holes, stars, and constellations.

    Dataset Composition

    Total Samples: Approximately 12,900 images.

    Fine-Grained Class Distribution:
    • Asteroid: 283 files
    • Black Hole: 656 files
    • Comet: 416 files
    • Constellation: 1,552 files
    • Galaxy: 3,984 files
    • Nebula: 1,192 files
    • Planet: 1,472 files
    • Star: 3,269 files

    Usage

    SpaceNet is suitable for:

    • Training and evaluating machine learning models on fine-grained and macro astronomical classification tasks (a minimal loading sketch follows this list).
    • Research on hierarchical classification approaches in the astronomy domain.
    • Developing robust models that generalize well across in-domain and out-of-domain datasets.
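
    As a minimal sketch of the first use case, and assuming the images are laid out one folder per fine-grained class (the actual directory layout of the Kaggle download may differ), an input pipeline could be built like this:

    from tensorflow import keras

    # Assumed layout: spacenet/<class_name>/<image files>
    train_ds = keras.utils.image_dataset_from_directory(
        'spacenet', validation_split=0.2, subset='training',
        seed=42, image_size=(224, 224), batch_size=32)
    val_ds = keras.utils.image_dataset_from_directory(
        'spacenet', validation_split=0.2, subset='validation',
        seed=42, image_size=(224, 224), batch_size=32)

    print(train_ds.class_names)  # expect the 8 fine-grained classes listed above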

    Citation

    If you use SpaceNet in your research, please cite it as follows:

    @misc{alamimam2024flare,
      title={FLARE up your data: Diffusion-based Augmentation Method in Astronomical Imaging},
      author={Mohammed Talha Alam and Raza Imam and Mohsen Guizani and Fakhri Karray},
      year={2024},
      eprint={2405.13267},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

  20. NEW THAI CURRENCY NOTE DATASET

    • kaggle.com
    zip
    Updated Aug 2, 2024
    Cite
    Irfan Ahmad (2024). NEW THAI CURRENCY NOTE DATASET [Dataset]. https://www.kaggle.com/datasets/irfanahmad1/new-thai-currency-note-dataset/data
    Explore at:
    zip(10749845451 bytes)Available download formats
    Dataset updated
    Aug 2, 2024
    Authors
    Irfan Ahmad
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A diverse dataset is crucial for training deep learning models, especially in the context of currency note recognition. Factors such as diverse backgrounds, lighting, orientation, and blur can significantly impact model outcomes. While high-quality scans of different currencies are accessible on collectors' websites, these often lack the variety seen in real-world scenarios. Additionally, publicly available datasets are limited and primarily feature old Thai currency notes, with at most about 1,000 images.

    Recognizing the scarcity of comprehensive datasets for new Thai currency notes, we curated a collection of 3,600 images spanning five denominations: - 20 baht - 50 baht - 100 baht - 500 baht - 1000 baht

    These images depict the notes in various orientations and settings, including different backgrounds and lighting conditions, such as illuminated and dark environments.

    We used two iPhone models to capture this diversity: - iPhone 13 Pro Max (12-megapixel, f/1.8 rear camera) - iPhone 12 (12-megapixel, f/1.6 rear camera)

    Unique scenarios were also included, such as half-folded notes against contrasting backgrounds. For consistency, the iPhone 12 captured 4032Ɨ3024 resolution shots of the 50 and 1000 baht notes, while the iPhone 13 Pro Max was used for the same resolution images of the other denominations. Our data collection team followed clear guidelines to ensure variety in the captured images.

    Each denomination class included 720 images. Specifically, the 20 baht note images were captured in various orientations and settings, such as front views with dark, white, and cluttered backgrounds and front views rotated 180 degrees with the same background variations. The same approach was applied to the 50, 100, 500, and 1000 baht notes. Additionally, images of the back of each note, both normal and rotated 180 degrees, and half-folded top and bottom states, were captured under the same diverse background conditions.

    The collected images were meticulously examined during data preparation to address inconsistencies in labeling and variations. Images were organized into folders according to the denominations of the new Thai currency. Given that most images were originally captured with an iPhone in HEIC format, they were converted to JPEG using the 'pyheif' Python module.
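
    For reference, a minimal HEIC-to-JPEG conversion along those lines might look like the sketch below; the exact script used by the authors is not provided, and the paths are placeholders (the published dataset already ships as JPEG).

    import pyheif
    from PIL import Image

    def heic_to_jpeg(src_path, dst_path):
        # Decode the HEIC file and rebuild a Pillow image from the raw pixel buffer
        heif_file = pyheif.read(src_path)
        image = Image.frombytes(
            heif_file.mode, heif_file.size, heif_file.data,
            'raw', heif_file.mode, heif_file.stride)
        image.save(dst_path, format='JPEG', quality=95)

    # Example (placeholder paths)
    # heic_to_jpeg('raw/IMG_0001.HEIC', 'jpeg/IMG_0001.jpg')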

    The data was divided into training and validation subsets with a 70%:30% ratio. There are a total of 2520 images for training and 1080 for validation.

    Here's a brief overview of each objective, research question, and type of analysis you can try to perform:

    Objectives:

    1. Currency Note Recognition: Develop models to classify Thai currency notes accurately.
    2. Image Preprocessing Techniques: Investigate the impact of preprocessing on model performance.
    3. Robustness to Environmental Variations: Assess model robustness under varying conditions.
    4. Real-Time Detection: Explore real-time recognition on mobile devices.
    5. Comparative Analysis of Models: Compare performance of different deep learning architectures.
    6. Data Augmentation Strategies: Evaluate the effectiveness of data augmentation for better generalization.

    Research Questions:

    1. Accuracy and Precision: How accurate are different models in classifying Thai currency notes?
    2. Impact of Lighting Conditions: How does lighting affect recognition accuracy?
    3. Orientation Sensitivity: How does orientation affect model predictions?
    4. Background Influence: How do different backgrounds impact classification accuracy?
    5. Folded Notes Recognition: How well does the model recognize folded notes?
    6. Generalization to Unseen Data: How well do models generalize to new images?

    Potential Insights:

    1. Optimal Conditions for Recognition: Identify best conditions for highest accuracy.
    2. Model Robustness: Determine most robust models.
    3. Real-Time Application Viability: Assess feasibility for real-time applications.
    4. Best Practices for Data Collection: Provide recommendations for future datasets.
Cite
Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt

Data from: Duck Hunt

This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt"

Explore at:
zip(7379197 bytes)Available download formats
Dataset updated
Jul 26, 2025
Authors
Hugo Zanini
License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Duck Hunt Object Detection Dataset

This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

Perfect for: - Object detection model training - Computer vision research - Retro gaming AI projects - YOLO algorithm benchmarking - Educational purposes

šŸŽÆ Dataset Statistics

MetricValue
Total Images1,004
Dataset Size12 MB
Image FormatPNG
Annotation FormatYOLO (.txt)
Classes4
Train/Val Split711/260 (73%/27%)

Class Distribution

Class IDClass NameCountDescription
0dog252The hunting dog in various poses (jumping, laughing, sniffing, etc.)
1duck_dead256Dead ducks (both black and red variants)
2duck_shot248Ducks in the moment of being shot
3duck_flying248Flying ducks in all directions (left, right, diagonal)

šŸ“ Dataset Structure

yolo_dataset_augmented/
ā”œā”€ā”€ images/
│  ā”œā”€ā”€ train/      # 711 training images
│  └── val/       # 260 validation images
ā”œā”€ā”€ labels/
│  ā”œā”€ā”€ train/      # 711 YOLO annotation files
│  └── val/       # 260 YOLO annotation files
ā”œā”€ā”€ classes.txt     # Class names mapping
ā”œā”€ā”€ dataset.yaml     # YOLO configuration file
└── augmented_dataset_stats.json # Detailed statistics

šŸ”§ Data Augmentation Details

The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

Augmentation Techniques Applied:

  • Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
  • Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
  • Quality Variations: Gaussian noise, slight blur for robustness
  • Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

Augmentation Parameters:

{
  'rotation_range': (-15, 15),    # Small rotations for game sprites
  'brightness_range': (0.7, 1.3),  # Brightness variations
  'contrast_range': (0.8, 1.2),   # Contrast adjustments
  'saturation_range': (0.8, 1.2),  # Color saturation
  'noise_intensity': 0.02,      # Gaussian noise
  'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
  'scaling_range': (0.8, 1.2),    # Scale variations
}

šŸš€ Usage Examples

Loading with YOLOv8 (Ultralytics)

from ultralytics import YOLO

# Load and train
model = YOLO('yolov8n.pt') # Load pretrained model
results = model.train(data='dataset.yaml', epochs=100, imgsz=640)

# Validate
metrics = model.val()

# Predict
results = model('path/to/test/image.png')

Loading with PyTorch

import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os

class DuckHuntDataset(Dataset):
  def __init__(self, images_dir, labels_dir, transform=None):
    self.images_dir = images_dir
    self.labels_dir = labels_dir
    self.transform = transform
    # Keep only the image files so stray files don't break indexing
    self.images = sorted(f for f in os.listdir(images_dir) if f.endswith('.png'))
  
  def __len__(self):
    return len(self.images)
  
  def __getitem__(self, idx):
    img_path = os.path.join(self.images_dir, self.images[idx])
    label_path = os.path.join(self.labels_dir, 
                 self.images[idx].replace('.png', '.txt'))
    
    image = Image.open(img_path).convert('RGB')
    # Load YOLO annotations: one "class_id cx cy w h" line per object
    with open(label_path, 'r') as f:
      labels = [[float(v) for v in line.split()] for line in f if line.strip()]
    
    if self.transform:
      image = self.transform(image)
      
    return image, labels

# Usage: images vary in size and each sample carries a variable-length list of
# boxes, so use a pass-through collate function instead of the default one
dataset = DuckHuntDataset('images/train', 'labels/train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True,
            collate_fn=lambda batch: tuple(zip(*batch)))

YOLO Annotation Format

Each .txt file contains one line per object: class_id center_x center_y width height

Example annotation: 0 0.492 0.403 0.212 0.315 Where values are normalized (0-1) relative to image dimensions.
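
To make the format concrete, here is a small helper (illustrative, not shipped with the dataset) that converts one annotation line back to pixel-space corner coordinates:

def yolo_to_pixels(line, img_width, img_height):
  """Convert one 'class_id cx cy w h' line (normalized) to pixel corner coordinates."""
  class_id, cx, cy, w, h = (float(v) for v in line.split())
  x1 = (cx - w / 2) * img_width
  y1 = (cy - h / 2) * img_height
  x2 = (cx + w / 2) * img_width
  y2 = (cy + h / 2) * img_height
  return int(class_id), x1, y1, x2, y2

# Example with the annotation above (the image size here is just an assumption)
print(yolo_to_pixels('0 0.492 0.403 0.212 0.315', 640, 480))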

šŸ“Š Technical Specifications

  • Image Dimensions: Variable (original sprite sizes preserved)
  • Color Channels: RGB (3 channels)
  • Annotation Precision: Float32 (normalized coordinates)
  • File Naming: Descriptive names indicating class and augmentation type
  • Quality: High-resolution pixel art sprites

šŸŽ® Dataset Context

This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

  • The Dog: Your hunting companion who retrieves ducks and ...