20 datasets found
  1. COVID Rearrange Dataset

    • kaggle.com
    zip
    Updated Nov 21, 2024
    Cite
    DD.Zh (2024). COVID Rearrange Dataset [Dataset]. https://www.kaggle.com/datasets/dadadazhang/covid-data-rearrange/versions/2
    Explore at:
    zip (47423691852 bytes)
    Dataset updated
    Nov 21, 2024
    Authors
    DD.Zh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The original dataset is from https://www.kaggle.com/datasets/andyczhao/covidx-cxr2

    The data is separated based on the .txt file (see link) into positive and negative.

    Data Augmentation Code

    import os
    from tensorflow.keras.preprocessing.image import (
      ImageDataGenerator, load_img, img_to_array
    )
    
    # input_dir and output_dir are assumed to point at the source and
    # destination image folders
    datagen = ImageDataGenerator(
      rescale=1./255,        # normalize pixel values to [0, 1]
      rotation_range=20,      # random rotation, per the reference
      zoom_range=0.2,        # random zoom, per the reference
      width_shift_range=0.2,    # horizontal shift; edges filled per fill_mode
      height_shift_range=0.2,    # vertical shift; edges filled per fill_mode
      shear_range=0.2,       # add shear transformation
      brightness_range=(0.7, 1.3), # wider brightness adjustment (reference used 0.3)
      horizontal_flip=True,
      fill_mode='nearest'
    )
    
    
    # Counts
    current_count = len(os.listdir(input_dir))
    target_count = 57199
    required_augmented_count = target_count - current_count
    
    print(f"Original negatives: {current_count}")
    print(f"Required augmented images: {required_augmented_count}")
    
    # augmenting ...
    augmented_count = 0
    max_augmentations_per_image = 10  # I used 5 and 10; this dataset was generated with 10
    
    for img_file in os.listdir(input_dir):
      img_path = os.path.join(input_dir, img_file)
      img = load_img(img_path, target_size=(480, 480))  # 480x480, following the reference
      img_array = img_to_array(img)
      img_array = img_array.reshape((1,) + img_array.shape)
    
      # Generate multiple augmentations per image
      i = 0
      for batch in datagen.flow(
        img_array,
        batch_size=1,
        save_to_dir=output_dir,
        save_prefix='aug',
        save_format='jpeg'
      ):
        i += 1
        augmented_count += 1
        if i >= max_augmentations_per_image:
          break
        if augmented_count >= required_augmented_count:
          break
    
      if augmented_count >= required_augmented_count:
        break
    

    I tried different values of max_augmentations_per_image, and also leaving it unset; both approaches generated augmented data (around 9,000 images) ...

    positive_balanced:

    import random

    random.seed(42)

    # Total negative samples
    target_count = 20579

    all_positive_images = os.listdir(positive_dir)
    selected_positive_images = random.sample(all_positive_images, target_count)
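
    As a follow-up, a minimal sketch that materializes the balanced positive set by copying the sampled files; positive_balanced_dir is a hypothetical destination folder:

    import shutil

    positive_balanced_dir = 'positive_balanced'  # hypothetical destination folder
    os.makedirs(positive_balanced_dir, exist_ok=True)

    # Copy each sampled image into the balanced folder
    for img_file in selected_positive_images:
        shutil.copy(os.path.join(positive_dir, img_file),
                    os.path.join(positive_balanced_dir, img_file))

    print(f"Copied {len(selected_positive_images)} positive images")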

  2. COTS v NotCOTS Cropped Crown of Thorns Dataset

    • kaggle.com
    zip
    Updated Feb 5, 2022
    Cite
    Alex Teboul (2022). COTS v NotCOTS Cropped Crown of Thorns Dataset [Dataset]. https://www.kaggle.com/datasets/alexteboul/binary-cropped-crown-of-thorns-dataset/data
    Explore at:
    zip (49683724 bytes)
    Dataset updated
    Feb 5, 2022
    Authors
    Alex Teboul
    Description

    Context

    This data comes entirely from the TensorFlow - Help Protect the Great Barrier Reef competition and should not be used outside of the competition! I do not own these images and, to the extent possible, want to ensure this complies with the terms of the competition - I believe it does. All users/viewers of this dataset should adhere to the terms & conditions of the competition.

    I wanted an easily accessible repository of the cots images and not cots images to help with data augmentation and possibly improving the models in other ways. In the spirit of the competition I thought it made the most sense to make this available to the other competitors.

    Content

    This notebook was used to pre-process / create this dataset: Cropped Crown of Thorns Dataset Builder. It walks through the steps in a readable way.

    About the dataset:

    • This dataset contains an equal number (11,898 each) of COTS and Not COTS .jpg images.
    • These images come from cropping out the bounding box regions from each video frame in the competition.
    • Use this for data augmentation.
    • Alternatively, if you're just getting started, try building binary classifiers for COTS vs. Not COTS to build up the skill to create more complicated object detection models (see the sketch below).
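
    If you take the binary-classifier route, a minimal loading sketch using tf.keras.utils.image_dataset_from_directory; the data/cots and data/not_cots folder layout is an assumption about how you unpack the images:

    import tensorflow as tf

    # Assumes the crops are unpacked into one subfolder per class:
    #   data/cots/*.jpg and data/not_cots/*.jpg
    train_ds = tf.keras.utils.image_dataset_from_directory(
      'data',
      label_mode='binary',    # classes are ordered alphabetically: cots -> 0, not_cots -> 1
      image_size=(224, 224),
      batch_size=32,
    )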

    Acknowledgements

    This comes directly from the TensorFlow - Help Protect the Great Barrier Reef competition (https://www.kaggle.com/c/tensorflow-great-barrier-reef). Alternative citations include:

    Liu, J., Kusy, B., Marchant, R., Do, B., Merz, T., Crosswell, J., ... & Malpani, M. (2021). The CSIRO Crown-of-Thorn Starfish Detection Dataset. arXiv preprint arXiv:2111.14311.

    Inspiration

    See Notebook used to build this dataset here: Cropped Crown of Thorns Dataset Builder

  3. Rice Classification System with tensorflow

    • kaggle.com
    zip
    Updated Aug 14, 2025
    Cite
    Seyed Arman Hossaini (2025). Rice Classification System with tensorflow [Dataset]. https://www.kaggle.com/datasets/seyedarmanhossaini/ricepro
    Explore at:
    zip (227207926 bytes)
    Dataset updated
    Aug 14, 2025
    Authors
    Seyed Arman Hossaini
    Description

    This dataset contains images of five rice varieties: Arborio, Basmati, Ipsala, Jasmine, and Karacadag. The images are organized into separate folders for each class, making it suitable for supervised image classification tasks.

    Number of classes: 5

    Class names: Arborio, Basmati, Ipsala, Jasmine, Karacadag

    Image size: 128x128 (resized for modeling)

    Total images per class: 15,000

    Dataset split:

    Training: 70%

    Validation: 15%

    Test: 15%

    The dataset can be used to train convolutional neural networks (CNNs) for rice variety classification. It supports applications in agriculture, food quality control, and AI-powered crop monitoring. Data augmentation techniques have been applied during model training to improve robustness.
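
    The exact augmentation used during training is not documented, so the transforms below are assumptions; a minimal Keras sketch of train-time augmentation for the 128x128, 5-class setup described above:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Augmentation layers are active only when training=True
    data_augmentation = tf.keras.Sequential([
      layers.RandomFlip('horizontal'),
      layers.RandomRotation(0.1),   # up to about 36 degrees
      layers.RandomZoom(0.1),
    ])

    inputs = tf.keras.Input(shape=(128, 128, 3))  # 128x128 images per the description
    x = data_augmentation(inputs)
    x = layers.Rescaling(1./255)(x)
    x = layers.Conv2D(32, 3, activation='relu')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(5, activation='softmax')(x)  # 5 rice varieties
    model = tf.keras.Model(inputs, outputs)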

  4. Dice Images

    • kaggle.com
    zip
    Updated Jan 9, 2022
    Cite
    Yash Srivastava (2022). Dice Images [Dataset]. https://www.kaggle.com/datasets/yashsrivastava51213/dice-images
    Explore at:
    zip (1317193 bytes)
    Dataset updated
    Jan 9, 2022
    Authors
    Yash Srivastava
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    There is no story behind this dataset; I just felt that I should also have a dataset 😬.

    About the Dataset.

    The dataset contains top views of dice digits, which can be used as an alternative to the MNIST dataset for digit recognition, a benchmark classification task.

    There are currently only 120 original images; attempts to augment the data have already been made through the TensorFlow data augmentation pipeline, which increased the dataset to about 1,600 images (with random rotations, crops, and other operations).
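
    The author's exact pipeline is not shown, but a TensorFlow augmentation step of the kind described (random rotations and crops) might look like the following sketch; the crop size and variant count are assumptions:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Random rotations and crops, as described above; parameters are illustrative
    augment = tf.keras.Sequential([
      layers.RandomRotation(0.15),
      layers.RandomCrop(96, 96),  # assumes source images larger than 96x96
    ])

    def make_variants(image, n=12):
      """Generate n augmented variants of one image (120 originals x ~13 -> ~1,600)."""
      return [augment(image[tf.newaxis, ...], training=True)[0] for _ in range(n)]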

    Image Type and Nomenclature

    For the small dataset that we have here, the images were made from just two dice. The dice images are resized to be similar to the MNIST format, so results can be tested on existing models.

    The images currently in the dataset are named as follows: {number}_{color of the dice**}_{transform angle}_{transformation direction*}

    Expectation

    My aim is that the dataset should be big enough so as to not cause overfitting. The dataset should also be diverse enough so that the model for which it is used is accurate.

    Although augmentation is one way to increase the dataset size, original images are preferred for their variability across factors that I might have neglected in my analysis.

    *if the direction is necessary, it is mentioned
    ** Although the images are converted to grayscale, the color of the dice might be a feature that is required for some other analysis.

    Acknowledgements

    No one in particular comes to mind, because each and every picture in this small dataset was manually edited by me, although I would like to help ...

    Inspiration

    The question I have is whether this dataset can be used for image classification. My take on this problem: GitHub Implementation

  5. imagenet_sketch

    • huggingface.co
    • opendatalab.com
    • +1more
    Updated May 25, 2024
    Cite
    Songwei Ge (2024). imagenet_sketch [Dataset]. https://huggingface.co/datasets/songweig/imagenet_sketch
    Explore at:
    Dataset updated
    May 25, 2024
    Authors
    Songwei Ge
    License

    https://choosealicense.com/licenses/unknown/

    Description

    The ImageNet-Sketch dataset consists of 50,000 images, 50 images for each of the 1,000 ImageNet classes. We construct the dataset with Google Image queries "sketch of _", where _ is the standard class name. We only search within the "black and white" color scheme. We initially query 100 images for every class, and then manually clean the pulled images by deleting the irrelevant images and images that are for similar but different classes. For some classes, there are fewer than 50 images after manual cleaning, and we then augment the dataset by flipping and rotating the images.
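
    A minimal sketch of loading the dataset with the Hugging Face datasets library, using the repository id from the citation above; the 'image' and 'label' field names and the 'train' split are assumptions:

    from datasets import load_dataset

    # Downloads and caches the images on first use
    ds = load_dataset('songweig/imagenet_sketch', split='train')
    example = ds[0]
    print(example['image'], example['label'])  # PIL image and integer class label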

  6. Data supporting "Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality"

    • repository.cam.ac.uk
    zip
    Updated Sep 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Li, Xiang; He, Wei; Kristensson, Per Ola (2025). Data supporting "Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality" [Dataset]. http://doi.org/10.17863/CAM.115757
    Explore at:
    zip (37833332 bytes)
    Dataset updated
    Sep 22, 2025
    Dataset provided by
    University of Cambridge
    Apollo
    Authors
    Li, Xiang; He, Wei; Kristensson, Per Ola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    1. Origin of the Dataset

    This dataset originates from a research project investigating microgesture-based text editing in virtual reality (VR). The dataset was collected as part of an evaluation of the MicroGEXT system, which enables precise and efficient text editing using small, subtle hand movements. The research aims to explore lightweight, ergonomic alternatives to traditional mid-air gesture interactions.

    2. Data Collection Methods

    • Hardware: The dataset was collected using the Meta Quest Pro VR headset, utilizing its XR Hand Tracking package to capture hand skeleton data at 72 Hz.
    • Participants: 10 participants were recruited for gesture elicitation and evaluation.
    • Procedure:

      1. Participants interacted with a VR text-editing application that mapped microgestures to common editing functions.
      2. Before data collection, participants viewed a demonstration video to understand each gesture.
      3. Each participant performed each gesture 20 times to ensure data consistency.
      4. Static gestures were clipped to 2 seconds, while dynamic gestures were recorded in 5-second clips to capture complete motion sequences.
      5. Swipe gestures were segmented into sub-states (0–3) for granular phase analysis, with each frame assigned a sub-state label.
    3. Technical & Non-Technical Information for Reusability

    • The dataset is suitable for:
      • Gesture recognition research (static/dynamic gestures, sub-state segmentation).
      • Human-computer interaction (HCI) studies focusing on XR input methods.
      • Machine learning applications, including deep learning-based gesture classification.
    • Reuse Considerations:
      • Compatible with Unity's XR Hand Tracking package and Python-based deep learning frameworks (e.g., PyTorch, TensorFlow).
      • Includes data augmentation scripts for expanding training datasets.
      • The Null class helps mitigate false activations in real-time applications.

  7. Smart Wardrobe Clothing Dataset

    • kaggle.com
    zip
    Updated Sep 1, 2025
    Cite
    Hizkia Siregar (2025). Smart Wardrobe Clothing Dataset [Dataset]. https://www.kaggle.com/datasets/hizkiasiregar/smart-wardrobe-clothing-dataset
    Explore at:
    zip (299723916 bytes)
    Dataset updated
    Sep 1, 2025
    Authors
    Hizkia Siregar
    Description

    This dataset was created to support machine learning research in clothing classification, particularly for smart wardrobe and laundry applications. Inspired by the digital wardrobe concept popularized in media such as Clueless (1995), the dataset contains three primary categories of clothing items:

    • Tops: t-shirts, button-up shirts, sweaters, hoodies, and other upper garments.
    • Bottoms: jeans, shorts, formal pants, long trousers, and other lower garments.
    • Socks: long socks and short socks photographed in pairs and individually.

    All images were self-collected using an iPhone camera in HEIC format and later converted to JPG/PNG. Backgrounds were removed manually using Canva and programmatically using Rembg with the U²-Net model. Augmentation techniques (rotation, flipping, cropping, brightness and contrast adjustments) were applied to increase dataset diversity.

    • Raw images: 521 (200 tops, 200 bottoms, 121 socks)
    • Final images after augmentation: ~1,900 (balanced across all classes)
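
    For reference, a minimal sketch of the programmatic background-removal step with Rembg (which uses the U²-Net model by default); the file names are placeholders:

    from rembg import remove

    # Read a converted source photo and write a background-removed PNG
    with open('top_001.jpg', 'rb') as f:     # placeholder input file
      result = remove(f.read())              # defaults to the u2net model
    with open('top_001_nobg.png', 'wb') as f:
      f.write(result)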

    This dataset can be used for experiments in:

    • Image classification
    • Data augmentation pipelines
    • Transfer learning (e.g., Teachable Machine, TensorFlow, PyTorch)
    • Applied computer vision in smart wardrobe and smart home systems

  8. research on soyabean leaves

    • figshare.com
    pdf
    Updated Apr 15, 2025
    Cite
    Prajwal Bawankar (2025). research on soyabean leaves [Dataset]. http://doi.org/10.6084/m9.figshare.28797590.v1
    Explore at:
    pdf
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Prajwal Bawankar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project focuses on developing an intelligent system capable of detecting and classifying diseases in plant leaves using image processing and deep learning techniques. Leveraging Convolutional Neural Networks (CNNs) and transfer learning, the system analyzes leaf images to identify signs of infection with high accuracy. It supports smart agriculture by enabling early disease detection, reducing crop loss, and providing actionable insights to farmers. The project uses datasets such as PlantVillage and integrates frameworks like TensorFlow, Keras, and PyTorch. The model can be deployed as a web or mobile application, offering a real-time solution for plant health monitoring in agricultural environments.

  9. WastePro

    • kaggle.com
    zip
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Naman Jain (2025). WastePro [Dataset]. https://www.kaggle.com/datasets/namanjain001/comprehensive-solid-waste-image-dataset
    Explore at:
    zip (2461598517 bytes)
    Dataset updated
    Apr 28, 2025
    Authors
    Naman Jain
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The WastePro dataset is a comprehensive, custom-curated collection designed for broad-spectrum waste classification using deep learning. It contains high-quality images of solid waste items spanning a wide range of categories, such as organic, plastic, metal, glass, e-waste, paper, cardboard, textiles, rubber, and more. Each image is labeled according to its waste type, enabling robust supervised learning for multi-class classification tasks.

    Key Features:

    • Diverse Categories: WastePro covers 9 distinct waste classes, ensuring representation of both common (organic, plastic, metal, glass) and less common (e-waste, textiles, rubber) waste types. This diversity supports the development of models capable of real-world, context-rich waste recognition.
    • Image Quality & Structure: Images are RGB and standardized in size (commonly 224x224 pixels) to facilitate compatibility with modern convolutional neural networks. The dataset is organized in a directory structure suitable for direct loading with TensorFlow and Keras utilities.
    • Data Augmentation Ready: The dataset supports augmentation techniques such as flipping, rotation, zoom, and contrast adjustments, which are essential for increasing model robustness and generalization to unseen waste images (see the sketch below).
    • Real-World Context: Images are collected from multiple sources and environments, including municipal solid waste streams, recycling centers, and public datasets. This ensures that models trained on WastePro are applicable to practical waste management scenarios.

    Applications: WastePro is ideal for training and benchmarking deep learning models for automated waste sorting, recycling facility automation, smart bins, and environmental monitoring. Its comprehensive coverage and high-quality labeling make it a strong foundation for advancing research and deployment in intelligent waste management systems.
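
    A minimal sketch of that direct loading plus the augmentations named above (flipping, rotation, zoom, contrast), using Keras utilities; the root folder name is an assumption:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Assumes one subfolder per waste class, e.g. WastePro/plastic, WastePro/metal, ...
    train_ds = tf.keras.utils.image_dataset_from_directory(
      'WastePro',              # assumed root folder
      image_size=(224, 224),   # matches the standardized size above
      batch_size=32,
    )

    augment = tf.keras.Sequential([
      layers.RandomFlip('horizontal_and_vertical'),
      layers.RandomRotation(0.1),
      layers.RandomZoom(0.2),
      layers.RandomContrast(0.2),
    ])

    train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))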

    WastePro sets a new standard for waste classification datasets by combining curated intelligence, broad category coverage, and deployment-ready design.

  10. PlantVillage

    • opendatalab.com
    • tensorflow.org
    • +2more
    zip
    Updated Apr 18, 2019
    Cite
    Anna University (2019). PlantVillage [Dataset]. https://opendatalab.com/OpenDataLab/PlantVillage
    Explore at:
    zip (1803914962 bytes)
    Dataset updated
    Apr 18, 2019
    Dataset provided by
    Anna University
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains 39 different classes of plant leaf and background images, 61,486 images in total. Six different augmentation techniques were used to increase the dataset size: image flipping, gamma correction, noise injection, PCA color augmentation, rotation, and scaling.
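
    Flipping and rotation are standard, but gamma correction and PCA color augmentation are worth spelling out. A minimal NumPy sketch, assuming img is a float RGB array scaled to [0, 1]:

    import numpy as np

    def gamma_correction(img, gamma=1.5):
      """Gamma correction on a float RGB array in [0, 1]."""
      return np.clip(img ** gamma, 0.0, 1.0)

    def pca_color_augmentation(img, std=0.1):
      """AlexNet-style PCA color augmentation on a float RGB array in [0, 1]."""
      flat = img.reshape(-1, 3)
      cov = np.cov(flat - flat.mean(axis=0), rowvar=False)  # 3x3 channel covariance
      eigvals, eigvecs = np.linalg.eigh(cov)
      alphas = np.random.normal(0.0, std, size=3)           # random perturbation strengths
      shift = eigvecs @ (alphas * eigvals)                  # shift along principal color axes
      return np.clip(img + shift, 0.0, 1.0)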

  11. Audiomentations

    • kaggle.com
    zip
    Updated Apr 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    atfujita (2022). Audiomentations [Dataset]. https://www.kaggle.com/datasets/atsunorifujita/audiomentations
    Explore at:
    zip (62619 bytes)
    Dataset updated
    Apr 22, 2022
    Authors
    atfujita
    Description

    A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated in training pipelines in e.g. Tensorflow/Keras or Pytorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
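
    A minimal usage sketch following the library's Compose pattern; the waveform here is a random placeholder:

    import numpy as np
    from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift

    augment = Compose([
      AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
      TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
      PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
      Shift(p=0.5),
    ])

    samples = np.random.uniform(-1, 1, 16000).astype(np.float32)  # placeholder: 1 s of mono audio
    augmented = augment(samples=samples, sample_rate=16000)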

    Need a Pytorch-specific alternative with GPU support? Check out torch-audiomentations!

  12. Animal Species Classification - V3

    • kaggle.com
    Updated Jan 24, 2023
    Cite
    DeepNets (2023). Animal Species Classification - V3 [Dataset]. https://www.kaggle.com/datasets/utkarshsaxenadn/animal-image-classification-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DeepNets
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Idea

    The vision behind creating this dataset is to have a dataset for classifying animal species. Many animal species can be included, which is why the dataset is revised regularly. This will help in creating a machine-learning model that can accurately classify animal species.

    Class Distribution

    This is an animal classification dataset made for multi-class image recognition. The dataset contains 15 classes:

    1. Beetle
    2. Butterfly
    3. Cat
    4. Cow
    5. Dog
    6. Elephant
    7. Gorilla
    8. Hippo
    9. Lizard
    10. Monkey
    11. Mouse
    12. Panda
    13. Spider
    14. Tiger
    15. Zebra

    Data Distribution

    The data is split into 6 directories:

    Interesting Data: As the name suggests, this folder contains 5 interesting images per class. They are called interesting images because it will be fascinating to see which class the model assigns to these shots; based on the model's predictions, we can gauge its understanding of each class.

    Testing Data: This folder contains a random number of images per class. As the name indicates, it holds the images on which the model will be tested after training.

    TFRecords Data: This folder contains the data in TensorFlow Records format. All images in the TFRecords have already been resized to 256 x 256 pixels and normalized.

    Train Augmented: This time, additional augmented training data is included in the dataset. As the name suggests, this directory contains augmented images per class: 5 augmented images per original image, for a total of 10,000 augmented images per class. This is done to increase the dataset size, because as the total number of classes grows, model complexity increases and more training data is required. The best way to get more data is data augmentation. It is highly recommended to shuffle the data before/after loading it.

    Training Images: Each class contains 2,000 images for training purposes. This is the data used for training the model. All images are resized to 256 by 256 pixels and normalized to an input pixel range of 0 to 1.

    Validation Images: This folder contains 100/200 images per class and is intentionally created for validation purposes. Images from this directory are used during training to validate the model's performance.

    DeepNets

  13. breast histopathology images modified IDC ONLY

    • kaggle.com
    zip
    Updated Jun 10, 2023
    Cite
    ALovesToCode (2023). breast histopathology images modified IDC ONLY [Dataset]. https://www.kaggle.com/datasets/alovestocode/breast-histopathology-images-modified
    Explore at:
    zip (1656559021 bytes)
    Dataset updated
    Jun 10, 2023
    Authors
    ALovesToCode
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    What is changed?

    Instead of having individual patient folders, two folders, namely 0 (non-IDC) and 1 (IDC), have been created to contain the images for easy loading into memory or TensorFlow dataset implementations.

    Obtained from https://www.kaggle.com/datasets/paultimothymooney/breast-histopathology-images

    A word of advice: use data augmentation or a similar technique, since the data is imbalanced.

    Context: Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.

    Content: The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch's file name is of the format u_xX_yY_classC.png -> for example, 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the x-coordinate of where this patch was cropped from, Y is the y-coordinate of where this patch was cropped from, and C indicates the class (0 is non-IDC, 1 is IDC).
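
    Since the class labels are encoded in the file names, a minimal sketch of parsing them with a regular expression (checked only against the example name above):

    import re

    PATCH_RE = re.compile(r'^(?P<u>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$')

    m = PATCH_RE.match('10253_idx5_x1351_y1101_class0.png')
    patient_id = m.group('u')              # '10253_idx5'
    x, y = int(m.group('x')), int(m.group('y'))
    label = int(m.group('c'))              # 0 = non-IDC, 1 = IDC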

    Acknowledgements: The original files are located here: http://gleason.case.edu/webdata/jpi-dl-tutorial/IDC_regular_ps50_idx5.zip Citation: https://www.ncbi.nlm.nih.gov/pubmed/27563488 and http://spie.org/Publications/Proceedings/Paper/10.1117/12.2043872

    Inspiration: Breast cancer is the most common form of cancer in women, and invasive ductal carcinoma (IDC) is the most common form of breast cancer. Accurately identifying and categorizing breast cancer subtypes is an important clinical task, and automated methods can be used to save time and reduce error.

  14. pytorch_image_models

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    HyeongChan Kim (2025). pytorch_image_models [Dataset]. https://www.kaggle.com/datasets/kozistr/pytorch-image-models
    Explore at:
    zip (3469394 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    HyeongChan Kim
    Description

    PyTorch Image Models

    Sponsors

    A big thank you to my GitHub Sponsors for their support!

    In addition to the sponsors at the link above, I've received hardware and/or cloud resources from * Nvidia (https://www.nvidia.com/en-us/) * TFRC (https://www.tensorflow.org/tfrc)

    I'm fortunate to be able to dedicate significant time and money of my own supporting this and other open source projects. However, as the projects increase in scope, outside support is needed to continue with the current trajectory of hardware, infrastructure, and electricity costs.

    What's New

    Aug 18, 2021

    • Optimizer bonanza!
      • Add LAMB and LARS optimizers, incl trust ratio clipping options. Tweaked to work properly in PyTorch XLA (tested on TPUs w/ timm bits branch)
      • Add MADGRAD from FB research w/ a few tweaks (decoupled decay option, step handling that works with PyTorch XLA)
      • Some cleanup on all optimizers and factory. No more .data, a bit more consistency, unit tests for all!
      • SGDP and AdamP still won't work with PyTorch XLA but others should (have yet to test Adabelief, Adafactor, Adahessian myself).
    • EfficientNet-V2 XL TF ported weights added, but they don't validate well in PyTorch (L is better). The pre-processing for the V2 TF training is a bit different, and the fine-tuned 21k -> 1k weights are very sensitive and less robust than the 1k weights.
    • Added PyTorch trained EfficientNet-V2 'Tiny' w/ GlobalContext attn weights. Only .1-.2 top-1 better than the SE so more of a curiosity for those interested.

    July 12, 2021

    July 5-9, 2021

    • Add efficientnetv2_rw_t weights, a custom 'tiny' 13.6M param variant that is a bit better than (non NoisyStudent) B3 models. Both faster and better accuracy (at same or lower res)
      • top-1 82.34 @ 288x288 and 82.54 @ 320x320
    • Add SAM pretrained in1k weight for ViT B/16 (vit_base_patch16_sam_224) and B/32 (vit_base_patch32_sam_224) models.
    • Add 'Aggregating Nested Transformer' (NesT) w/ weights converted from official Flax impl. Contributed by Alexander Soare.
      • jx_nest_base - 83.534, jx_nest_small - 83.120, jx_nest_tiny - 81.426

    June 23, 2021

    • Reproduce gMLP model training, gmlp_s16_224 trained to 79.6 top-1, matching paper. Hparams for this and other recent MLP training here

    June 20, 2021

    • Release Vision Transformer 'AugReg' weights from How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
      • .npz weight loading support added, can load any of the 50K+ weights from the AugReg series
      • See example notebook from official impl for navigating the augreg weights
      • Replaced all default weights w/ best AugReg variant (if possible). All AugReg 21k classifiers work.
      • Highlights: vit_large_patch16_384 (87.1 top-1), vit_large_r50_s32_384 (86.2 top-1), vit_base_patch16_384 (86.0 top-1)
      • vit_deit_* renamed to just deit_*
      • Remove my old small model, replace with DeiT compatible small w/ AugReg weights
    • Add 1st training of my gmixer_24_224 MLP /w GLU, 78.1 top-1 w/ 25M params.
    • Add weights from official ResMLP release (https://github.com/facebookresearch/deit)
    • Add eca_nfnet_l2 weights from my 'lightweight' series. 84.7 top-1 at 384x384.
    • Add distilled BiT 50x1 student and 152x2 Teacher weights from Knowledge distillation: A good teacher is patient and consistent
    • NFNets and ResNetV2-BiT models work w/ Pytorch XLA now
      • weight standardization uses F.batch_norm instead of std_mean (std_mean wasn't lowered)
      • eps values adjusted, will be slight differences but should be quite close
    • Improve test coverage and classifier interface of non-conv (vision transformer and mlp) models ...
  15. Sign Language Dataset - 5 Essential Phrases

    • kaggle.com
    zip
    Updated Oct 25, 2025
    Cite
    Mohamed Hamdey (2025). Sign Language Dataset - 5 Essential Phrases [Dataset]. https://www.kaggle.com/datasets/mohamedhamdey/5-basic-signes
    Explore at:
    zip (22115208 bytes)
    Dataset updated
    Oct 25, 2025
    Authors
    Mohamed Hamdey
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Sign Language Recognition Dataset - 5 Essential Phrases

    🎯 Overview

    This dataset contains hand gesture images for sign language recognition, focusing on 5 commonly used phrases. The images are preprocessed, cropped, and ready for training deep learning models for real-time sign language detection applications.

    📊 Dataset Statistics

    • Total Images: ~1,000 images
    • Number of Classes: 5
    • Image Format: JPG
    • Image Size: 224×224 pixels (standardized)
    • Split: 75% Train / 15% Validation / 10% Test

    ๐Ÿท๏ธ Classes

    Class ID | Meaning    | Description
    0        | Yes        | Affirmative gesture
    1        | No         | Negative gesture
    2        | I Love You | Expression of affection
    3        | Hello      | Greeting gesture
    4        | Thank You  | Gratitude expression

    📂 Dataset Structure

    data_final/
    ├── train/
    │   ├── 0/  # Yes (~150 images)
    │   ├── 1/  # No (~150 images)
    │   ├── 2/  # I Love You (~150 images)
    │   ├── 3/  # Hello (~150 images)
    │   └── 4/  # Thank You (~150 images)
    ├── val/
    │   ├── 0/
    │   ├── 1/
    │   ├── 2/
    │   ├── 3/
    │   └── 4/
    └── test/
        ├── 0/
        ├── 1/
        ├── 2/
        ├── 3/
        └── 4/
    

    🎨 Data Collection & Preprocessing

    Collection Process:

    • Images collected using webcam in controlled environment
    • Hand gestures detected using MediaPipe hand tracking
    • Multiple angles, positions, and lighting conditions
    • Various hand positions and distances from camera

    Preprocessing:

    • Hand region detection using MediaPipe
    • Automatic cropping to hand bounding box
    • Resized to 224×224 pixels
    • Padding added around hand region
    • Quality control and manual cleaning performed

    🔧 Image Characteristics

    • Resolution: 224×224 pixels
    • Color: RGB
    • Background: Various (natural backgrounds)
    • Lighting: Mixed (natural and artificial)
    • Hand Orientation: Multiple angles
    • Distance: Varied (close, medium, far)

    💡 Use Cases

    This dataset is suitable for:

    1. Sign Language Recognition Models

      • Real-time gesture recognition
      • Sign-to-speech applications
      • Accessibility tools
    2. Computer Vision Research

      • Hand gesture classification
      • Transfer learning experiments
      • Mobile ML applications
    3. Educational Projects

      • Learning deep learning basics
      • Building gesture recognition systems
      • Prototyping accessibility solutions

    🚀 Quick Start

    Load Data with TensorFlow:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    
    datagen = ImageDataGenerator(rescale=1./255)
    
    train_gen = datagen.flow_from_directory(
      'data_final/train',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    
    val_gen = datagen.flow_from_directory(
      'data_final/val',
      target_size=(224, 224),
      batch_size=32,
      class_mode='categorical'
    )
    

    Load Data with PyTorch:

    from torchvision import datasets, transforms
    
    transform = transforms.Compose([
      transforms.Resize((224, 224)),
      transforms.ToTensor(),
      transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    
    train_dataset = datasets.ImageFolder('data_final/train', transform=transform)
    val_dataset = datasets.ImageFolder('data_final/val', transform=transform)
    

    📈 Baseline Performance

    Using transfer learning with MobileNetV2/EfficientNetB0:

    • Expected Accuracy: 90-97%
    • Training Time: 20-40 minutes (GPU)
    • Model Size: ~15 MB

    🎓 Recommended Augmentation

    For better generalization, use these augmentation techniques:

    train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=25,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.15,
      zoom_range=0.2,
      horizontal_flip=True,
      brightness_range=[0.7, 1.3]
    )

    ⚠️ Limitations

    • Limited vocabulary: Only 5 signs (not comprehensive)
    • Single person: Images from one individual (limited diversity)
    • Static gestures: No motion-based signs
    • Controlled environment: May need adaptation for real-world scenarios
    • Hand dominance: Mix of left and right hands

    🔮 Future Improvements

    • Expand to 20+ common signs
    • Include multiple signers (diverse skin tones, ages, genders)
    • Add motion-based gestures (video data)
    • Regional sign language variations
    • More challenging backgrounds

    📜 Citation

    If you use this dataset in your research or project, please cite:

    @dataset{sign_language_5phrases_2025,
      title={Sign Language Recognition Dataset - 5 Essential Phrases},
      author={[Your Name]},
      year={2025},
      publisher={Kaggle},
      url={[Dataset URL]}
    }

    📄 License

    This dataset is released under [Choose one]:

    • CC BY 4.0 (Attribution) - Recommended
    • CC BY-SA 4.0 (Attribution-ShareAlike)
    • CC0 1.0 (Public Domain)

    🤝 Acknowledgments

    • MediaPipe by Google for hand tracking
    • TensorFlow/Keras for deep learning fr...
  16. Fast Food Classification Dataset - V2 | 20k Images

    • kaggle.com
    zip
    Updated Dec 6, 2022
    Cite
    DeepNets (2022). Fast Food Classification Dataset - V2 | 20k Images [Dataset]. https://www.kaggle.com/datasets/utkarshsaxenadn/fast-food-classification-dataset/discussion
    Explore at:
    zip (860736636 bytes)
    Dataset updated
    Dec 6, 2022
    Authors
    DeepNets
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Version 2

    Version 2 extends version 1 of the fast food classification dataset and introduces some new classes with new images. These new classes are:

    • Baked Potato
    • Crispy Chicken
    • Fries
    • Taco
    • Taquito

    The dataset is divided into 4 parts: the TensorFlow Records, Training Data, Validation Data, and Testing Data. The TensorFlow Records directory is further divided into 3 parts: Train, Valid, and Test. These images are resized to 256 by 256 pixels. No other augmentation is applied. While loading the TFRecord files, you can apply any augmentation you want.

    • Train : Contains 15,000 training images, with each class having 1,500 images.

    • Valid : Contains 3,500 validation images, with each class having 400 images.

    • Test : Contains 1,500 test images, with each class having 100/200 images.

    • Unlike the TensorFlow Records data, the Training Data, Validation Data, and Testing Data directories contain raw images, so any kind of augmentation, especially resizing, can be applied to them.

      • Training Data : This directory contains 5 subdirectories, each representing a class. Each class has 1,500 training images.

      • Validation Data : This directory also contains 10 subdirectories, each representing a class. Each class has 400 images for monitoring the model's performance.

      • Testing Data : This directory also contains 10 subdirectories, each representing a class. Each class has 100/200 images for evaluating the model's performance.

    Version 1

    This is a Fast Food Classification dataset containing images of 5 different types of fast food. Each directory represents a class, and each class represents a food type. The classes are:

    • Burger
    • Donut
    • Hot Dog
    • Pizza
    • Sandwich

    The dataset is divided into 3 parts: the TensorFlow Records, the Training dataset, and the Validation dataset.

    • The TensorFlow Records directory is further divided into 2 parts, the training images and the validation images. These images are resized to 256 by 256 pixels. No other augmentation is applied. While loading the TFRecord files, you can apply any augmentation you want.
    • Training Images : Contains 7,500 training images, with each class having 1,500 images.
    • Validation Images : Contains 2,500 validation images, with each class having 500 images.

    • Unlike the TensorFlow Records data, the Training Data and Validation Data directories contain raw images, so any kind of augmentation, especially resizing, can be applied to them.
      • Training Data : This directory contains 5 subdirectories, each representing a class. Each class has 1,500 training images.
      • Validation Data : This directory also contains 5 subdirectories, each representing a class. Each class has 500 images for monitoring the model's performance.
  17. Pre Trained Model For Emotion Detection

    • kaggle.com
    Updated Jan 30, 2024
    Cite
    Abhishek Singh (2024). Pre Trained Model For Emotion Detection [Dataset]. http://doi.org/10.34740/kaggle/ds/4374471
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 30, 2024
    Dataset provided by
    Kaggle
    Authors
    Abhishek Singh
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    FER2013 (Facial Expression Recognition 2013) dataset is a widely used dataset for training and evaluating facial expression recognition models. Here are key details about the FER2013 dataset:

    Overview:

    FER2013 is a dataset designed for facial expression recognition tasks, particularly the classification of facial expressions into seven different emotion categories. The dataset was introduced for the Emotion Recognition in the Wild (EmotiW) Challenge in 2013.

    Emotion Categories:

    The dataset consists of images labeled with seven emotion categories: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.

    Image Size:

    Each image in the FER2013 dataset is grayscale and has a resolution of 48x48 pixels.

    Number of Images:

    The dataset contains a total of 35,887 labeled images, with approximately 5,000 images per emotion category.

    Partitioning:

    FER2013 is often divided into training, validation, and test sets. The original split has 28,709 images for training, 3,589 images for validation, and 3,589 images for testing.

    Usage in Research:

    FER2013 has been widely used in research for benchmarking and training facial expression recognition models, particularly deep learning models. It provides a standard dataset for evaluating the performance of models on real-world facial expressions.

    Challenges:

    The FER2013 dataset is known for its relatively simple and posed facial expressions. In real-world scenarios, facial expressions can be more complex and spontaneous, and there are datasets addressing these challenges.

    Challenges and Criticisms:

    Some criticisms of the dataset include its relatively small size, limited diversity in facial expressions, and the fact that some expressions (e.g., "Disgust") are challenging to recognize accurately.

    This pre-trained model implements a Convolutional Neural Network (CNN) for emotion detection using the TensorFlow and Keras frameworks. The model architecture includes convolutional layers, batch normalization, and dropout for effective feature extraction and classification. The training process utilizes an ImageDataGenerator for data augmentation, enhancing the model's ability to generalize to various facial expressions.

    Key Steps:

    Model Training: The CNN model is trained on an emotion dataset using an ImageDataGenerator for dynamic data augmentation. Training is performed over a specified number of epochs with a reduced batch size for efficient learning.

    Model Checkpoint: ModelCheckpoint is employed to save the best-performing model during training, ensuring that the most accurate model is retained.

    Save Model and Memory Cleanup: The trained model is saved in both HDF5 and JSON formats. Memory is efficiently managed by deallocating resources, clearing the Keras session, and performing garbage collection.
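
    A minimal sketch of the checkpoint-and-save steps just described; model, train_generator, and val_generator are placeholders assumed to be defined elsewhere:

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Keep only the best-performing weights seen during training
    checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                                 save_best_only=True, verbose=1)

    model.fit(train_generator, validation_data=val_generator,
              epochs=50, callbacks=[checkpoint])

    # Save the architecture as JSON and the full model as HDF5, as described above
    with open('model.json', 'w') as f:
      f.write(model.to_json())
    model.save('model.h5')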

  18. Vehicle Detection Image Dataset

    • kaggle.com
    zip
    Updated Apr 9, 2024
    Cite
    Parisa Karimi Darabi (2024). Vehicle Detection Image Dataset [Dataset]. https://www.kaggle.com/datasets/pkdarabi/vehicle-detection-image-dataset
    Explore at:
    zip (274761684 bytes)
    Dataset updated
    Apr 9, 2024
    Authors
    Parisa Karimi Darabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vehicle Detection Image Dataset

    Introduction

    Welcome to the Vehicle Detection Image Dataset! This dataset is meticulously curated for object detection and tracking tasks, with a specific focus on vehicle detection. It serves as a valuable resource for researchers, developers, and enthusiasts seeking to advance the capabilities of computer vision systems.

    Objective

    The primary aim of this dataset is to facilitate precise object detection tasks, particularly in identifying and tracking vehicles within images. Whether you are engaged in academic research, developing commercial applications, or exploring the frontiers of computer vision, this dataset provides a solid foundation for your projects.

    Preprocessing and Augmentation

    Both versions of the dataset undergo essential preprocessing steps, including resizing and orientation adjustments. Additionally, the Apply_Grayscale version undergoes augmentation to introduce grayscale variations, thereby enriching the dataset and improving model robustness.

    1. Apply_Grayscale

    • This version comprises grayscale images and is further augmented to enhance the diversity of training data.


    2. No_Apply_Grayscale

    • This version includes images without applying grayscale augmentation.


    Data Formats

    To ensure compatibility with a wide range of object detection frameworks and tools, each version of the dataset is available in multiple formats:

    1. COCO
    2. YOLOv8
    3. YOLOv9
    4. TensorFlow

    These formats facilitate seamless integration into various machine learning frameworks and libraries, empowering users to leverage their preferred development environments.

    Real-Time Object Detection

    In addition to image datasets, we also provide a video for real-time object detection evaluation. This video allows users to test the performance of their models in real-world scenarios, providing invaluable insights into the effectiveness of their detection algorithms.

    Getting Started

    To begin exploring the Vehicle Detection Image Dataset, simply download the version and format that best suits your project requirements. Whether you are an experienced practitioner or just embarking on your journey in computer vision, this dataset offers a valuable resource for advancing your understanding and capabilities in object detection and tracking tasks.

    Citation

    If you utilize this dataset in your work, we kindly request that you cite the following:

    Parisa Karimi Darabi. (2024). Vehicle Detection Image Dataset: Suitable for Object Detection and tracking Tasks. Retrieved from https://www.kaggle.com/datasets/pkdarabi/vehicle-detection-image-dataset/

    Feedback and Contributions

    I welcome feedback and contributions from the Kaggle community to continually enhance the quality and usability of this dataset. Please feel free to reach out if you have suggestions, questions, or additional data and annotations to contribute. Together, we can drive innovation and progress in computer vision.

  19. Landmarks Dataset

    • kaggle.com
    zip
    Updated Apr 23, 2023
    Cite
    Kayvan Shah (2023). Landmarks Dataset [Dataset]. https://www.kaggle.com/datasets/kayvanshah/landmarks-dataset
    Explore at:
    zip (191065672 bytes)
    Dataset updated
    Apr 23, 2023
    Authors
    Kayvan Shah
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The dataset consists of images of famous (or not-so-famous) landmarks. The collection is organized into a two-level hierarchy: the first level is the category of the landmark, and the second level is the individual landmark. There are 6 categories:

    1. Gothic
    2. Modern
    3. Mughal
    4. Neoclassical
    5. Pagodas
    6. Pyramids

    For each category, there are 5 landmarks, for a total of 30 landmarks. Each landmark has 14 images.

    Tasks:

    This group project comprises two machine-learning tasks:
    • Category classification: predict the category names of images
    • Landmark classification: predict the landmark names of images

    The landmarks dataset is too small to train convolutional neural networks (CNNs) from scratch. The resulting network will overfit the data. Instead, use transfer learning by reusing part of a pre-trained CNN. In transfer learning, instead of training the neural network starting from random weights, the weights for the lower parts of the network are taken from a pre-trained network. Only the higher parts of the network will have to be learned. Chapter 14 of Géron discusses how to apply pre-trained models for transfer learning.

    For this group project, the only allowed pre-trained networks are EfficientNetB0 and VGG16, which are smaller CNNs. The objective of this restriction is to avoid penalizing groups that do not have access to powerful machines and/or machines with GPUs. Groups are allowed to use Google Colab with GPUs to train the models, but be aware of resource usage limitations.
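
    A minimal sketch of this transfer-learning setup with one of the allowed networks (EfficientNetB0), assuming 224x224 inputs and the 30 landmark classes described above:

    import tensorflow as tf
    from tensorflow.keras import layers

    base = tf.keras.applications.EfficientNetB0(
      include_top=False, weights='imagenet', input_shape=(224, 224, 3))
    base.trainable = False  # reuse pre-trained lower layers; learn only the new head

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(30, activation='softmax')(x)  # 30 landmarks
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])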

    Data augmentation is another way to overcome the problem of small datasets. Keras/TensorFlow provides various image manipulation functions (https://www.tensorflow.org/api_docs/python/tf/image) that can be used to generate additional images. Refer to Lecture 9 slides and Chapter 14 of Géron.
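
    For instance, a few functions from that tf.image API applied per example in a tf.data pipeline; the parameter values are illustrative:

    import tensorflow as tf

    def augment(image, label):
      image = tf.image.random_flip_left_right(image)
      image = tf.image.random_brightness(image, max_delta=0.2)
      image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
      return image, label

    # train_ds is assumed to be a tf.data.Dataset of (image, label) pairs
    # train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)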

    Yet another way to overcome the small dataset problem is experimenting with various ways of combining the models for the two tasks. It is possible to train two distinct models, one for category classification and one for landmark classification. But would landmark classification benefit from knowing the output of category classification? Or vice versa?

    Code and Model Submission:

    • The details of the submission will be provided later. We are in the process of setting up a Vocareum site that will allow you to run your model against part of the holdout test images.
    • You are strongly encouraged to use Keras/TensorFlow.
  20. Grape-Instance-Segmentation-For-Viticulture

    • kaggle.com
    zip
    Updated Aug 25, 2025
    Cite
    kaannarik (2025). Grape-Instance-Segmentation-For-Viticulture [Dataset]. https://www.kaggle.com/datasets/kaanarikkk/grape-instance-segmentation-for-viticulture
    Explore at:
    zip (241118192 bytes)
    Dataset updated
    Aug 25, 2025
    Authors
    kaannarik
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Grape-Instance-Segmentation-For-Viticulture

    In this dataset, using a unique dataset (FGVL Dataset) collected from Sultana seedless grape vineyards in the Aegean Region of Turkey, an instance segmentation model has been developed to classify frost-damaged leaves and grape clusters at the pixel level. The dataset includes 418 frost-damaged grapes, 510 frost-damaged leaves, 395 healthy grapes, and 698 healthy leaves, collected after a severe frost event in April 2025 at a vineyard in Manisa. The images were captured in high resolution under natural lighting conditions and manually labeled by experts.

    Instructions

    Participants must use the FGVL Dataset to develop deep learning models for instance segmentation of frost-damaged and healthy grape leaves and clusters.

    You are free to use any image processing or deep learning framework (e.g., YOLOv11, PyTorch, TensorFlow) and apply data augmentation, model tuning, and evaluation techniques.

    Submissions will be evaluated based on mAP@50 and mAP@50-95 metrics on the test set.
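
    A minimal sketch of such a workflow with the ultralytics package and a YOLO11 segmentation checkpoint; the dataset config path is a placeholder:

    from ultralytics import YOLO

    model = YOLO('yolo11n-seg.pt')  # pre-trained segmentation checkpoint
    model.train(data='fgvl.yaml', epochs=100, imgsz=640)  # fgvl.yaml is a placeholder config

    metrics = model.val()  # evaluates on the split defined in the config
    print(metrics.seg.map50, metrics.seg.map)  # mask mAP@50 and mAP@50-95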
