100+ datasets found
  1. CIFAR dataset classification experiment results according to the number of...

    • plos.figshare.com
    xls
    Updated Jun 13, 2023
    Cite
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim (2023). CIFAR dataset classification experiment results according to the number of images used for data augmentation. [Dataset]. http://doi.org/10.1371/journal.pone.0274767.t005
    Available download formats: xls
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CIFAR dataset classification experiment results according to the number of images used for data augmentation.

  2. MRI Dementia Augmentation : No Data leak

    • kaggle.com
    Updated Oct 29, 2024
    Cite
    Matthew Hema (2024). MRI Dementia Augmentation : No Data leak [Dataset]. https://www.kaggle.com/datasets/matthewhema/mri-dementia-augmentation-no-data-leak
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Matthew Hema
    Description

    The data consists of MRI images in four classes, split into train, test, and validation sets in a 70-15-15 ratio. The classes are:

    Mild Demented, Moderate Demented, Non Demented, Very Mild Demented

    Original Dataset accessed from https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset.
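
    The title's "No Data leak" suggests the split was made in a way that keeps augmented copies of one image out of different subsets. A minimal sketch of one way to do that, assuming the split is made per class on the original images before any augmentation is applied; the file names and the `stratified_split` helper are hypothetical and are not taken from the dataset page:

```python
import random
from collections import defaultdict

def stratified_split(items, percents=(70, 15, 15), seed=0):
    """Split (sample_id, label) pairs into train/val/test, class by class.

    The split is done on original images only. Augmentation would be applied
    afterwards, inside each subset, so no augmented copy can cross subsets.
    """
    by_label = defaultdict(list)
    for sample_id, label in items:
        by_label[label].append(sample_id)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(ids) * percents[0] // 100
        n_val = len(ids) * percents[1] // 100
        train += [(i, label) for i in ids[:n_train]]
        val += [(i, label) for i in ids[n_train:n_train + n_val]]
        test += [(i, label) for i in ids[n_train + n_val:]]
    return train, val, test

# Hypothetical file names; the four class names come from the description.
classes = ["Mild Demented", "Moderate Demented", "Non Demented", "Very Mild Demented"]
items = [(f"{c}/img_{k:03d}.jpg", c) for c in classes for k in range(100)]
train, val, test = stratified_split(items)
```

    Whether the uploader used this exact procedure is not stated; the sketch only illustrates the split-then-augment ordering.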

  3. Data from: How many specimens make a sufficient training set for automated...

    • search.dataone.org
    • zenodo.org
    • +1more
    Updated Aug 1, 2025
    Cite
    James M. Mulqueeney; Alex Searle-Barnes; Anieke Brombacher; Marisa Sweeney; Anjali Goswami; Thomas H. G. Ezard (2025). How many specimens make a sufficient training set for automated three dimensional feature extraction? [Dataset]. http://doi.org/10.5061/dryad.1rn8pk12f
    Dataset updated
    Aug 1, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    James M. Mulqueeney; Alex Searle-Barnes; Anieke Brombacher; Marisa Sweeney; Anjali Goswami; Thomas H. G. Ezard
    Description

    Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation methods. However, there has been limited exploration into the optimal training set sizes, including assessing whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. The results reveal, unsurprisingly, that deep learning models improve with a larger number of training images, with eight specimens being required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and ...

    Data collection

    50 planktonic foraminifera, comprising 4 Menardella menardii, 17 Menardella limbata, 18 Menardella exilis, and 11 Menardella pertenuis specimens, were used in our analyses (electronic supplementary material, figures S1 and S2). The taxonomic classification of these species was established based on the analysis of morphological characteristics observed in their shells. In this context, all species are characterised by lenticular, low trochospiral tests with a prominent keel [13]. Discrimination among these species is achievable, as M. limbata can be distinguished from its ancestor, M. menardii, by having a greater number of chambers and a smaller umbilicus. Moreover, M. exilis and M. pertenuis can be discerned from M. limbata by their thinner, more polished tests and reduced trochospirality. Furthermore, M. pertenuis is identifiable by a thin plate extending over the umbilicus and possessing a greater number of chambers in the final whorl compared to M. exilis [13]. The s...

    # Data from: How many specimens make a sufficient training set for automated three dimensional feature extraction?

    https://doi.org/10.5061/dryad.1rn8pk12f

    All computer code and final raw data used for this research work are stored in GitHub: https://github.com/JamesMulqueeney/Automated-3D-Feature-Extraction and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.11109348.

    This data is the additional primary data used in each analysis. These include: CT Image Files, Manual Segmentation Files (use for training or analysis), Inputs and Outputs for Shape Analysis and an example .h5 file which can be used to practice AI segmentation.

    Description of the data and file structure

    The primary data is arranged into the following:

    1. Image_Files.zip: Foraminiferal CT data used in the analysis.
    2. **I...
  4. Data from: New Deep Learning Methods for Medical Image Analysis and...

    • curate.nd.edu
    pdf
    Updated Nov 11, 2024
    Cite
    Pengfei Gu (2024). New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression [Dataset]. http://doi.org/10.7274/26156719.v1
    Available download formats: pdf
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    University of Notre Dame
    Authors
    Pengfei Gu
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) Utilize existing annotations effectively from advanced models; (A2) extract generic knowledge directly from unannotated images.

    To achieve the aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be complemented with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve the aim (A2), we propose a novel extension of known Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.

    Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, scientists can only afford to sparsely store the simulation outputs (e.g., scalar field data) or visual representations (e.g., streamlines) or visualization images due to limited I/O bandwidths and storage space. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?

    To tackle this problem: First, we propose a DL framework that generates unsteady vector fields data from a set of streamlines. Based on this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.

  5. mnist_augmented

    • huggingface.co
    Updated Aug 4, 2025
    Cite
    Muhammad Anis Ur Rahman (2025). mnist_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/mnist_augmented
    Dataset updated
    Aug 4, 2025
    Authors
    Muhammad Anis Ur Rahman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for mnist_augmented

    This dataset contains augmented versions of the MNIST dataset, created to benchmark how various augmentation strategies impact digit classification accuracy using deep learning models. The dataset is provided as a .zip file and must be unzipped before use. It follows the ImageFolder structure compatible with PyTorch and other DL frameworks.

      📥 Download & Extract
    

    wget… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/mnist_augmented.
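
    As a rough illustration of the "unzip before use" step and the ImageFolder-style layout mentioned above, the following sketch extracts an archive and lists the class sub-folders. The archive name, the `train/` sub-folder, and both helper functions are assumptions made for the demo; the real layout should be checked on the dataset page.

```python
import tempfile
import zipfile
from pathlib import Path

def extract_archive(zip_path, out_dir):
    """Unzip zip_path into out_dir and return out_dir as a Path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return out_dir

def list_classes(root):
    """Return the sorted names of the class sub-folders directly under root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())

# Demo on a tiny stand-in archive (hypothetical layout: train/<class>/<image>).
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for digit in ("0", "1", "2"):
            archive.writestr(f"train/{digit}/sample.png", b"")
    root = extract_archive(zip_path, Path(tmp) / "data")
    demo_classes = list_classes(root / "train")
```

    In an ImageFolder-style layout, each sub-folder name serves as the class label for the images it contains.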

  6. cifar10_augmented

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    Muhammad Anis Ur Rahman (2025). cifar10_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/cifar10_augmented
    Dataset updated
    Jul 27, 2025
    Authors
    Muhammad Anis Ur Rahman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for cifar10_augmented

    This dataset contains augmented versions of CIFAR-10 to benchmark the impact of classical and synthetic augmentation techniques on deep image classification models. Provided as a .zip file, the dataset must be unzipped before use. It follows a standard ImageFolder structure for compatibility with PyTorch and TensorFlow pipelines.

      Dataset Details

      Dataset Sources

    Repository:… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/cifar10_augmented.

  7. Aruzz22.5K: An Image Dataset of Rice Varieties

    • data.mendeley.com
    Updated Mar 12, 2024
    + more versions
    Cite
    Md Masudul Islam (2024). Aruzz22.5K: An Image Dataset of Rice Varieties [Dataset]. http://doi.org/10.17632/3mn9843tz2.4
    Dataset updated
    Mar 12, 2024
    Authors
    Md Masudul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.

    Dataset Composition

    The dataset comprises 20 distinct classes: Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, the dataset contains 4,730 original JPG images and 23,650 augmented images.

    Image Capture and Dataset Organization

    These images were captured using an iPhone 11 camera with a 5x zoom feature. Each image capturing these rice varieties was diligently taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is systematically categorized into 20 distinct sub-directories, each corresponding to a specific rice variety.

    Original Image Dataset

    The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. Due to the initial low resolution, the file size was notably 268 MB. Employing compression through a zip program significantly optimized the dataset, resulting in a final size of 254 MB.

    Augmented Image Dataset

    To address the substantial image volume requirements of deep learning models for machine vision, data augmentation techniques were implemented. A total of 23,650 images were obtained through augmentation. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB. After compression, the dataset was reduced to 699 MB.

    Dataset Storage and Access

    The raw and augmented datasets are stored in two distinct zip files, namely 'Original.zip' and 'Augmented.zip'. Both zip files contain 20 sub-folders, each representing a unique rice variety: 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.

    Train and Test Data Organization

    To simplify experimentation, we have balanced the data and split it in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ folder contains two sub-directories: ‘1_TEST’, which contains 1125 images per class, and ‘2_VALID’, which contains 225 images per class.
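
    The figures quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below takes the numbers above at face value; the variable names are illustrative only.

```python
# Counts quoted in the description.
original_images = 4730
augmented_images = 23650
num_classes = 20
larger_per_class = 1125   # images per class in the larger sub-directory
smaller_per_class = 225   # images per class in the smaller sub-directory

# Augmented images per original image.
augmentation_factor = augmented_images / original_images

# Size of the balanced split across all classes.
per_class_total = larger_per_class + smaller_per_class
total_split_images = per_class_total * num_classes

# Share of each class that falls in the larger sub-directory.
larger_share = larger_per_class / per_class_total

print(f"augmentation factor: {augmentation_factor:.1f}x")
print(f"images in balanced split: {total_split_images}")
print(f"larger share: {larger_share:.1%}")
```

    On these numbers the augmentation factor is exactly 5, and the per-class counts give a split of roughly 83:17, close to but not exactly the stated 80:20.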

  8. Experimental results of the image classification task on the CIFAR-10...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Wei Jiang; Kai Zhang; Nan Wang; Miao Yu (2023). Experimental results of the image classification task on the CIFAR-10 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0243613.t002
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Wei Jiang; Kai Zhang; Nan Wang; Miao Yu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experimental results of the image classification task on the CIFAR-10 dataset.

  9. imagenet_augmented

    • huggingface.co
    Updated Aug 4, 2025
    Cite
    Muhammad Anis Ur Rahman (2025). imagenet_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/imagenet_augmented
    Dataset updated
    Aug 4, 2025
    Authors
    Muhammad Anis Ur Rahman
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for imagenet_augmented

    This dataset provides an augmented version of a subset of ImageNet, used to benchmark how classical and synthetic augmentations impact large-scale image classification models. All training data is organized by augmentation method, and the test/ set remains clean and unmodified. The dataset is compressed in .zip format and must be unzipped before use.

      📥 Download & Extract
    

    wget… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/imagenet_augmented.

  10. Additional file 3 of Which data subset should be augmented for deep...

    • figshare.com
    • springernature.figshare.com
    xlsx
    Updated Jun 21, 2023
    + more versions
    Cite
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 3 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622729.v1
    Available download formats: xlsx
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 3. A Microsoft® Excel® workbook that details the raw data for the 20 experiments in which no test-set augmentation was done, including all of the image-classification output probabilities.

  11. Comprehensive Soil Classification Datasets

    • kaggle.com
    Updated Jun 12, 2025
    Cite
    AI4A Lab (2025). Comprehensive Soil Classification Datasets [Dataset]. https://www.kaggle.com/datasets/ai4a-lab/comprehensive-soil-classification-datasets
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AI4A Lab
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Soil Classification Datasets

    Please cite the paper when using the dataset in a research study. Refer to the paper link or BibTeX provided below.

    This repository contains comprehensive datasets for soil classification and recognition research. The Original Dataset comprises soil images sourced from various online repositories, which have been meticulously cleaned and preprocessed to ensure data quality and consistency. To enhance the dataset's size and diversity, we employed Generative Adversarial Networks (GANs), specifically the CycleGAN architecture, to generate synthetic soil images. This augmented collection is referred to as the CyAUG Dataset. Both datasets are specifically designed to advance research in soil classification and recognition using state-of-the-art deep learning methodologies.

    This dataset was curated as part of the research study titled "An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations" by Farhan Sheth, Priya Mathur, Amit Kumar Gupta, and Sandeep Chaurasia, published in Engineering Applications of Artificial Intelligence.

    Links

    Application produced by this research is available at:

    Note: If you use any part of this project (dataset, code, or application), please cite the work as described in the Citation section below.

    Dataset

    Both datasets consist of images of 7 different soil types.

    The Soil Classification Dataset is structured to facilitate the classification of various soil types based on images. The dataset includes images of the following soil types:

    • Alluvial Soil
    • Black Soil
    • Laterite Soil
    • Red Soil
    • Yellow Soil
    • Arid Soil
    • Mountain Soil

    The dataset is organized into folders, each named after a specific soil type, containing images of that soil type. The images vary in resolution and quality, providing a diverse set of examples for training and testing classification models.

    Original Dataset Details

    • Total Images: 1189 images
    • Image Format: JPG/JPEG
    • Image Size: Varies
    • Source: Collected from various online repositories and cleaned for consistency.

    CyAUG Dataset Details

    • Total Images: 5097 images
    • Image Format: JPG/JPEG
    • Image Size: Varies
    • Source: Generated using CycleGAN to augment the original dataset, enhancing its size and diversity.

    Input and Output Parameters

    • Input Parameters:
      • Image: The images of the soils (JPG/JPEG format).
      • Label: The labels are in the format 'soil types' (folder names).
    • Output Parameter:
      • Classification: The predicted class (soil type) based on the input image.
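
    Since the labels are given by folder names, an index of (image path, class index) pairs can be built directly from the directory tree. The following is a minimal sketch under that assumption; the `index_by_folder` helper, the demo file names, and the choice to number classes in sorted order are illustrative and not part of the dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg"}

def index_by_folder(root):
    """Map folder names to class indices and collect (path, index) pairs."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}
    samples = [
        (path, class_to_index[name])
        for name in class_names
        for path in sorted((root / name).iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return class_to_index, samples

# Demo on a throwaway directory with two of the soil types listed above.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"Alluvial Soil": ["a1.jpg", "a2.JPEG"], "Black Soil": ["b1.jpg", "notes.txt"]}
    for folder, files in layout.items():
        (Path(tmp) / folder).mkdir()
        for name in files:
            (Path(tmp) / folder / name).write_bytes(b"")
    class_to_index, samples = index_by_folder(tmp)
```

    Files that are not JPG/JPEG (such as `notes.txt` in the demo) are skipped, matching the image format stated in the dataset details.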

    Citation

    If you use any of the derived datasets, please cite the following paper:

    @article{SHETH2025111425,
      title = {An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations},
      journal = {Engineering Applications of Artificial Intelligence},
      volume = {158},
      pages = {111425},
      year = {2025},
      issn = {0952-1976},
      doi = {https://doi.org/10.1016/j.engappai.2025.111425},
      url = {https://www.sciencedirect.com/science/article/pii/S0952197625014277},
      author = {Farhan Sheth and Priya Mathur and Amit Kumar Gupta and Sandeep Chaurasia},
      keywords = {Soil classification, Crop recommendation, Vision transformers, Convolutional neural network, Transfer learning, Fuzzy logic}
    }
    
  12. CLP-NC: Comprehensive Dataset for Machine Learning-Based Morphological...

    • data.mendeley.com
    Updated Mar 18, 2025
    Cite
    Israt Jahan (2025). CLP-NC: Comprehensive Dataset for Machine Learning-Based Morphological Analysis of Cleft Lip and Palate Variants Using Multimodal Medical Imaging [Dataset]. http://doi.org/10.17632/yxp6fxdymp.3
    Dataset updated
    Mar 18, 2025
    Authors
    Israt Jahan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Cleft Lip and Palate vs. Non-Cleft (CLP-NC) Image Dataset is a high-resolution dataset designed for the automated detection and classification of cleft lip and palate anomalies. It comprises 3,987 images, categorized into two distinct classes: Cleft Lip and Palate (CLP) and Non-Cleft (NC). This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

    Dataset Characteristics

    • Total Images: 3,987
    • Number of Classes: 2
    • Image Format: JPG
    • Image Resolution: 640 x 640 pixels
    • Annotation: Each image is manually labeled and verified by medical experts
    • Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
    • Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

    Categories and Annotations

    The dataset includes images categorized into two classes:

    • Cleft Lip and Palate (CLP): Congenital anomaly where the upper lip and/or palate fails to develop properly.
    • Non-Cleft (NC): Normal craniofacial structures without cleft-related deformities.

    Dataset Structure and Splitting

    The dataset is divided into two main parts:

    1. Non-Augmented Part (Used for Classification):
      • Non-Augmented Imbalanced: Contains 168 images of Cleft Lip and Palate and 247 images of Non-Cleft.
      • Non-Augmented Balanced: Contains 500 images per class (Cleft Lip and Palate: 500, Non-Cleft: 500).
    2. Augmented Part (Used for Object Detection):
      • Augmented Imbalanced: Includes 1,132 augmented images with an imbalanced distribution.
      • Augmented Balanced: Contains 1,440 images (Cleft Lip and Palate: 720, Non-Cleft: 720).

    The dataset is split into:

    • Training Set: 80%
    • Validation Set: 10%
    • Test Set: 10%

  13. Aptos and Messidor eye images

    • kaggle.com
    Updated Jun 29, 2024
    Cite
    Anik Bhowmick ae20b102 (2024). Aptos and Messidor eye images [Dataset]. https://www.kaggle.com/datasets/anikbhowmickae20b102/binary-classification-data-aptos-and-messidor
    Dataset updated
    Jun 29, 2024
    Dataset provided by
    Kaggle
    Authors
    Anik Bhowmick ae20b102
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Early detection of Diabetic Retinopathy (DR) is a key challenge in preventing potential vision loss. DR detection often requires special expertise from ophthalmologists. In remote places of the world such facilities may not be available, so machine learning and deep learning techniques can be adopted in an attempt to automate the detection of DR. Some recent papers have demonstrated such success on various publicly available datasets.

    Another challenge of deep learning techniques is the availability of properly processed, standardized data. Cleaning and preprocessing the data often takes much longer than training the model. As part of my research work, I had to preprocess the images taken from APTOS and Messidor before training the model. I applied circle-crop and Ben Graham's preprocessing technique and scaled all the images to 512×512. I also applied data augmentation, increasing the number of samples from 3,662 to 18,310 for APTOS and from 400 to 3,600 for Messidor. I divided the images into two classes: class 0 (NO DR) and class 1 (DR). A large amount of data is essential for transfer learning. This process is very cumbersome and time-consuming, so I decided to upload the newly generated dataset to Kaggle in case others find it useful for their work. I hope this will help many people. Feel free to use the data.

  14. Performance comparison of the OFIDA and several SOTA data augmentation...

    • plos.figshare.com
    xls
    Updated May 2, 2024
    Cite
    Meng Zhang; Yina Guo; Haidong Wang; Hong Shangguan (2024). Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification. [Dataset]. http://doi.org/10.1371/journal.pone.0302124.t002
    Available download formats: xls
    Dataset updated
    May 2, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Meng Zhang; Yina Guo; Haidong Wang; Hong Shangguan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification.

  15. Augmentation performance using the ResNet architecture on the CIFAR-10...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim (2023). Augmentation performance using the ResNet architecture on the CIFAR-10 dataset for classification against image and feature maps. [Dataset]. http://doi.org/10.1371/journal.pone.0274767.t006
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Augmentation performance using the ResNet architecture on the CIFAR-10 dataset for classification against image and feature maps.

  16. Dataset for pest classification in Mango farms from Indonesia

    • data.mendeley.com
    Updated Feb 27, 2020
    Cite
    Kusrini Kusrini (2020). Dataset for pest classification in Mango farms from Indonesia [Dataset]. http://doi.org/10.17632/94jf97jzc8.1
    Dataset updated
    Feb 27, 2020
    Authors
    Kusrini Kusrini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Indonesia
    Description

    The infestation of pests affecting Mango cultivation in Indonesia has an economic impact in the region. Following recent developments in the field of machine learning, the application of deep-learning models for multi-class pest classification requires a large collection of image samples upon which the algorithms can be trained. Addressing this requirement, the paper presents a detailed outline of the dataset collected from Mango farms in Indonesia. The data consists of images captured from Mango farms affected by 15 categories of pests, which are identifiable through structural and visual deformity exhibited in the Mango leaves. The collection of the data involved the use of low-cost sensing equipment that is commonly used by farmers for collecting images from the farm. The collected data is subjected to two processes, namely the data augmentation process and training of the classification model. The dataset collection consists of 510 images that include 15 categories of pests that affect Mango leaves along with the original appearance of the Mango leaves (resulting in 16 classes), collected over a period of 6 months. For the purposes of training the deep-learning neural network, the images are subjected to data augmentation to expand the dataset and to emulate closely the large-scale data collection process carried out by farmers. The data augmentation process results in a total of 62,047 image samples, which are used to train the network. The multi-class classification training framework presented in the paper builds on the VGG-16 feature extractor and replaces the last 3-layer network with fully connected neural network layers, resulting in 16 output classes. The dataset includes the annotation of the image samples for both the original images captured from the field and the augmented image samples. Both the original and augmented data have been classified as training, validation and testing.

    The overall dataset is divided into 3 parts, namely version 0, version 1 and version 2. Version 0 consists of the original data set, with 310 images to be used for training, 103 images for validation and 97 images for testing. Version 1 includes 46,500 image samples for training, following the application of the data augmentation process, followed by the 103 original images used for validation and 97 images for testing. Finally, version 2 uses 47,500 images for training, 15,450 images for validation and 97 images for testing. The three versions of the dataset include images available in JPEG format. The visual appearance of the pests captured in the dataset provides an ideal testbed for benchmarking the performance of various deep-learning networks trained to detect specific categories of pests. In addition, the dataset also provides an opportunity to evaluate the impact of data augmentation techniques on the original dataset.

  17. Brain Tumor Data Augmentation Dataset

    • universe.roboflow.com
    zip
    Updated Apr 20, 2025
    Cite
    LotusWorks (2025). Brain Tumor Data Augmentation Dataset [Dataset]. https://universe.roboflow.com/lotusworks/brain-tumor-data-augmentation
    Available download formats: zip
    Dataset updated
    Apr 20, 2025
    Dataset authored and provided by
    LotusWorks
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects
    Description

    Brain Tumor Data Augmentation

    ## Overview
    
    Brain Tumor Data Augmentation is a dataset for classification tasks - it contains Objects annotations for 2,769 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  18. Monkeypox Skin Lesion Dataset

    • kaggle.com
    Updated Jul 5, 2022
    Cite
    TensorKitty (2022). Monkeypox Skin Lesion Dataset [Dataset]. https://www.kaggle.com/datasets/nafin59/monkeypox-skin-lesion-dataset
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    TensorKitty
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    An updated version of the MSLD dataset, MSLD v2.0 has been released after being verified by an expert dermatologist!

    For details, check our GitHub repo!

    Context

    The recent monkeypox outbreak has become a global healthcare concern owing to its rapid spread to more than 65 countries. To slow its pace, early diagnosis is essential, but confirmatory Polymerase Chain Reaction (PCR) tests and other biochemical assays are not readily available in sufficient quantities. In this scenario, computer-aided monkeypox identification from skin lesion images can be a beneficial measure. So far, however, no such datasets have been available. Hence, the "Monkeypox Skin Lesion Dataset (MSLD)" was created by collecting and processing images obtained through web scraping, i.e., from news portals, websites and publicly accessible case reports.

    The creation of the "Monkeypox Skin Lesion Dataset" is primarily focused on distinguishing monkeypox cases from similar non-monkeypox cases. Therefore, along with the 'Monkeypox' class, we included skin lesion images of 'Chickenpox' and 'Measles', which resemble the monkeypox rash and pustules in their initial state, in a second class named 'Others' to perform binary classification.

    Content

    There are 3 folders in the dataset.

    1) Original Images: Contains a total of 228 images, of which 102 belong to the 'Monkeypox' class and the remaining 126 represent the 'Others' class, i.e., non-monkeypox (chickenpox and measles) cases.

    2) Augmented Images: To aid the classification task, several data augmentation methods such as rotation, translation, reflection, shear, hue, saturation, contrast and brightness jitter, noise, scaling etc. have been applied using MATLAB R2020a. Although this can be readily done using ImageGenerator or other image augmentors, the augmented images are provided in this folder to ensure reproducibility of the results. After augmentation, the number of images increased approximately 14-fold. The classes 'Monkeypox' and 'Others' have 1428 and 1764 images, respectively.

    3) Fold1: One of the three-fold cross validation datasets. To avoid any sort of bias in training, three-fold cross validation was performed. The original images were split into training, validation and test set(s) with the approximate proportion of 70 : 10 : 20 while maintaining patient independence. According to the commonly perceived data preparation practice, only the training and validation images were augmented while the test set contained only the original images. Users have the option of using the folds directly or using the original data and employing other algorithms to augment it.
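A patient-independent split of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the `image_to_patient` mapping and both function names are hypothetical (the provided CSV lists only ImageID and label), and the 70 : 10 : 20 fractions are applied to patients, so the image-level proportions are only approximate.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole patients to train/val/test so no patient spans two sets.

    image_to_patient is a hypothetical dict of image ID -> patient ID.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) with a fixed seed for repeatability.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: sorted(img for p in members for img in by_patient[p])
        for name, members in groups.items()
    }


def augmentable(split):
    """Return only the subsets that get augmented; the test set stays original."""
    return {name: images for name, images in split.items() if name != "test"}
```

Splitting on patient IDs rather than image IDs is what keeps images of the same person out of both the training and the test set; augmentation is then applied only to the subsets returned by `augmentable`.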

    Additionally, a CSV file is provided that has 228 rows and two columns. The table contains the list of all the ImageID(s) with their corresponding label.

    Web Application

    Since monkeypox is demonstrating a very rapid community transmission pattern, consumer-level software is necessary to increase awareness and encourage people to take rapid action. To address this, we have developed an easy-to-use web application named Monkey Pox Detector, built with the open-source Python Streamlit framework, that uses our trained model. It predicts whether or not the user should see a specialist and reports the prediction accuracy. Future updates will benefit from the user data we continue to collect and use to improve our model. The web app has a Flask core, so that it can be deployed cross-platform in the future.

    Learn more at our GitHub repo!

    Citation

    If this dataset helped your research, please cite the following articles:

    Ali, S. N., Ahmed, M. T., Paul, J., Jahan, T., Sani, S. M. Sakeef, Noor, N., & Hasan, T. (2022). Monkeypox Skin Lesion Detection Using Deep Learning Models: A Preliminary Feasibility Study. arXiv preprint arXiv:2207.03342.

    @article{Nafisa2022, title={Monkeypox Skin Lesion Detection Using Deep Learning Models: A Preliminary Feasibility Study}, author={Ali, Shams Nafisa and Ahmed, Md. Tazuddin and Paul, Joydip and Jahan, Tasnim and Sani, S. M. Sakeef and Noor, Nawshaba and Hasan, Taufiq}, journal={arXiv preprint arXiv:2207.03342}, year={2022} }

    Ali, S. N., Ahmed, M. T., Jahan, T., Paul, J., Sani, S. M. Sakeef, Noor, N., Asma, A. N., & Hasan, T. (2023). A Web-based Mpox Skin Lesion Detection System Using State-of-the-art Deep Learning Models Considering Racial Diversity. arXiv preprint arXiv:2306.14169.

    @article{Nafisa2023, title={A Web-base...

  19. Dataset Augmentation 1.1 Dataset

    • universe.roboflow.com
    zip
    Updated Mar 2, 2025
    Cite
    datasetbuilding 4 (2025). Dataset Augmentation 1.1 Dataset [Dataset]. https://universe.roboflow.com/datasetbuilding-4/dataset-augmentation-1.1
    Available download formats: zip
    Dataset updated
    Mar 2, 2025
    Dataset authored and provided by
    datasetbuilding 4
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects
    Description

    Dataset Augmentation 1.1

    ## Overview
    
    Dataset Augmentation 1.1 is a dataset for classification tasks - it contains Objects annotations for 6,114 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  20. Comprehensive High-Resolution Eggplant Leaf Image Dataset for Plant Disease...

    • zenodo.org
    Updated May 28, 2025
    Cite
    Rakib Hasan; Sanimun Hossain Sanzit; Md. Mamun Hosen; Muksitul Islam; Ferdous Hasan; Mehedi Hasan (2025). Comprehensive High-Resolution Eggplant Leaf Image Dataset for Plant Disease Detection [Dataset]. http://doi.org/10.5281/zenodo.15527092
    Dataset updated
    May 28, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rakib Hasan; Sanimun Hossain Sanzit; Md. Mamun Hosen; Muksitul Islam; Ferdous Hasan; Mehedi Hasan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 12, 2025
    Description

    This is a comprehensive version of the Eggplant Leaf Image Dataset, designed to support machine learning and deep learning research in agriculture, plant pathology, and computer vision. This dataset addresses class imbalance and model generalization challenges by including a significantly expanded collection of images through controlled data augmentation.

    The dataset includes a total of 2,180 high-resolution images (6000×4000 pixels), categorized into six disease or health classes of Solanum melongena (eggplant) leaves:

    Class          Original Images   Augmented Images   Total Images
    Healthy                     80                320            400
    Insect-Pest                 40                320            360
    Leaf-Spot                   50                300            350
    Mosaic-Virus                15                345            360
    Small-Leaf                  20                340            360
    Wilt                        50                300            350
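
The per-class counts listed above can be cross-checked against the stated total of 2,180 images with a few lines of Python. This is only a sketch: the `counts` dictionary is transcribed by hand from the listing, and "augmented images per original image" is an assumed definition of augmentation ratio, which may differ from the one used in metadata.csv.

```python
# (original, augmented) image counts per class, transcribed from the listing.
counts = {
    "Healthy": (80, 320),
    "Insect-Pest": (40, 320),
    "Leaf-Spot": (50, 300),
    "Mosaic-Virus": (15, 345),
    "Small-Leaf": (20, 340),
    "Wilt": (50, 300),
}


def summarize(class_counts):
    """Return per-class totals and augmented-per-original ratios."""
    return {
        name: {"total": original + augmented,
               "aug_per_original": augmented / original}
        for name, (original, augmented) in class_counts.items()
    }


summary = summarize(counts)
grand_total = sum(row["total"] for row in summary.values())

for name, row in summary.items():
    print(f"{name:<13}{row['total']:>5}{row['aug_per_original']:>7.1f}")
print(f"{'All classes':<13}{grand_total:>5}")
```

The ratios make the rebalancing visible: the rarest class (Mosaic-Virus, 15 originals) receives far more augmented images per original than the most common one (Healthy, 80 originals).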

    All original images were captured using a Canon EOS 1300D DSLR camera under consistent natural lighting conditions. Files are saved in JPG format, and image resolution is preserved within ±5% of the original dimensions to maintain visual fidelity.

    To improve dataset usability for robust model training and generalization, controlled data augmentation was applied using the Albumentations library. The transformations include random rotation, horizontal flipping, brightness/contrast adjustments, slight color shifts, and padding to maintain aspect ratio. All augmentation procedures were consistently applied and seeded for reproducibility. Augmentation parameters are documented in detail in the metadata.

    The metadata.csv file provides a class-wise summary including original image count, augmented image count, augmentation ratios, and the exact augmentation pipeline used.

    Note: The original and augmented images are stored in separate folders under the "Original" and "Augmented" directories, respectively. Each directory is organized into six class-specific subfolders: Healthy, Insect-Pest, Leaf-Spot, Mosaic-Virus, Small-Leaf, and Wilt. Augmented images are clearly distinguishable by the inclusion of the substring "_aug_" in their filenames. This clear separation ensures reproducibility, transparency in data provenance, and ease of use for researchers who may wish to train models using only original, only augmented, or both types of data.
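
Selecting only original, only augmented, or both kinds of images, based on the folder layout and the "_aug_" filename marker described in the note, could look like the sketch below. The function name `list_images` and the accepted file extensions are assumptions for illustration; the class label is taken from the name of each file's parent folder.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg"}  # assumed extensions for the JPG files


def list_images(root, include_original=True, include_augmented=True):
    """Return sorted (class_name, path) pairs found under root.

    A file counts as augmented when its name contains "_aug_";
    the class name is the name of the folder that holds the file.
    """
    selected = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        is_augmented = "_aug_" in path.name
        if (is_augmented and include_augmented) or (
            not is_augmented and include_original
        ):
            selected.append((path.parent.name, path))
    return selected
```

Calling `list_images(root, include_augmented=False)` on the unpacked archive would return only field-captured images, which is useful for building a test set that contains no synthetic samples.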

    Files:

    • EggplantLeaf-ImageDataset.zip — Contains all files and folders, including Original, Augmented, metadata and readme.
    • Original — Contains only raw field-captured images grouped by class.
    • Augmented — Contains synthetically expanded datasets, also organized by class. Augmented filenames include the marker "_aug_" for easy identification.
    • metadata.csv — Class-level summary and augmentation details.
    • Readme.md — Technical documentation and usage notes.