100+ datasets found

CIFAR dataset classification experiment results according to the number of...
plos.figshare.com
xls
Updated Jun 13, 2023
Cite
Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim (2023). CIFAR dataset classification experiment results according to the number of images used for data augmentation. [Dataset]. http://doi.org/10.1371/journal.pone.0274767.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0274767.t005
Dataset updated
Jun 13, 2023
Dataset provided by
PLOS ONE
Authors
Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CIFAR dataset classification experiment results according to the number of images used for data augmentation.
MRI Dementia Augmentation : No Data leak
kaggle.com
Updated Oct 29, 2024
Cite
Matthew Hema (2024). MRI Dementia Augmentation : No Data leak [Dataset]. https://www.kaggle.com/datasets/matthewhema/mri-dementia-augmentation-no-data-leak
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Oct 29, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Matthew Hema
Description
The data consists of MRI images in four classes, split into train, test, and validation sets in a 70-15-15 ratio.

Classes: Mild Demented, Moderate Demented, Non Demented, Very Mild Demented.

Original Dataset accessed from https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset.
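
The 70-15-15 split described above can be sketched as a per-class (stratified) split. This is only an illustration under stated assumptions: the file names, the 100-images-per-class count, and the `stratified_split` helper are hypothetical, and the source does not describe how its split was actually produced. Splitting the original images before any augmentation is one common way to keep augmented copies of the same scan from landing in different splits.

```python
import random
from collections import defaultdict

def stratified_split(samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (path, label) pairs into train/test/val, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)
    splits = {"train": [], "test": [], "val": []}
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        n_train = round(len(paths) * ratios[0])
        n_test = round(len(paths) * ratios[1])
        # Whatever is left after train and test goes to validation.
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["test"] += [(p, label) for p in paths[n_train:n_train + n_test]]
        splits["val"] += [(p, label) for p in paths[n_train + n_test:]]
    return splits

# Hypothetical file list: 100 images per class (not the real dataset counts).
classes = ["Mild Demented", "Moderate Demented", "Non Demented", "Very Mild Demented"]
samples = [(f"{c}/img_{i:03d}.jpg", c) for c in classes for i in range(100)]
splits = stratified_split(samples)
print({name: len(items) for name, items in splits.items()})
```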
Data from: How many specimens make a sufficient training set for automated...
search.dataone.org
zenodo.org
Updated Aug 1, 2025
Cite
James M. Mulqueeney; Alex Searle-Barnes; Anieke Brombacher; Marisa Sweeney; Anjali Goswami; Thomas H. G. Ezard (2025). How many specimens make a sufficient training set for automated three dimensional feature extraction? [Dataset]. http://doi.org/10.5061/dryad.1rn8pk12f
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.1rn8pk12f
Dataset updated
Aug 1, 2025
Dataset provided by
Dryad Digital Repository
Authors
James M. Mulqueeney; Alex Searle-Barnes; Anieke Brombacher; Marisa Sweeney; Anjali Goswami; Thomas H. G. Ezard
Description
Deep learning has emerged as a robust tool for automating feature extraction from 3D images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation methods. However, there has been limited exploration into the optimal training set sizes, including assessing whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. The results reveal, unsurprisingly, that deep learning models improve with a larger number of training images, with eight specimens being required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and ...

Data collection

50 planktonic foraminifera, comprising 4 Menardella menardii, 17 Menardella limbata, 18 Menardella exilis, and 11 Menardella pertenuis specimens, were used in our analyses (electronic supplementary material, figures S1 and S2). The taxonomic classification of these species was established based on the analysis of morphological characteristics observed in their shells. In this context, all species are characterised by lenticular, low trochospiral tests with a prominent keel [13]. Discrimination among these species is achievable, as M. limbata can be distinguished from its ancestor, M. menardii, by having a greater number of chambers and a smaller umbilicus. Moreover, M. exilis and M. pertenuis can be discerned from M. limbata by their thinner, more polished tests and reduced trochospirality. Furthermore, M. pertenuis is identifiable by a thin plate extending over the umbilicus and possessing a greater number of chambers in the final whorl compared to M. exilis [13]. The s...

# Data from: How many specimens make a sufficient training set for automated three dimensional feature extraction?

https://doi.org/10.5061/dryad.1rn8pk12f

All computer code and final raw data used for this research work are stored in GitHub: https://github.com/JamesMulqueeney/Automated-3D-Feature-Extraction and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.11109348.

This data is the additional primary data used in each analysis. These include: CT Image Files, Manual Segmentation Files (use for training or analysis), Inputs and Outputs for Shape Analysis and an example .h5 file which can be used to practice AI segmentation.

Description of the data and file structure

The primary data is arranged into the following:

Image_Files.zip: Foraminiferal CT data used in the analysis.

**I...
Data from: New Deep Learning Methods for Medical Image Analysis and...
curate.nd.edu
pdf
Updated Nov 11, 2024
Cite
Pengfei Gu (2024). New Deep Learning Methods for Medical Image Analysis and Scientific Data Generation and Compression [Dataset]. http://doi.org/10.7274/26156719.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/26156719.v1
Dataset updated
Nov 11, 2024
Dataset provided by
University of Notre Dame
Authors
Pengfei Gu
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Medical image analysis is critical to biological studies, health research, computer-aided diagnoses, and clinical applications. Recently, deep learning (DL) techniques have achieved remarkable successes in medical image analysis applications. However, these techniques typically require large amounts of annotations to achieve satisfactory performance. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for medical image analysis while reducing annotation efforts? To address this problem, we have outlined two specific aims: (A1) Utilize existing annotations effectively from advanced models; (A2) extract generic knowledge directly from unannotated images.

To achieve the aim (A1): First, we introduce a new data representation called TopoImages, which encodes the local topology of all the image pixels. TopoImages can be complemented with the original images to improve medical image analysis tasks. Second, we propose a new augmentation method, SAMAug-C, that leverages the Segment Anything Model (SAM) to augment raw image input and enhance medical image classification. Third, we propose two advanced DL architectures, kCBAC-Net and ConvFormer, to enhance the performance of 2D and 3D medical image segmentation. We also present a gate-regularized network training (GrNT) approach to improve multi-scale fusion in medical image segmentation. To achieve the aim (A2), we propose a novel extension of known Masked Autoencoders (MAEs) for self pre-training, i.e., models pre-trained on the same target dataset, specifically for 3D medical image segmentation.

Scientific visualization is a powerful approach for understanding and analyzing various physical or natural phenomena, such as climate change or chemical reactions. However, the cost of scientific simulations is high when factors like time, ensemble, and multivariate analyses are involved. Additionally, scientists can only afford to sparsely store the simulation outputs (e.g., scalar field data) or visual representations (e.g., streamlines) or visualization images due to limited I/O bandwidths and storage space. Therefore, in this dissertation, we seek to address this critical problem: How can we develop efficient and effective DL algorithms for scientific data generation and compression while reducing simulation and storage costs?

To tackle this problem: First, we propose a DL framework that generates unsteady vector fields data from a set of streamlines. Based on this method, domain scientists only need to store representative streamlines at simulation time and reconstruct vector fields during post-processing. Second, we design a novel DL method that translates scalar fields to vector fields. Using this approach, domain scientists only need to store scalar field data at simulation time and generate vector fields from their scalar field counterparts afterward. Third, we present a new DL approach that compresses a large collection of visualization images generated from time-varying data for communicating volume visualization results.
mnist_augmented
huggingface.co
Updated Aug 4, 2025
Cite
Muhammad Anis Ur Rahman (2025). mnist_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/mnist_augmented
Explore at:
Dataset updated
Aug 4, 2025
Authors
Muhammad Anis Ur Rahman
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for mnist_augmented

This dataset contains augmented versions of the MNIST dataset, created to benchmark how various augmentation strategies impact digit classification accuracy using deep learning models. The dataset is provided as a .zip file and must be unzipped before use. It follows the ImageFolder structure compatible with PyTorch and other DL frameworks.

📥 Download & Extract

wget… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/mnist_augmented.
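
The ImageFolder structure mentioned above places each class in its own sub-directory, and loaders such as torchvision's `ImageFolder` assign integer labels to the sorted class names. The sketch below mimics that convention with the standard library only; the `scan_image_folder` helper, the `train/` directory name, and the tiny temporary tree are assumptions for illustration and say nothing about the actual contents of the archive.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}

def scan_image_folder(root):
    """Return (class names, [(path, class_index), ...]) for root/<class>/<image>."""
    root = Path(root)
    classes = sorted(d.name for d in root.iterdir() if d.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = [
        (path, class_to_idx[name])
        for name in classes
        for path in sorted((root / name).rglob("*"))
        if path.suffix.lower() in IMAGE_EXTENSIONS
    ]
    return classes, samples

# Build a tiny hypothetical tree: train/0, train/1, train/2 with two files each.
with tempfile.TemporaryDirectory() as tmp:
    for digit in ("0", "1", "2"):
        class_dir = Path(tmp) / "train" / digit
        class_dir.mkdir(parents=True)
        for i in range(2):
            (class_dir / f"img_{i}.png").touch()
    classes, samples = scan_image_folder(Path(tmp) / "train")
    labels = [label for _, label in samples]

print(classes, labels)
```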
cifar10_augmented
huggingface.co
Updated Jul 27, 2025
Cite
Muhammad Anis Ur Rahman (2025). cifar10_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/cifar10_augmented
Explore at:
Dataset updated
Jul 27, 2025
Authors
Muhammad Anis Ur Rahman
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for cifar10_augmented

This dataset contains augmented versions of CIFAR-10 to benchmark the impact of classical and synthetic augmentation techniques on deep image classification models. Provided as a .zip file, the dataset must be unzipped before use. It follows a standard ImageFolder structure for compatibility with PyTorch and TensorFlow pipelines.

Dataset Details Dataset Sources

Repository:… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/cifar10_augmented.
Aruzz22.5K: An Image Dataset of Rice Varieties
data.mendeley.com
Updated Mar 12, 2024
Cite
Md Masudul Islam (2024). Aruzz22.5K: An Image Dataset of Rice Varieties [Dataset]. http://doi.org/10.17632/3mn9843tz2.4
Explore at:
Unique identifier
https://doi.org/10.17632/3mn9843tz2.4
Dataset updated
Mar 12, 2024
Authors
Md Masudul Islam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.

Dataset Composition

The dataset comprises 20 distinct classes: Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, the dataset contains 4,730 original JPG images and 23,650 augmented images.

Image Capture and Dataset Organization

These images were captured using an iPhone 11 camera with a 5x zoom feature. All images were taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is organized into 20 sub-directories, one per rice variety.

Original Image Dataset

The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. Because of the low resolution, the set totals 268 MB uncompressed; zip compression reduces it to a final size of 254 MB.

Augmented Image Dataset

To address the substantial image volume requirements of deep learning models for machine vision, data augmentation techniques were applied. A total of 23,650 images were obtained through augmentation. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB; after compression, the dataset was reduced to 699 MB.

Dataset Storage and Access

The raw and augmented datasets are stored in two distinct zip files, namely 'Original.zip' and 'Augmented.zip'. Both zip files contain 20 sub-folders, each representing a unique rice variety: 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.

Train and Test Data Organization

To make experimentation easier for researchers, we have balanced the data and split it in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ folder contains two sub-directories: ‘1_TEST’, which contains 1125 images per class, and ‘2_VALID’, which contains 225 images per class.
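
The counts quoted in this description can be cross-checked with a little arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative, and reading the smaller folder as a 20% hold-out relative to the larger one is an assumption, since the description does not say how the 80:20 figure was computed.

```python
n_classes = 20
original_images = 4_730
augmented_images = 23_650
per_class_large = 1_125  # images per class in '1_TEST', as stated above
per_class_small = 225    # images per class in '2_VALID', as stated above

# Augmented images produced per original image.
augmentation_factor = augmented_images / original_images

# Totals across all 20 classes in the balanced split.
large_total = n_classes * per_class_large
small_total = n_classes * per_class_small

# Two possible readings of the split ratio.
small_to_large = per_class_small / per_class_large
small_share = per_class_small / (per_class_large + per_class_small)

print(augmentation_factor, large_total, small_total)
print(round(small_to_large, 3), round(small_share, 3))
```

The larger folder totals 22,500 images, which appears to match the "22.5K" in the dataset title. The smaller folder is 20% of the larger one, but about 16.7% of the two folders combined.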
Experimental results of the image classification task on the CIFAR-10...
plos.figshare.com
xls
Updated Jun 5, 2023
Cite
Wei Jiang; Kai Zhang; Nan Wang; Miao Yu (2023). Experimental results of the image classification task on the CIFAR-10 dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0243613.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0243613.t002
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Wei Jiang; Kai Zhang; Nan Wang; Miao Yu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Experimental results of the image classification task on the CIFAR-10 dataset.
imagenet_augmented
huggingface.co
Updated Aug 4, 2025
Cite
Muhammad Anis Ur Rahman (2025). imagenet_augmented [Dataset]. https://huggingface.co/datasets/ianisdev/imagenet_augmented
Explore at:
Dataset updated
Aug 4, 2025
Authors
Muhammad Anis Ur Rahman
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for imagenet_augmented

This dataset provides an augmented version of a subset of ImageNet, used to benchmark how classical and synthetic augmentations impact large-scale image classification models. All training data is organized by augmentation method, and the test/ set remains clean and unmodified. The dataset is compressed in .zip format and must be unzipped before use.

📥 Download & Extract

wget… See the full description on the dataset page: https://huggingface.co/datasets/ianisdev/imagenet_augmented.
Additional file 3 of Which data subset should be augmented for deep...
figshare.com
springernature.figshare.com
xlsx
Updated Jun 21, 2023
Cite
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy (2023). Additional file 3 of Which data subset should be augmented for deep learning? a simulation study using urothelial cell carcinoma histopathology images [Dataset]. http://doi.org/10.6084/m9.figshare.22622729.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22622729.v1
Dataset updated
Jun 21, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Yusra A. Ameen; Dalia M. Badary; Ahmad Elbadry I. Abonnoor; Khaled F. Hussain; Adel A. Sewisy
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 3. A Microsoft® Excel® workbook that details the raw data for the 20 experiments in which no test-set augmentation was done, including all of the image-classification output probabilities.
Comprehensive Soil Classification Datasets
kaggle.com
Updated Jun 12, 2025
Cite
AI4A Lab (2025). Comprehensive Soil Classification Datasets [Dataset]. https://www.kaggle.com/datasets/ai4a-lab/comprehensive-soil-classification-datasets
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jun 12, 2025
Dataset provided by
Kaggle: http://kaggle.com/
Authors
AI4A Lab
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Soil Classification Datasets

Please cite the paper when using this dataset in a research study. Refer to the paper link or BibTeX provided below.

This repository contains comprehensive datasets for soil classification and recognition research. The Original Dataset comprises soil images sourced from various online repositories, which have been meticulously cleaned and preprocessed to ensure data quality and consistency. To enhance the dataset's size and diversity, we employed Generative Adversarial Networks (GANs), specifically the CycleGAN architecture, to generate synthetic soil images. This augmented collection is referred to as the CyAUG Dataset. Both datasets are specifically designed to advance research in soil classification and recognition using state-of-the-art deep learning methodologies.

This dataset was curated as part of the research study titled "An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations" by Farhan Sheth, Priya Mathur, Amit Kumar Gupta, and Sandeep Chaurasia, published in Engineering Applications of Artificial Intelligence.

Links

Paper: An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations

Code: Development Code (model and recommendation system)

Application produced by this research is available at:

Agro Companion: Agro Companion

Server Code: Agro Companion Server

Note: If you use any part of this project (dataset, code, or application), please cite the work as described in the Citation section below.

Dataset

Both datasets consist of images of 7 different soil types.

The Soil Classification Dataset is structured to facilitate the classification of various soil types based on images. The dataset includes images of the following soil types:

Alluvial Soil

Black Soil

Laterite Soil

Red Soil

Yellow Soil

Arid Soil

Mountain Soil

The dataset is organized into folders, each named after a specific soil type, containing images of that soil type. The images vary in resolution and quality, providing a diverse set of examples for training and testing classification models.

Original Dataset Details

Total Images: 1189 images

Image Format: JPG/JPEG

Image Size: Varies

Source: Collected from various online repositories and cleaned for consistency.

CyAUG Dataset Details

Total Images: 5097 images

Image Format: JPG/JPEG

Image Size: Varies

Source: Generated using CycleGAN to augment the original dataset, enhancing its size and diversity.

Input and Output Parameters

Input Parameters:

Image: The images of the soils (JPG/JPEG format).

Label: The labels are in the format 'soil types' (folder names).

Output Parameter:

Classification: The predicted class (soil type) based on the input image.

Citation

If you use any of the derived datasets, please cite the following paper:

@article{SHETH2025111425,
  title = {An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations},
  journal = {Engineering Applications of Artificial Intelligence},
  volume = {158},
  pages = {111425},
  year = {2025},
  issn = {0952-1976},
  doi = {https://doi.org/10.1016/j.engappai.2025.111425},
  url = {https://www.sciencedirect.com/science/article/pii/S0952197625014277},
  author = {Farhan Sheth and Priya Mathur and Amit Kumar Gupta and Sandeep Chaurasia},
  keywords = {Soil classification, Crop recommendation, Vision transformers, Convolutional neural network, Transfer learning, Fuzzy logic}
}
CLP-NC: Comprehensive Dataset for Machine Learning-Based Morphological...
data.mendeley.com
Updated Mar 18, 2025
Cite
Israt Jahan (2025). CLP-NC: Comprehensive Dataset for Machine Learning-Based Morphological Analysis of Cleft Lip and Palate Variants Using Multimodal Medical Imaging [Dataset]. http://doi.org/10.17632/yxp6fxdymp.3
Explore at:
Unique identifier
https://doi.org/10.17632/yxp6fxdymp.3
Dataset updated
Mar 18, 2025
Authors
Israt Jahan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Cleft Lip and Palate vs. Non-Cleft (CLP-NC) Image Dataset is a high-resolution dataset designed for the automated detection and classification of cleft lip and palate anomalies. It comprises 3,987 images, categorized into two distinct classes: Cleft Lip and Palate (CLP) and Non-Cleft (NC). This dataset serves as a valuable resource for researchers in medical image analysis, deep learning, and clinical decision-making.

Dataset Characteristics:
- Total Images: 3,987
- Number of Classes: 2
- Image Format: JPG
- Image Resolution: 640 x 640 pixels
- Annotation: Each image is manually labeled and verified by medical experts
- Data Preprocessing: Auto-orientation and histogram equalization applied for enhanced feature detection
- Augmentation Techniques: Rotation, scaling, brightness adjustments, flipping, and contrast modifications

Categories and Annotations: The dataset includes images categorized into two classes:
- Cleft Lip and Palate (CLP): Congenital anomaly where the upper lip and/or palate fails to develop properly.
- Non-Cleft (NC): Normal craniofacial structures without cleft-related deformities.

Dataset Structure and Splitting: The dataset is divided into two main parts:
1. Non-Augmented Part (Used for Classification):
- Non-Augmented Imbalanced: Contains 168 images of Cleft Lip and Palate and 247 images of Non-Cleft.
- Non-Augmented Balanced: Contains 500 images per class (Cleft Lip and Palate: 500, Non-Cleft: 500).
2. Augmented Part (Used for Object Detection):
- Augmented Imbalanced: Includes 1,132 augmented images with an imbalanced distribution.
- Augmented Balanced: Contains 1,440 images (Cleft Lip and Palate: 720, Non-Cleft: 720).

The dataset is split into:
- Training Set: 80%
- Validation Set: 10%
- Test Set: 10%
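
Histogram equalization, listed among the preprocessing steps above, remaps gray levels so that their cumulative distribution becomes roughly uniform. The following is a minimal pure-Python sketch on a single grayscale channel; the `equalize_histogram` helper and the 2x2 test patch are hypothetical, and the source does not say which implementation or color handling the authors used.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a 2D list of integer gray levels in [0, levels)."""
    flat = [p for row in pixels for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(flat)
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to spread out.
        return [list(row) for row in pixels]

    # Stretch the occupied part of the CDF over the full output range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in pixels]

image = [[50, 50], [100, 150]]  # hypothetical 2x2 grayscale patch
equalized = equalize_histogram(image)
print(equalized)
```

The darkest occupied level maps to 0 and the brightest to 255, so a narrow band of input intensities is spread across the full range.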
Aptos and Messidor eye images
kaggle.com
Updated Jun 29, 2024
Cite
Anik Bhowmick ae20b102 (2024). Aptos and Messidor eye images [Dataset]. https://www.kaggle.com/datasets/anikbhowmickae20b102/binary-classification-data-aptos-and-messidor
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jun 29, 2024
Dataset provided by
Kaggle
Authors
Anik Bhowmick ae20b102
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Early detection of Diabetic Retinopathy (DR) is key to preventing potential vision loss. DR detection often requires special expertise from ophthalmologists. In remote parts of the world such facilities may not be available, so machine learning and deep learning techniques can be adopted in an attempt to automate the detection of DR. Some recent papers have demonstrated such success on various publicly available datasets.

Another challenge of deep learning techniques is the availability of properly processed, standardized data. Cleaning and preprocessing the data often takes much longer than training the model. As part of my research work, I had to preprocess the images taken from APTOS and Messidor before training the model. I applied circle-crop and Ben Graham's preprocessing technique and scaled all the images to 512×512. I also applied data augmentation, increasing the number of samples from 3662 to 18310 for APTOS and from 400 to 3600 for Messidor. I divided the images into two classes: class 0 (No DR) and class 1 (DR). A large amount of data is essential for transfer learning. This process is very cumbersome and time-consuming, so I decided to upload the newly generated dataset to Kaggle in case others find it useful for their work. I hope this will help many people. Feel free to use the data.
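
The two-class labelling and the augmentation counts above can be illustrated with a short sketch. It assumes integer severity grades in which 0 means no DR and any higher grade counts as DR; that mapping, the `binarize_grade` helper, and the sample grade list are assumptions for illustration, since the description does not state how the original grades were collapsed.

```python
def binarize_grade(grade):
    """Map a severity grade to class 0 (No DR) or class 1 (DR).

    Assumes grade 0 means no DR and any positive grade means DR.
    """
    if grade < 0:
        raise ValueError("grade must be non-negative")
    return 0 if grade == 0 else 1

grades = [0, 2, 0, 4, 1, 3]  # hypothetical per-image grades
binary_labels = [binarize_grade(g) for g in grades]

# Augmentation factors implied by the sample counts quoted above.
aptos_factor = 18310 / 3662
messidor_factor = 3600 / 400

print(binary_labels, aptos_factor, messidor_factor)
```

On the quoted counts, each APTOS image corresponds to five samples and each Messidor image to nine.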
Performance comparison of the OFIDA and several SOTA data augmentation...
plos.figshare.com
xls
Updated May 2, 2024
Cite
Meng Zhang; Yina Guo; Haidong Wang; Hong Shangguan (2024). Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification. [Dataset]. http://doi.org/10.1371/journal.pone.0302124.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302124.t002
Dataset updated
May 2, 2024
Dataset provided by
PLOS ONE
Authors
Meng Zhang; Yina Guo; Haidong Wang; Hong Shangguan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification.
Augmentation performance using the ResNet architecture on the CIFAR-10...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim (2023). Augmentation performance using the ResNet architecture on the CIFAR-10 dataset for classification against image and feature maps. [Dataset]. http://doi.org/10.1371/journal.pone.0274767.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0274767.t006
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Augmentation performance using the ResNet architecture on the CIFAR-10 dataset for classification against image and feature maps.
Dataset for pest classification in Mango farms from Indonesia
data.mendeley.com
Updated Feb 27, 2020
Cite
Kusrini Kusrini (2020). Dataset for pest classification in Mango farms from Indonesia [Dataset]. http://doi.org/10.17632/94jf97jzc8.1
Explore at:
Unique identifier
https://doi.org/10.17632/94jf97jzc8.1
Dataset updated
Feb 27, 2020
Authors
Kusrini Kusrini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Indonesia
Description
The infestation of pests affecting Mango cultivation in Indonesia has an economic impact in the region. Following recent developments in machine learning, applying deep-learning models to multi-class pest classification requires a large collection of image samples on which the algorithms can be trained. To address this requirement, the paper presents a detailed outline of the dataset collected from Mango farms in Indonesia. The data consists of images captured from Mango farms affected by 15 categories of pests, which are identifiable through the structural and visual deformities exhibited in the Mango leaves. Data collection used low-cost sensing equipment of the kind commonly used by farmers to collect images on the farm. The collected data is subjected to two processes: data augmentation and training of the classification model. The collection consists of 510 images covering 15 categories of pests that affect Mango leaves, along with the original appearance of the leaves (16 classes in total), collected over a period of 6 months. To train the deep-learning neural network, the images are subjected to data augmentation to expand the dataset and to closely emulate the large-scale data collection carried out by farmers. Data augmentation yields a total of 62,047 image samples, which are used to train the network. The multi-class classification training framework presented in the paper builds on the VGG-16 feature extractor and replaces the last 3-layer network with fully connected neural network layers, resulting in 16 output classes. The dataset includes annotations for both the original images captured in the field and the augmented image samples. Both the original and augmented data have been divided into training, validation, and testing sets.
The overall dataset is divided into three parts: version 0, version 1, and version 2. Version 0 consists of the original dataset, with 310 images for training, 103 images for validation, and 97 images for testing. Version 1 includes 46,500 image samples for training, obtained by applying the data augmentation process, along with the 103 original images for validation and 97 images for testing. Finally, version 2 uses 47,500 images for training, 15,450 images for validation, and 97 images for testing. All three versions of the dataset provide images in JPEG format. The visual appearance of the pests captured in the dataset provides an ideal testbed for benchmarking the performance of various deep-learning networks trained to detect specific categories of pests. In addition, the dataset provides an opportunity to evaluate the impact of data augmentation techniques on the original dataset.
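
The split sizes quoted in this description can be checked against one another. The sketch below uses only the figures stated above and makes one assumption, flagged in the comments: that a single per-image augmentation factor was applied to both the training and validation images. The variable names are illustrative.

```python
# Version 0: original images, as stated in the description.
v0 = {"train": 310, "val": 103, "test": 97}
original_total = sum(v0.values())

# Version 1 training size implies a per-image augmentation factor.
v1_train = 46_500
factor = v1_train / v0["train"]

# Assumption: the same factor applied to the validation images.
v2_val_expected = v0["val"] * int(factor)

# Totals for version 2 under the two candidate training counts.
total_if_46500 = 46_500 + 15_450 + v0["test"]
total_if_47500 = 47_500 + 15_450 + v0["test"]

print(original_total, factor, v2_val_expected)
print(total_if_46500, total_if_47500)
```

On these numbers the factor is 150, which reproduces the 15,450 validation images. The stated overall total of 62,047 is reached with 46,500 training images, whereas 47,500 would give 63,047, so the version 2 training count may be worth confirming against the dataset itself.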
Brain Tumor Data Augmentation Dataset
universe.roboflow.com
zip
Updated Apr 20, 2025
Cite
LotusWorks (2025). Brain Tumor Data Augmentation Dataset [Dataset]. https://universe.roboflow.com/lotusworks/brain-tumor-data-augmentation
Explore at:
Available download formats: zip
Dataset updated
Apr 20, 2025
Dataset authored and provided by
LotusWorks
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Objects
Description
Brain Tumor Data Augmentation

## Overview
Brain Tumor Data Augmentation is a dataset for classification tasks - it contains Objects annotations for 2,769 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Monkeypox Skin Lesion Dataset
kaggle.com
Updated Jul 5, 2022
Cite
TensorKitty (2022). Monkeypox Skin Lesion Dataset [Dataset]. https://www.kaggle.com/datasets/nafin59/monkeypox-skin-lesion-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 5, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
TensorKitty
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
An updated version of the MSLD dataset, MSLD v2.0 has been released after being verified by an expert dermatologist!

For details, check our GitHub repo!

Context

The recent monkeypox outbreak has become a global healthcare concern owing to its rapid spread to more than 65 countries. Early diagnosis is essential to slow that spread, but confirmatory Polymerase Chain Reaction (PCR) tests and other biochemical assays are not readily available in sufficient quantities. In this scenario, computer-aided monkeypox identification from skin lesion images can be a beneficial measure. So far, however, no such dataset has been available. Hence, the "Monkeypox Skin Lesion Dataset (MSLD)" was created by collecting and processing images obtained through web scraping of news portals, websites and publicly accessible case reports.

The "Monkeypox Skin Lesion Dataset" is primarily focused on distinguishing monkeypox cases from similar non-monkeypox cases. Therefore, alongside the 'Monkeypox' class, we included skin lesion images of 'Chickenpox' and 'Measles' in a second class named 'Others', because they resemble the monkeypox rash and pustules in their initial state. The two classes together support binary classification.

Content

There are 3 folders in the dataset.

1) Original Images: This folder contains a total of 228 images, of which 102 belong to the 'Monkeypox' class and the remaining 126 represent the 'Others' class, i.e., non-monkeypox (chickenpox and measles) cases.

2) Augmented Images: To aid the classification task, several data augmentation methods such as rotation, translation, reflection, shear, hue, saturation, contrast and brightness jitter, noise, scaling etc. have been applied using MATLAB R2020a. Although this can be readily done using ImageGenerator or other image augmentors, the augmented images are provided in this folder to ensure reproducibility of the results. After augmentation, the number of images increased approximately 14-fold. The classes 'Monkeypox' and 'Others' have 1428 and 1764 images, respectively.

3) Fold1: One of the three-fold cross validation datasets. To avoid any sort of bias in training, three-fold cross validation was performed. The original images were split into training, validation and test sets in the approximate proportion 70 : 10 : 20 while maintaining patient independence. Following common data preparation practice, only the training and validation images were augmented, while the test set contains only original images. Users have the option of using the folds directly or using the original data and employing other algorithms to augment it.

Additionally, a CSV file is provided that has 228 rows and two columns. The table contains the list of all the ImageID(s) with their corresponding label.
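The patient-independent 70 : 10 : 20 split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, the `(image_id, patient_id)` record layout and the demo identifiers are hypothetical, and the provided CSV is described as holding only ImageIDs and labels, so patient identifiers would have to come from elsewhere.

```python
import random
from collections import defaultdict


def patient_independent_split(records, percents=(70, 10, 20), seed=0):
    """Split (image_id, patient_id) pairs so no patient spans two subsets.

    Patients, not images, are shuffled and assigned, so the image-level
    proportions only approximate `percents`.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [img for p in members for img in by_patient[p]]
        for name, members in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical records: 10 patients with 2 images each.
    demo = [(f"img_{p}_{k}", f"patient_{p}") for p in range(10) for k in range(2)]
    split = patient_independent_split(demo)
    print({name: len(images) for name, images in split.items()})
```

Augmentation would then be applied only to the `train` and `val` lists, leaving `test` with original images, as the description states.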

Web Application

Since monkeypox is demonstrating a very rapid community transmission pattern, consumer-level software is needed to increase awareness and encourage people to take rapid action. We have developed an easy-to-use web application named Monkey Pox Detector using the open-source Python Streamlit framework; it uses our trained model to address this issue. It predicts whether or not the user should see a specialist and reports the prediction accuracy. Future updates will benefit from the user data we continue to collect and use to improve our model. The web app has a Flask core, so that it can be deployed cross-platform in the future.

Learn more at our GitHub repo!

Citation

If this dataset helped your research, please cite the following articles:

Ali, S. N., Ahmed, M. T., Paul, J., Jahan, T., Sani, S. M. Sakeef, Noor, N., & Hasan, T. (2022). Monkeypox Skin Lesion Detection Using Deep Learning Models: A Preliminary Feasibility Study. arXiv preprint arXiv:2207.03342.

@article{Nafisa2022, title={Monkeypox Skin Lesion Detection Using Deep Learning Models: A Preliminary Feasibility Study}, author={Ali, Shams Nafisa and Ahmed, Md. Tazuddin and Paul, Joydip and Jahan, Tasnim and Sani, S. M. Sakeef and Noor, Nawshaba and Hasan, Taufiq}, journal={arXiv preprint arXiv:2207.03342}, year={2022} }

Ali, S. N., Ahmed, M. T., Jahan, T., Paul, J., Sani, S. M. Sakeef, Noor, N., Asma, A. N., & Hasan, T. (2023). A Web-based Mpox Skin Lesion Detection System Using State-of-the-art Deep Learning Models Considering Racial Diversity. arXiv preprint arXiv:2306.14169.

@article{Nafisa2023, title={A Web-base...
Dataset Augmentation 1.1 Dataset
universe.roboflow.com
zip
Updated Mar 2, 2025
Cite
datasetbuilding 4 (2025). Dataset Augmentation 1.1 Dataset [Dataset]. https://universe.roboflow.com/datasetbuilding-4/dataset-augmentation-1.1
Explore at:
Available download formats: zip
Dataset updated
Mar 2, 2025
Dataset authored and provided by
datasetbuilding 4
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Objects
Description
Dataset Augmentation 1.1

## Overview
Dataset Augmentation 1.1 is a dataset for classification tasks - it contains Objects annotations for 6,114 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Comprehensive High-Resolution Eggplant Leaf Image Dataset for Plant Disease...
zenodo.org
Updated May 28, 2025
Cite
Rakib Hasan; Sanimun Hossain Sanzit; Md. Mamun Hosen; Muksitul Islam; Ferdous Hasan; Mehedi Hasan (2025). Comprehensive High-Resolution Eggplant Leaf Image Dataset for Plant Disease Detection [Dataset]. http://doi.org/10.5281/zenodo.15527092
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15527092
Dataset updated
May 28, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rakib Hasan; Sanimun Hossain Sanzit; Md. Mamun Hosen; Muksitul Islam; Ferdous Hasan; Mehedi Hasan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 12, 2025
Description
This is a comprehensive version of the Eggplant Leaf Image Dataset, designed to support machine learning and deep learning research in agriculture, plant pathology, and computer vision. This dataset addresses class imbalance and model generalization challenges by including a significantly expanded collection of images through controlled data augmentation.

The dataset includes a total of 2,180 high-resolution images (6000×4000 pixels), categorized into six disease or health classes of Solanum melongena (eggplant) leaves:

Class	Original Images	Augmented Images	Total Images
Healthy	80	320	400
Insect-Pest	40	320	360
Leaf-Spot	50	300	350
Mosaic-Virus	15	345	360
Small-Leaf	20	340	360
Wilt	50	300	350
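As a quick consistency check on the table above, the per-class totals and augmentation ratios can be recomputed. This is a small sketch: `COUNTS` and `summarize` are illustrative names, the counts are copied from the table, and the ratios in the dataset's own metadata.csv may be defined differently.

```python
# Per-class image counts copied from the table: (original, augmented).
COUNTS = {
    "Healthy": (80, 320),
    "Insect-Pest": (40, 320),
    "Leaf-Spot": (50, 300),
    "Mosaic-Virus": (15, 345),
    "Small-Leaf": (20, 340),
    "Wilt": (50, 300),
}


def summarize(counts):
    """Return per-class totals and augmented-to-original ratios, plus the grand total."""
    rows = {}
    for name, (original, augmented) in counts.items():
        rows[name] = {
            "total": original + augmented,
            "ratio": augmented / original,
        }
    grand_total = sum(row["total"] for row in rows.values())
    return rows, grand_total


if __name__ == "__main__":
    rows, grand_total = summarize(COUNTS)
    for name, row in rows.items():
        print(f"{name}: total={row['total']}, ratio={row['ratio']:.1f}")
    print("grand total:", grand_total)
```

The grand total matches the 2,180 images stated in the description, and the ratios show how much more heavily the smallest classes (Mosaic-Virus, Small-Leaf) were augmented to balance the class sizes.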

All original images were captured using a Canon EOS 1300D DSLR camera under consistent natural lighting conditions. Files are saved in JPG format, and image resolution is preserved within ±5% of the original dimensions to maintain visual fidelity.

To improve dataset usability for robust model training and generalization, controlled data augmentation was applied using the Albumentations library. The transformations include random rotation, horizontal flipping, brightness/contrast adjustments, slight color shifts, and padding to maintain aspect ratio. All augmentation procedures were consistently applied and seeded for reproducibility. Augmentation parameters are documented in detail in the metadata.

The metadata.csv file provides a class-wise summary including original image count, augmented image count, augmentation ratios, and the exact augmentation pipeline used.

Note: The original and augmented images are stored in separate folders under the "Original" and "Augmented" directories, respectively. Each directory is organized into six class-specific subfolders: Healthy, Insect-Pest, Leaf-Spot, Mosaic-Virus, Small-Leaf, and Wilt. Augmented images are clearly distinguishable by the inclusion of the substring "_aug_" in their filenames. This clear separation ensures reproducibility, transparency in data provenance, and ease of use for researchers who may wish to train models using only original, only augmented, or both types of data.
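The filename convention described above makes it straightforward to separate the two kinds of image in code. The following is a minimal sketch: the function name and the example paths are hypothetical, and only the "_aug_" substring and the Original/Augmented folder names come from the description.

```python
from pathlib import PurePosixPath

AUG_MARKER = "_aug_"  # substring the description says marks augmented files


def partition_by_marker(paths, marker=AUG_MARKER):
    """Group image paths into originals and augmented copies by filename."""
    groups = {"original": [], "augmented": []}
    for path in paths:
        # Check only the file name, not the directory part of the path.
        name = PurePosixPath(path).name
        key = "augmented" if marker in name else "original"
        groups[key].append(path)
    return groups


if __name__ == "__main__":
    # Hypothetical file names; the naming scheme beyond "_aug_" is not specified.
    sample = [
        "Original/Wilt/wilt_001.jpg",
        "Augmented/Wilt/wilt_001_aug_0.jpg",
        "Augmented/Healthy/healthy_007_aug_3.jpg",
    ]
    print(partition_by_marker(sample))
```

Filtering on the file name rather than the folder keeps the check valid even if the two directories are merged into one training folder.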

Files:

EggplantLeaf-ImageDataset.zip — Contains all files and folders, including Original, Augmented, metadata and readme.

Original — Contains only raw field-captured images grouped by class.

Augmented — Contains synthetically expanded datasets, also organized by class. Augmented filenames include the marker "_aug_" for easy identification.

metadata.csv — Class-level summary and augmentation details.

Readme.md — Technical documentation and usage notes.

