Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The supplementary data of the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation".
Open access Author’s accepted manuscript version: https://arxiv.org/abs/2102.02706v2
Published paper: https://ieeexplore.ieee.org/document/9662590
The train/validation/test sets used in the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation", after undergoing the preprocessing described in the paper, are made available here. In addition, the augmentations produced by the proposed ProxyFAUG method are provided (x_aug_train.csv, y_aug_train.csv). More specifically:
x_train_pre.csv : The features side (x) information of the preprocessed training set.
x_val_pre.csv : The features side (x) information of the preprocessed validation set.
x_test_pre.csv : The features side (x) information of the preprocessed test set.
x_aug_train.csv : The features side (x) information of the fingerprints generated by ProxyFAUG.
y_train.csv : The location ground truth information (y) of the training set.
y_val.csv : The location ground truth information (y) of the validation set.
y_test.csv : The location ground truth information (y) of the test set.
y_aug_train.csv : The location ground truth information (y) of the fingerprints generated by ProxyFAUG.
Note that in the paper, the original training set (x_train_pre.csv) is used as a baseline, and is compared against the scenario where the concatenation of the original and the generated training sets (concatenation of x_train_pre.csv and x_aug_train.csv) is used.
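The two scenarios above can be assembled with a short pandas sketch. The file names come from the listing above; the directory layout and the helper itself are illustrative, not part of the published code:

```python
import pandas as pd

def load_training_data(data_dir=".", use_augmentation=True):
    """Load the preprocessed training split; optionally concatenate
    the ProxyFAUG-generated fingerprints for the augmented scenario."""
    x = pd.read_csv(f"{data_dir}/x_train_pre.csv")
    y = pd.read_csv(f"{data_dir}/y_train.csv")
    if use_augmentation:
        x_aug = pd.read_csv(f"{data_dir}/x_aug_train.csv")
        y_aug = pd.read_csv(f"{data_dir}/y_aug_train.csv")
        x = pd.concat([x, x_aug], ignore_index=True)
        y = pd.concat([y, y_aug], ignore_index=True)
    return x, y
```

With `use_augmentation=False` this reproduces the baseline scenario; with `True`, the concatenated original-plus-generated training set.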
The full code implementation related to the paper is available here:
Code: https://zenodo.org/record/4457353
The original full dataset used in this study is the public dataset sigfox_dataset_antwerp.csv, which can be accessed here:
https://zenodo.org/record/3904158#.X4_h7y8RpQI
The above link is related to the publication "Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas", in which the original full dataset was published. The publication is available here:
http://www.mdpi.com/2306-5729/3/2/13
The credit for the creation of the original full dataset goes to Aernouts, Michiel; Berkvens, Rafael; Van Vlaenderen, Koen; and Weyn, Maarten.
The train/validation/test split of the original dataset that is used in this paper is taken from our previous work "A Reproducible Analysis of RSSI Fingerprinting for Outdoors Localization Using Sigfox: Preprocessing and Hyperparameter Tuning". Using the same train/validation/test split across works strengthens the consistency of result comparisons. All relevant material of that work is listed below:
Preprint: https://arxiv.org/abs/1908.06851
Paper: https://ieeexplore.ieee.org/document/8911792
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the open repository for the USI-HEAR dataset.
USI-HEAR is a dataset containing inertial data collected from Nokia Bell Labs' eSense earbuds. The eSense's left unit contains a 6-axis IMU (i.e., a 3-axis accelerometer and a 3-axis gyroscope). The dataset comprises data collected from 30 participants performing 7 scripted activities (headshaking, nodding, speaking, eating, staying still, walking, and walking while speaking). Each activity was recorded over ~180 seconds. The sampling rate is variable (with a universal lower bound of ~60 Hz) due to Android API limitations.
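Because the sampling rate is variable, downstream work typically resamples the signals onto a fixed-rate grid before windowing. A minimal sketch using linear interpolation; the function name and timestamp units are illustrative, not part of the dataset's tooling:

```python
import numpy as np

def resample_to_fixed_rate(timestamps, values, rate_hz=60.0):
    """Linearly interpolate one variably-sampled IMU axis onto a
    fixed-rate time grid. timestamps: seconds, strictly increasing."""
    t = np.asarray(timestamps, dtype=float)
    v = np.asarray(values, dtype=float)
    grid = np.arange(t[0], t[-1], 1.0 / rate_hz)  # uniform sample times
    return grid, np.interp(grid, t, v)
```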
Current contents:
Until the main publication corresponding to the dataset is available, we kindly ask you to contact the first author to request access to the data.
In future versions, the repository will also include:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Augmentation performance using the ResNet architecture on the CIFAR-100 dataset when classifying the image and feature maps.
Sightability models and data: Data and JAGS models associated with the following paper published in Methods in Ecology and Evolution: Fieberg, J., Alexander, M., Tse, S., and K. St. Clair. 2013. Abundance estimation with sightability data: a Bayesian data augmentation approach. Methods in Ecology and Evolution. Archive: Fieberg et al sightability data and models.zip
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contingency table of HIT results for Study 1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.
The dataset comprises 20 distinct classes: Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, it contains 4,730 original JPG images and 23,650 augmented images.
These images were captured using an iPhone 11 camera with a 5x zoom feature. Each image capturing these rice varieties was diligently taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is systematically categorized into 20 distinct sub-directories, each corresponding to a specific rice variety.
The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. Owing to the low resolution, the uncompressed set totals 268 MB; zip compression reduces it to a final size of 254 MB.
To meet the large image volumes that deep learning models for machine vision require, data augmentation techniques were applied, yielding a total of 23,650 images. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB; after compression, the set was reduced to 699 MB.
The raw and augmented datasets are stored in two zip files, 'Original.zip' and 'Augmented.zip'. Both contain 20 sub-folders, each representing a unique rice variety: 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.
To ease experimentation, we have balanced the data and split it in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ archive contains two sub-directories: ‘1_TEST’ with 1,125 images per class and ‘2_VALID’ with 225 images per class.
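Once the archives are extracted, the per-class image counts can be verified with a short directory walk. This is a hypothetical helper, assuming the root/<class_name>/*.jpg layout described above:

```python
from pathlib import Path

def class_counts(root):
    """Count JPG images in each class sub-directory of an extracted archive.
    Assumed layout: root/<class_name>/*.jpg, one folder per rice variety."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts
```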
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaria continues to be a severe health problem across the globe, especially in resource-limited areas that lack both skilled diagnostic personnel and diagnostic equipment. This study investigates deep learning diagnosis of malaria using ConvNeXt models that combine transfer learning with data augmentation for better model performance and transferability. An initial set of 27,558 thin blood smear images was augmented to a final dataset of 606,276 images. The ConvNeXt V1 Tiny model achieved an accuracy of 95.9%; the upgraded V2 Tiny Remod version exceeded this benchmark, reaching 98.1% accuracy. Swin Tiny measured 61.4%, ResNet18 reached 62.6%, and ResNet50 obtained 81.4%. The combination of label smoothing with the AdamW optimiser produced a model with strong robustness and generalisability. The enhanced ConvNeXt V2 Tiny model, combined with data augmentation, transfer learning, and explainability frameworks, demonstrates a practical solution for malaria diagnosis that achieves high accuracy despite the limited access to large datasets and microscopy expertise often observed in resource-limited regions. The findings highlight the potential for real-time diagnostic applications in remote healthcare facilities and the viability of ConvNeXt models in enhancing malaria diagnosis globally.
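Label smoothing, mentioned above, replaces a hard one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes. A minimal sketch of the standard formulation, not the paper's exact implementation:

```python
def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: y_smooth = (1 - eps) * y + eps / K,
    where K is the number of classes and eps the smoothing factor."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]
```

For a 2-class one-hot target [1, 0] with eps=0.1, this yields [0.95, 0.05]; the smoothed vector still sums to 1.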
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A custom dataset of fabric defects was constructed using images captured from garments with various imperfections. The dataset comprises 2,468 high-quality images covering six types of fabric defects: Button-hike (307 images), Broken-button (545 images), Hole (556 images), Color-defect (384 images), Foreign-yarn (334 images), and Sewing-error (317 images). To address potential limitations arising from the finite dataset size, data augmentation techniques were employed; this enhanced the model's ability to generalize to unseen variations in fabric defects. The images were manually annotated with defect type and bounding box information; each annotation contains a class label, bounding box center (x, y), height, and width. Datasets of the defects were collected from various garment manufacturers with their consent. Data collection is an ongoing process; researchers interested in the dataset can contact us for access.
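The annotation fields listed above (class label, box center, and box size) match the common YOLO text layout. A hypothetical parser, assuming the usual '<class> <x_center> <y_center> <width> <height>' ordering with normalized coordinates; adjust the field order if the dataset stores height before width:

```python
def parse_annotation(line):
    """Parse one YOLO-style annotation line into a dict.
    Assumed format: '<class_id> <x_center> <y_center> <width> <height>',
    with box coordinates normalized to [0, 1]."""
    parts = line.split()
    return {
        "class_id": int(parts[0]),
        "x_center": float(parts[1]),
        "y_center": float(parts[2]),
        "width": float(parts[3]),
        "height": float(parts[4]),
    }
```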
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The results of applying a Faster R-CNN based model, including detected images and loss graphs for both training and validation, are included in this dataset. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
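LabelImg's XML output follows the Pascal VOC layout, which can be read with the standard library alone. A minimal sketch; the tag names follow the VOC convention, and the label values are illustrative:

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Read object labels and pixel bounding boxes from a LabelImg
    (Pascal VOC) annotation file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bb.findtext("xmin")),
            "ymin": int(bb.findtext("ymin")),
            "xmax": int(bb.findtext("xmax")),
            "ymax": int(bb.findtext("ymax")),
        })
    return boxes
```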
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Real unbalanced datasets for cross-site scripting (XSS) attacks.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure employs progressively different heavy isotopic peaks at each layer to capture the distinct isotopic characteristics of nonhalogenated compounds and of Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is performed to generate hypothetical/synthetic molecular formulas containing multiple Cl and Br atoms, facilitating data augmentation. To further benefit the environmental chemistry community, which often has limited computational experience and hardware access, the above innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely identifying the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, of which eight were confirmed with chemical standards. A retrospective analysis of 2022 finished-water HRMS data revealed insightful temporal trends in Cl-DBP features. These results demonstrate HalogenFinder’s effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
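The distinct isotopic characteristics exploited here come from the roughly 3:1 natural abundance of 35Cl to 37Cl (and near 1:1 for 79Br/81Br), which makes the relative intensities of the M, M+2, M+4, ... peaks follow a binomial pattern. A sketch of that textbook calculation, illustrative only and not HalogenFinder's code:

```python
from math import comb

# Natural isotope abundances (light, heavy) for Cl and Br.
ABUNDANCE = {"Cl": (0.7576, 0.2424), "Br": (0.5069, 0.4931)}

def isotope_pattern(element, n):
    """Relative intensities of the M, M+2, ..., M+2n peaks for a compound
    containing n atoms of Cl or Br, from the binomial distribution."""
    light, heavy = ABUNDANCE[element]
    return [comb(n, k) * light ** (n - k) * heavy ** k for k in range(n + 1)]
```

For one Cl atom this gives the familiar ~3:1 M/M+2 ratio; for multiple halogens the pattern widens, which is what lets each classifier layer key on progressively heavier isotopic peaks.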