13 datasets found
  1. Data from: Regularization for Unconditional Image Diffusion Models via Shifted Data Augmentation

    • ieee-dataport.org
    Updated Jun 22, 2025
    Cite
    Kensuke NAKAMURA (2025). Regularization for Unconditional Image Diffusion Models via Shifted Data Augmentation [Dataset]. http://ieee-dataport.org/documents/regularization-unconditional-image-diffusion-models-shifted-data-augmentation
    Authors
    Kensuke NAKAMURA
    Description

    it often causes leakage

  2. Respiration and Exhaled Hydration Dataset Based on Data Augmentation

    • ieee-dataport.org
    Updated Jan 7, 2025
    Cite
    sagheer khan (2025). Respiration and Exhaled Hydration Dataset Based on Data Augmentation [Dataset]. https://ieee-dataport.org/open-access/respiration-and-exhaled-hydration-dataset-based-data-augmentation
    Authors
    sagheer khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    monitors respiratory issues

  3. ProxyFAUG: Proximity-based Fingerprint Augmentation (data)

    • data.niaid.nih.gov
    Updated Jun 13, 2022
    Cite
    Kalousis, Alexandros (2022). ProxyFAUG: Proximity-based Fingerprint Augmentation (data) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4457390
    Dataset provided by
    Kalousis, Alexandros
    Anagnostopoulos, Grigorios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The supplementary data of the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation".

    Open access Author’s accepted manuscript version: https://arxiv.org/abs/2102.02706v2

    Published paper: https://ieeexplore.ieee.org/document/9662590

    The train/validation/test sets used in the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation", after the preprocessing described in the paper, are made available here. Moreover, the augmentations produced by the proposed ProxyFAUG method are included as well (x_aug_train.csv, y_aug_train.csv). More specifically:

    x_train_pre.csv : The features side (x) information of the preprocessed training set.

    x_val_pre.csv : The features side (x) information of the preprocessed validation set.

    x_test_pre.csv : The features side (x) information of the preprocessed test set.

    x_aug_train.csv : The features side (x) information of the fingerprints generated by ProxyFAUG.

    y_train.csv : The location ground truth information (y) of the training set.

    y_val.csv : The location ground truth information (y) of the validation set.

    y_test.csv : The location ground truth information (y) of the test set.

    y_aug_train.csv : The location ground truth information (y) of the fingerprints generated by ProxyFAUG.

    Note that in the paper, the original training set (x_train_pre.csv) is used as a baseline, and is compared against the scenario where the concatenation of the original and the generated training sets (concatenation of x_train_pre.csv and x_aug_train.csv) is used.
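
    As a minimal sketch (not the paper's code) of that comparison set-up, one could load the files listed above with pandas and build the concatenated training set as follows; the paths assume the CSVs sit in the working directory.

    ```python
    # Baseline: original training set. Augmented scenario: original + ProxyFAUG fingerprints.
    import pandas as pd

    x_train = pd.read_csv("x_train_pre.csv")
    y_train = pd.read_csv("y_train.csv")
    x_aug = pd.read_csv("x_aug_train.csv")
    y_aug = pd.read_csv("y_aug_train.csv")

    # Concatenate original and generated fingerprints row-wise for the augmented scenario.
    x_train_aug = pd.concat([x_train, x_aug], ignore_index=True)
    y_train_aug = pd.concat([y_train, y_aug], ignore_index=True)

    print(len(x_train), len(x_train_aug))
    ```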

    The full code implementation related to the paper is available here:

    Code: https://zenodo.org/record/4457353

    The original full dataset used in this study is the public dataset sigfox_dataset_antwerp.csv, which can be accessed here:

    https://zenodo.org/record/3904158#.X4_h7y8RpQI

    The above link is related to the publication "Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas", in which the original full dataset was published. The publication is available here:

    http://www.mdpi.com/2306-5729/3/2/13

    The credit for the creation of the original full dataset goes to Aernouts, Michiel; Berkvens, Rafael; Van Vlaenderen, Koen; and Weyn, Maarten.

    The train/validation/test split of the original dataset used in this paper is taken from our previous work "A Reproducible Analysis of RSSI Fingerprinting for Outdoors Localization Using Sigfox: Preprocessing and Hyperparameter Tuning". Using the same train/validation/test split across works makes the comparison of results more consistent. All relevant material of that work is listed below:

    Preprint: https://arxiv.org/abs/1908.06851

    Paper: https://ieeexplore.ieee.org/document/8911792

    Code: https://zenodo.org/record/3228752

    Data: https://zenodo.org/record/3228744

  4. USI-HEAR Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 22, 2024
    Cite
    Matías Laporte; Davide Casnici; Martin Gjoreski; Shkurta Gashi; Silvia Santini; Marc Langheinrich (2024). USI-HEAR Dataset [Dataset]. http://doi.org/10.5281/zenodo.10843791
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matías Laporte; Davide Casnici; Martin Gjoreski; Shkurta Gashi; Silvia Santini; Marc Langheinrich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 2022
    Description

    This is the open repository for the USI-HEAR dataset.

    USI-HEAR is a dataset of inertial data collected with Nokia Bell Labs' eSense earbuds. The eSense's left unit contains a 6-axis IMU (a 3-axis accelerometer and a 3-axis gyroscope). The dataset comprises data from 30 participants performing 7 scripted activities (headshaking, nodding, speaking, eating, staying still, walking, and walking while speaking). Each activity was recorded for roughly 180 seconds. The sampling rate is variable (with a universal lower bound of ~60 Hz) due to Android API limitations.
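
    Because the sampling rate varies, downstream analyses typically resample the stream to a uniform grid first. Below is a hedged sketch using pandas; the file name and column layout (a millisecond 'timestamp' column plus accelerometer/gyroscope columns) are assumptions, not the dataset's documented schema.

    ```python
    import pandas as pd

    # Hypothetical raw file; the real archive layout may differ.
    df = pd.read_csv("participant_01_walking.csv")
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    df = df.set_index("timestamp").sort_index()

    # Resample the variable-rate IMU stream to a fixed 60 Hz grid (the dataset's
    # approximate lower bound) and interpolate the gaps.
    uniform = df.resample(pd.Timedelta(seconds=1 / 60)).mean().interpolate(method="time")
    print(uniform.head())
    ```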

    Current contents:

    • raw_data.zip: raw sensor data, with participants' demographic information
    • dataset_preprocessed.zip: pre-processed data used for the corresponding publications (for reproducibility purposes)

    Until the main publication corresponding to the dataset is available, we kindly ask you to contact the first author to request access to the data.

    In future versions, the repository will also include:

    • processed data
      • downsampled versions
      • extracted features
    • code for the data analysis related to the (yet-unpublished) dataset's publication, containing a HAR pipeline analysis with both ML and DL techniques
  5. Augmentation performance using the ResNet architecture on the CIFAR-100 dataset when classifying the image and feature maps

    • figshare.com
    xls
    Updated Jun 13, 2023
    + more versions
    Cite
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim (2023). Augmentation performance using the ResNet architecture on the CIFAR-100 dataset when classifying the image and feature maps. [Dataset]. http://doi.org/10.1371/journal.pone.0274767.t007
    Dataset provided by
    PLOS ONE
    Authors
    Junhyeok An; Soojin Jang; Junehyoung Kwon; Kyohoon Jin; YoungBin Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Augmentation performance using the ResNet architecture on the CIFAR-100 dataset when classifying the image and feature maps.

  6. Data from: Abundance estimation with sightability data: a Bayesian data augmentation approach

    • datadryad.org
    • data.niaid.nih.gov
    • +2 more
    zip
    Updated Jul 1, 2013
    Cite
    John Fieberg; Michael Alexander; Scarlett Tse; Katie St. Clair (2013). Abundance estimation with sightability data: a Bayesian data augmentation approach [Dataset]. http://doi.org/10.5061/dryad.f8669
    Dataset provided by
    Dryad
    Authors
    John Fieberg; Michael Alexander; Scarlett Tse; Katie St. Clair
    Time period covered
    2013
    Area covered
    Minnesota
    Description

    Sightability models and data: data and JAGS models associated with the following paper published in Methods in Ecology and Evolution: Fieberg, J., Alexander, M., Tse, S., and K. St. Clair. 2013. Abundance estimation with sightability data: a Bayesian data augmentation approach. Methods in Ecology and Evolution. File: Fieberg et al sightability data and models.zip

  7. Contingency table of HIT results for Study 1.

    • figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Nathaniel D. Porter; Ashton M. Verdery; S. Michael Gaddis (2023). Contingency table of HIT results for Study 1. [Dataset]. http://doi.org/10.1371/journal.pone.0233154.t003
    Dataset provided by
    PLOS ONE
    Authors
    Nathaniel D. Porter; Ashton M. Verdery; S. Michael Gaddis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contingency table of HIT results for Study 1.

  8. Aruzz22.5K: An Image Dataset of Rice Varieties

    • data.mendeley.com
    Updated Mar 12, 2024
    + more versions
    Cite
    Md Masudul Islam (2024). Aruzz22.5K: An Image Dataset of Rice Varieties [Dataset]. http://doi.org/10.17632/3mn9843tz2.4
    Authors
    Md Masudul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.

    Dataset Composition

    The dataset encompasses 20 distinct classes: Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, the dataset comprises 4,730 original JPG images and 23,650 augmented images.

    Image Capture and Dataset Organization

    These images were captured using an iPhone 11 camera with a 5x zoom feature. Each image capturing these rice varieties was diligently taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is systematically categorized into 20 distinct sub-directories, each corresponding to a specific rice variety.

    Original Image Dataset

    The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. At this resolution the set totals 268 MB; compressing it with a zip program reduces it to a final size of 254 MB.

    Augmented Image Dataset

    To meet the large image-volume requirements of deep learning models for machine vision, data augmentation techniques were applied, yielding 23,650 augmented images. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB; after compression, the set was reduced to 699 MB.

    Dataset Storage and Access

    The raw and augmented datasets are stored in two separate zip files, 'Original.zip' and 'Augmented.zip'. Each zip file contains 20 sub-folders, one per rice variety: 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.

    Train and Test Data Organization

    To ease experimentation, the data were balanced and split in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ folder contains two sub-directories: ‘1_TEST’, which contains 1125 images per class, and ‘2_VALID’, which contains 225 images per class.
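
    As a rough illustration of how this folder layout could be consumed, here is a minimal sketch using torchvision's ImageFolder; the extraction path, image size, and batch size are assumptions rather than part of the dataset.

    ```python
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader

    transform = transforms.Compose([
        transforms.Resize((224, 224)),   # shrink the 512x512 / 853x853 JPGs for a typical CNN
        transforms.ToTensor(),
    ])

    # Each sub-directory (1_Subol_Lota, 2_Bashmoti, ...) becomes one class label.
    train_set = datasets.ImageFolder("Train_n_Test/1_TEST", transform=transform)
    valid_set = datasets.ImageFolder("Train_n_Test/2_VALID", transform=transform)

    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    valid_loader = DataLoader(valid_set, batch_size=32, shuffle=False)

    print(train_set.classes)   # the 20 rice-variety folder names
    ```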

  9. Cross-validation results across five folds.

    • plos.figshare.com
    xls
    Updated Jun 4, 2025
    + more versions
    Cite
    Outlwile Pako Mmileng; Albert Whata; Micheal Olusanya; Siyabonga Mhlongo (2025). Cross-validation results across five folds. [Dataset]. http://doi.org/10.1371/journal.pone.0313734.t007
    Dataset provided by
    PLOS ONE
    Authors
    Outlwile Pako Mmileng; Albert Whata; Micheal Olusanya; Siyabonga Mhlongo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Malaria continues to be a severe health problem across the globe, especially in resource-limited areas that lack both skilled diagnostic personnel and diagnostic equipment. This study investigates deep learning diagnosis of malaria using ConvNeXt models that combine transfer learning with data augmentation for better model performance and transferability. A total of 606,276 thin blood smear images formed the final augmented dataset, expanded from the initial 27,558 images. The ConvNeXt V1 Tiny model achieved an accuracy of 95.9%; the upgraded V2 Tiny Remod version exceeded this benchmark, reaching 98.1% accuracy. By comparison, Swin Tiny reached 61.4%, ResNet18 62.6%, and ResNet50 81.4%. The combination of label smoothing with the AdamW optimiser produced a model with strong robustness and generalisability. The enhanced ConvNeXt V2 Tiny model, combined with data augmentation, transfer learning, and explainability frameworks, offers a practical solution for malaria diagnosis that achieves high accuracy despite the limited access to large datasets and microscopy expertise often observed in resource-limited regions. The findings highlight the potential for real-time diagnostic applications in remote healthcare facilities and the viability of ConvNeXt models in enhancing malaria diagnosis globally.
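
    A minimal sketch of the training recipe described above (transfer learning on a ConvNeXt Tiny backbone with label smoothing and AdamW), not the authors' implementation; torchvision ships only the V1 Tiny weights, and the hyperparameters shown are illustrative guesses.

    ```python
    import torch
    import torch.nn as nn
    from torchvision import models

    # Pretrained ConvNeXt-Tiny (V1) backbone with a new 2-class head (parasitized / uninfected).
    model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)
    model.classifier[2] = nn.Linear(model.classifier[2].in_features, 2)

    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # label smoothing, as in the study
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

    def train_step(images, labels):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()
    ```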

  10. Fabric Defect

    • data.mendeley.com
    Updated Mar 28, 2025
    Cite
    Rakibul Islam (2025). Fabric Defect [Dataset]. http://doi.org/10.17632/y62b4pfyz2.1
    Authors
    Rakibul Islam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A custom dataset of fabric defects was constructed from images captured of garments with various imperfections. The dataset comprises 2,468 high-quality images, each checked to ensure quality, covering six types of fabric defects: Button-hike (307 images), Broken-button (545 images), Hole (556 images), Color-defect (384 images), Foreign-yarn (334 images), and Sewing-error (317 images). To address potential limitations arising from the finite dataset size, data augmentation techniques were employed; this augmentation enhanced the model's ability to generalize to unseen variations in fabric defects. The images were manually annotated with defect type and bounding-box information; each annotation contains a class label, bounding-box center (x, y), height, and width. For this research, defect images were collected from various garment manufacturers with their consent. Data collection is ongoing, and researchers interested in the dataset can contact us for access.
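
    The annotation layout described (class label plus bounding-box center, width, and height) matches the common YOLO text format, so one might parse a label file roughly as below; the file name, field order, and normalization are assumptions and should be checked against the actual labels.

    ```python
    def read_labels(path):
        """Parse one YOLO-style label file: class id, then normalized box fields per line."""
        boxes = []
        with open(path) as f:
            for line in f:
                cls, xc, yc, w, h = line.split()   # field order assumed; verify against the dataset
                boxes.append({
                    "class": int(cls),
                    "x_center": float(xc),
                    "y_center": float(yc),
                    "width": float(w),
                    "height": float(h),
                })
        return boxes

    print(read_labels("sample_image.txt"))   # hypothetical label file
    ```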

  11. A dataset for window and blind states detection

    • figshare.com
    bin
    Updated Aug 5, 2024
    Cite
    Seunghyeon Wang (2024). A dataset for window and blind states detection [Dataset]. http://doi.org/10.6084/m9.figshare.26403004.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Seunghyeon Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The results of applying a Faster R-CNN based model, including detected images and loss graphs for both training and validation, are also included in the dataset. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
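
    Since the annotations are standard LabelImg XML (Pascal VOC style), a hedged sketch of reading one file might look like this; the file name is hypothetical and the tag names follow LabelImg's usual output.

    ```python
    import xml.etree.ElementTree as ET

    def read_voc_annotation(path):
        root = ET.parse(path).getroot()
        objects = []
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            objects.append({
                "label": obj.findtext("name"),          # e.g. a window or blind state
                "xmin": int(box.findtext("xmin")),
                "ymin": int(box.findtext("ymin")),
                "xmax": int(box.findtext("xmax")),
                "ymax": int(box.findtext("ymax")),
            })
        return objects

    print(read_voc_annotation("example.xml"))   # hypothetical annotation file
    ```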

  12. Cross Site Scripting (XSS) Attack Dataset (2020): Generating Data for Dataset Balancing

    • figshare.com
    txt
    Updated Jun 19, 2023
    Cite
    Fawaz Mokbal (2023). Cross Site Scripting (XSS) Attack Dataset (2020): Generating Data for Dataset Balancing [Dataset]. http://doi.org/10.6084/m9.figshare.13046138.v6
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Fawaz Mokbal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Real unbalanced datasets for cross-site scripting (XSS) attacks.

  13. Data from: Exposome-Scale Investigation of Cl-/Br-Containing Chemicals Using High-Resolution Mass Spectrometry, Multistage Machine Learning, and Cloud Computing

    • acs.figshare.com
    xlsx
    Updated May 22, 2025
    Cite
    Tingting Zhao; Brian Low; Qiming Shen; Yukai Wang; David Hidalgo Delgado; K. N. Minh Chau; Zhiqiang Pang; Xiaoxiao Li; Jianguo Xia; Xing-Fang Li; Tao Huan (2025). Exposome-Scale Investigation of Cl-/Br-Containing Chemicals Using High-Resolution Mass Spectrometry, Multistage Machine Learning, and Cloud Computing [Dataset]. http://doi.org/10.1021/acs.analchem.5c00503.s001
    Dataset provided by
    ACS Publications
    Authors
    Tingting Zhao; Brian Low; Qiming Shen; Yukai Wang; David Hidalgo Delgado; K. N. Minh Chau; Zhiqiang Pang; Xiaoxiao Li; Jianguo Xia; Xing-Fang Li; Tao Huan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure employs progressively different heavy isotopic peaks at each layer and captures the distinct isotopic characteristics of nonhalogenated compounds and Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is performed to generate hypothetical/synthetic molecular formulas containing multiple Cl and Br atoms, facilitating data augmentation. To further benefit the environmental chemistry community with limited computational experience and hardware access, the above innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization, with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely identifying the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, eight of which were confirmed with chemical standards. A retrospective analysis of 2022 finished water HRMS data revealed insightful temporal trends in Cl-DBP features. These results demonstrate HalogenFinder’s effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
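
    The data-perturbation step is only outlined above; purely as a toy illustration of the idea (generating hypothetical formulas with additional Cl/Br atoms to balance the heavily halogenated classes), one might do something like the following. This is not HalogenFinder's actual procedure.

    ```python
    import itertools
    import random

    def perturb_formula(formula, max_cl=8, max_br=6, n_samples=5):
        """formula: element-count dict, e.g. {"C": 6, "H": 6, "Cl": 0, "Br": 0}."""
        synthetic = []
        for n_cl, n_br in itertools.product(range(max_cl + 1), range(max_br + 1)):
            if n_cl + n_br == 0:
                continue                      # keep only halogenated formulas
            extra = (n_cl + n_br) - (formula.get("Cl", 0) + formula.get("Br", 0))
            # crude H-for-halogen substitution to keep the hypothetical formula plausible
            if extra < 0 or formula.get("H", 0) < extra:
                continue
            new = dict(formula)
            new["H"] = formula.get("H", 0) - extra
            new["Cl"], new["Br"] = n_cl, n_br
            synthetic.append(new)
        return random.sample(synthetic, min(n_samples, len(synthetic)))

    print(perturb_formula({"C": 6, "H": 6, "Cl": 0, "Br": 0}))
    ```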
