Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The supplementary data of the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation".
Open access Author’s accepted manuscript version: https://arxiv.org/abs/2102.02706v2
Published paper: https://ieeexplore.ieee.org/document/9662590
The train/validation/test sets used in the paper "ProxyFAUG: Proximity-based Fingerprint Augmentation", after undergoing the preprocessing described in the paper, are made available here. In addition, the augmentations produced by the proposed ProxyFAUG method are provided (x_aug_train.csv, y_aug_train.csv). More specifically:
x_train_pre.csv : The features side (x) information of the preprocessed training set.
x_val_pre.csv : The features side (x) information of the preprocessed validation set.
x_test_pre.csv : The features side (x) information of the preprocessed test set.
x_aug_train.csv : The features side (x) information of the fingerprints generated by ProxyFAUG.
y_train.csv : The location ground truth information (y) of the training set.
y_val.csv : The location ground truth information (y) of the validation set.
y_test.csv : The location ground truth information (y) of the test set.
y_aug_train.csv : The location ground truth information (y) of the fingerprints generated by ProxyFAUG.
Note that in the paper, the original training set (x_train_pre.csv) is used as a baseline, and is compared against the scenario where the concatenation of the original and the generated training sets (concatenation of x_train_pre.csv and x_aug_train.csv) is used.
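The two scenarios above can be assembled with a short pandas sketch. The file names come from the listing above; the directory layout and the helper itself are illustrative, not part of the published code:

```python
import pandas as pd

def load_training_data(data_dir=".", use_augmentation=True):
    """Load the preprocessed training split; optionally concatenate
    the ProxyFAUG-generated fingerprints for the augmented scenario."""
    x = pd.read_csv(f"{data_dir}/x_train_pre.csv")
    y = pd.read_csv(f"{data_dir}/y_train.csv")
    if use_augmentation:
        x_aug = pd.read_csv(f"{data_dir}/x_aug_train.csv")
        y_aug = pd.read_csv(f"{data_dir}/y_aug_train.csv")
        x = pd.concat([x, x_aug], ignore_index=True)
        y = pd.concat([y, y_aug], ignore_index=True)
    return x, y
```

With `use_augmentation=False` this reproduces the baseline scenario; with `True`, the concatenated original-plus-generated training set.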
The full code implementation related to the paper is available here:
Code: https://zenodo.org/record/4457353
The original full dataset used in this study is the public dataset sigfox_dataset_antwerp.csv, which can be accessed here:
https://zenodo.org/record/3904158#.X4_h7y8RpQI
The above link is related to the publication "Sigfox and LoRaWAN Datasets for Fingerprint Localization in Large Urban and Rural Areas", in which the original full dataset was published. The publication is available here:
http://www.mdpi.com/2306-5729/3/2/13
The credit for the creation of the original full dataset goes to Aernouts, Michiel; Berkvens, Rafael; Van Vlaenderen, Koen; and Weyn, Maarten.
The train/validation/test split of the original dataset that is used in this paper is taken from our previous work "A Reproducible Analysis of RSSI Fingerprinting for Outdoors Localization Using Sigfox: Preprocessing and Hyperparameter Tuning". Using the same train/validation/test split across works strengthens the consistency of result comparisons. All relevant material of that work is listed below:
Preprint: https://arxiv.org/abs/1908.06851
Paper: https://ieeexplore.ieee.org/document/8911792
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the open repository for the USI-HEAR dataset.
USI-HEAR is a dataset containing inertial data collected from Nokia Bell Labs' eSense earbuds. The eSense's left unit contains a 6-axis IMU (i.e., a 3-axis accelerometer and a 3-axis gyroscope). The dataset comprises data collected from 30 participants performing 7 scripted activities (headshaking, nodding, speaking, eating, staying still, walking, and walking while speaking). Each activity was recorded over ~180 seconds. The sampling rate is variable (with a universal lower bound of ~60 Hz) due to Android API limitations.
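Because the sampling rate is variable, downstream work typically resamples the signals onto a fixed-rate grid before windowing. A minimal sketch using linear interpolation; the function name and timestamp units are illustrative, not part of the dataset's tooling:

```python
import numpy as np

def resample_to_fixed_rate(timestamps, values, rate_hz=60.0):
    """Linearly interpolate one variably-sampled IMU axis onto a
    fixed-rate time grid. timestamps: seconds, strictly increasing."""
    t = np.asarray(timestamps, dtype=float)
    v = np.asarray(values, dtype=float)
    grid = np.arange(t[0], t[-1], 1.0 / rate_hz)  # uniform sample times
    return grid, np.interp(grid, t, v)
```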
Current contents:
Until the main publication corresponding to the dataset is available, we kindly ask you to contact the first author to request access to the data.
In future versions, the repository will also include:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Augmentation performance using the ResNet architecture on the CIFAR-100 dataset when classifying the image and feature maps.
Sightability models and data: Data and JAGS models associated with the following paper published in Methods in Ecology and Evolution: Fieberg, J., Alexander, M., Tse, S., and K. St. Clair. 2013. Abundance estimation with sightability data: a Bayesian data augmentation approach. Methods in Ecology and Evolution. Archive: Fieberg et al sightability data and models.zip
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contingency table of HIT results for Study 1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This extensive dataset presents a meticulously curated collection of low-resolution images showcasing 20 well-established rice varieties native to diverse regions of Bangladesh. The rice samples were carefully gathered from both rural areas and local marketplaces, ensuring a comprehensive and varied representation. Serving as a visual compendium, the dataset provides a thorough exploration of the distinct characteristics of these rice varieties, facilitating precise classification.
The dataset comprises 20 distinct classes: Subol Lota, Bashmoti (Deshi), Ganjiya, Shampakatari, Sugandhi Katarivog, BR-28, BR-29, Paijam, Bashful, Lal Aush, BR-Jirashail, Gutisharna, Birui, Najirshail, Pahari Birui, Polao (Katari), Polao (Chinigura), Amon, Shorna-5, and Lal Binni. In total, it contains 4,730 original JPG images and 23,650 augmented images.
These images were captured using an iPhone 11 camera with a 5x zoom feature. Each image capturing these rice varieties was diligently taken between October 18 and November 29, 2023. To facilitate efficient data management and organization, the dataset is structured into two variants: Original images and Augmented images. Each variant is systematically categorized into 20 distinct sub-directories, each corresponding to a specific rice variety.
The primary image set comprises 4,730 JPG images, uniformly sized at 853 × 853 pixels. Owing to the low resolution, the uncompressed set totals 268 MB; zip compression reduces it to a final size of 254 MB.
To meet the large image volumes that deep learning models for machine vision require, data augmentation techniques were applied, yielding a total of 23,650 images. These augmented images, also in JPG format and uniformly sized at 512 × 512 pixels, initially amounted to 781 MB; after compression, the set was reduced to 699 MB.
The raw and augmented datasets are stored in two zip files, 'Original.zip' and 'Augmented.zip'. Both contain 20 sub-folders, each representing a unique rice variety: 1_Subol_Lota, 2_Bashmoti, 3_Ganjiya, 4_Shampakatari, 5_Katarivog, 6_BR28, 7_BR29, 8_Paijam, 9_Bashful, 10_Lal_Aush, 11_Jirashail, 12_Gutisharna, 13_Red_Cargo, 14_Najirshail, 15_Katari_Polao, 16_Lal_Biroi, 17_Chinigura_Polao, 18_Amon, 19_Shorna5, 20_Lal_Binni.
To ease experimentation, we have balanced the data and split it in an 80:20 train-test ratio. The ‘Train_n_Test.zip’ archive contains two sub-directories: ‘1_TEST’ with 1,125 images per class and ‘2_VALID’ with 225 images per class.
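Once the archives are extracted, the per-class image counts can be verified with a short directory walk. This is a hypothetical helper, assuming the root/<class_name>/*.jpg layout described above:

```python
from pathlib import Path

def class_counts(root):
    """Count JPG images in each class sub-directory of an extracted archive.
    Assumed layout: root/<class_name>/*.jpg, one folder per rice variety."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.jpg"))
    return counts
```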
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Malaria continues to be a severe health problem across the globe, especially in resource-limited areas that lack both skilled diagnostic personnel and diagnostic equipment. This study investigates deep learning diagnosis of malaria using ConvNeXt models that combine transfer learning with data augmentation for better model performance and transferability. An initial set of 27,558 thin blood smear images was augmented to a final dataset of 606,276 images. The ConvNeXt V1 Tiny model achieved an accuracy of 95.9%; the upgraded V2 Tiny Remod version exceeded this benchmark, reaching 98.1% accuracy. Swin Tiny measured 61.4%, ResNet18 reached 62.6%, and ResNet50 obtained 81.4%. The combination of label smoothing with the AdamW optimiser produced a model with strong robustness and generalisability. The enhanced ConvNeXt V2 Tiny model, combined with data augmentation, transfer learning, and explainability frameworks, demonstrates a practical solution for malaria diagnosis that achieves high accuracy despite the limited access to large datasets and microscopy expertise often observed in resource-limited regions. The findings highlight the potential for real-time diagnostic applications in remote healthcare facilities and the viability of ConvNeXt models in enhancing malaria diagnosis globally.
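Label smoothing, mentioned above, replaces a hard one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes. A minimal sketch of the standard formulation, not the paper's exact implementation:

```python
def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: y_smooth = (1 - eps) * y + eps / K,
    where K is the number of classes and eps the smoothing factor."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]
```

For a 2-class one-hot target [1, 0] with eps=0.1, this yields [0.95, 0.05]; the smoothed vector still sums to 1.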
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A custom dataset of fabric defects was constructed using images captured from garments with various imperfections. The dataset comprises 2,468 high-quality images covering six types of fabric defects: Button-hike (307 images), Broken-button (545 images), Hole (556 images), Color-defect (384 images), Foreign-yarn (334 images), and Sewing-error (317 images). To address potential limitations arising from the finite dataset size, data augmentation techniques were employed; this enhanced the model's ability to generalize to unseen variations in fabric defects. The images were manually annotated with defect type and bounding box information; each annotation contains a class label, bounding box center (x, y), height, and width. Datasets of the defects were collected from various garment manufacturers with their consent. Data collection is an ongoing process; researchers interested in the dataset can contact us for access.
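The annotation fields listed above (class label, box center, and box size) match the common YOLO text layout. A hypothetical parser, assuming the usual '<class> <x_center> <y_center> <width> <height>' ordering with normalized coordinates; adjust the field order if the dataset stores height before width:

```python
def parse_annotation(line):
    """Parse one YOLO-style annotation line into a dict.
    Assumed format: '<class_id> <x_center> <y_center> <width> <height>',
    with box coordinates normalized to [0, 1]."""
    parts = line.split()
    return {
        "class_id": int(parts[0]),
        "x_center": float(parts[1]),
        "y_center": float(parts[2]),
        "width": float(parts[3]),
        "height": float(parts[4]),
    }
```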
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The results of applying a Faster R-CNN based model, including detected images and loss graphs for both training and validation, are included in this dataset. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
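LabelImg's XML output follows the Pascal VOC layout, which can be read with the standard library alone. A minimal sketch; the tag names follow the VOC convention, and the label values are illustrative:

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Read object labels and pixel bounding boxes from a LabelImg
    (Pascal VOC) annotation file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bb.findtext("xmin")),
            "ymin": int(bb.findtext("ymin")),
            "xmax": int(bb.findtext("xmax")),
            "ymax": int(bb.findtext("ymax")),
        })
    return boxes
```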
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Real unbalanced datasets for cross-site scripting (XSS) attacks.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure employs progressively different heavy isotopic peaks at each layer to capture the distinct isotopic characteristics of nonhalogenated compounds and of Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is performed to generate hypothetical/synthetic molecular formulas containing multiple Cl and Br atoms, facilitating data augmentation. To further benefit the environmental chemistry community, which often has limited computational experience and hardware access, the above innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely identifying the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, of which eight were confirmed with chemical standards. A retrospective analysis of 2022 finished-water HRMS data revealed insightful temporal trends in Cl-DBP features. These results demonstrate HalogenFinder’s effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
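The distinct isotopic characteristics exploited here come from the roughly 3:1 natural abundance of 35Cl to 37Cl (and near 1:1 for 79Br/81Br), which makes the relative intensities of the M, M+2, M+4, ... peaks follow a binomial pattern. A sketch of that textbook calculation, illustrative only and not HalogenFinder's code:

```python
from math import comb

# Natural isotope abundances (light, heavy) for Cl and Br.
ABUNDANCE = {"Cl": (0.7576, 0.2424), "Br": (0.5069, 0.4931)}

def isotope_pattern(element, n):
    """Relative intensities of the M, M+2, ..., M+2n peaks for a compound
    containing n atoms of Cl or Br, from the binomial distribution."""
    light, heavy = ABUNDANCE[element]
    return [comb(n, k) * light ** (n - k) * heavy ** k for k in range(n + 1)]
```

For one Cl atom this gives the familiar ~3:1 M/M+2 ratio; for multiple halogens the pattern widens, which is what lets each classifier layer key on progressively heavier isotopic peaks.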