Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic electronic health record data accompanying the paper "Synthesizing High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model"
Overview
This is the data archive for the paper "Copula-based synthetic data augmentation for machine-learning emulators". It contains the paper's model outputs (see the results folder) and the Singularity image for (optionally) re-running the experiments.
For the Python tool used to generate synthetic data, please refer to Synthia.
Requirements
Although PBS is not a strict requirement, it is required to run the helper scripts included in this repository. Please note that, depending on your specific system settings and resource availability, you may need to modify the PBS parameters at the top of the submit scripts stored in the hpc directory (e.g. #PBS -l walltime=72:00:00).
Usage
To reproduce the results from the experiments described in the paper, first fit all copula models to the reduced NWP-SAF dataset with:
qsub hpc/fit.sh
Then, to generate synthetic data, run all machine-learning model configurations, and compute the relevant statistics, use:
qsub hpc/stats.sh
qsub hpc/ml_control.sh
qsub hpc/ml_synth.sh
Finally, to plot all artifacts included in the paper, use:
qsub hpc/plot.sh
Licence
Code released under MIT license. Data from the reduced NWP-SAF dataset released under CC BY 4.0.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset access for the paper: A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer
https://spdx.org/licenses/etalab-2.0.html
This repository contains the dataset and code used to generate the synthetic dataset described in the paper "Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach".
Dataset: the dataset consists of two components, individual diatom images extracted from publicly available diatom atlases [1,2,3] and individual debris images.
- Individual diatom images: currently, the repository covers 166 diatom species, totalling 9,230 images. These images were automatically extracted from the atlases using PDF scraping, then cleaned and verified by diatom taxonomists. The subfolders within each diatom species indicate the origin of the images: RA [1], IDF [2], BRG [3]. Additional diatom species and images will be added to the repository regularly.
- Individual debris images: the debris images were extracted from real microscopy images. The repository contains 600 debris objects.
Code: contains the code used to generate synthetic microscopy images. For details on how to use the code, refer to the README file available in synthetic_data_generator/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, their generation is not without difficulties on their own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained by exclusively using synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We here provide an image dataset consisting of 1,000 synthetic whole-body bone scintigraphy scans (anterior projection) generated by a generative artificial intelligence model. This dataset consists of images representing three different clinical conditions: (1) bone uptake indicative of bone metastases, (2) cardiac uptake indicative of cardiac amyloidosis, and (3) none of the two.
The clinical condition (label) of each image is provided in the accompanying CSV file.
This synthetic dataset does not comprise real patient data. The provided synthetic images were created by a generative artificial intelligence model. The model was trained on bone scintigraphy scans (radiotracer: 99mTc-DPD) from 9,170 patients from the Vienna General Hospital collected as part of the clinical routine. The training data covered a wide range of different pathologies, scanners, and imaging protocols. Hence, the provided synthetic dataset represents real-world data without disclosing patient privacy.
More details about the dataset can be found in the corresponding paper (link added upon publication). Please cite this paper if you use the dataset.
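A minimal sketch of reading such a label file with Python's csv module; the filename and column names below are hypothetical, since the entry does not spell them out:

```python
import csv
import io

# Hypothetical example content: the actual CSV filename and column names
# are not specified in the dataset description, so "image" and "label"
# columns are assumptions for illustration only.
sample = io.StringIO(
    "image,label\n"
    "scan_0001.png,bone_metastases\n"
    "scan_0002.png,cardiac_amyloidosis\n"
    "scan_0003.png,none\n"
)

# Map each image filename to its clinical-condition label.
labels = {row["image"]: row["label"] for row in csv.DictReader(sample)}
print(labels["scan_0001.png"])  # bone_metastases
```

In practice, replace the in-memory sample with `open(...)` on the CSV file shipped with the dataset.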
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SYNTHETIC dataset to replicate the results in "Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis", accepted to IEEE/RSJ IROS 2022.
To fully reproduce the experiments, also download the REAL dataset.
To download the REAL and SYNTHETIC datasets automatically, run the script provided at the link below.
Code to replicate the results available at: https://github.com/hsp-iit/prosthetic-grasping-experiments
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.7910/DVN/EXVWQY
This dataset contains Synthea synthetic patient data used in building ML models for stroke risk prediction. The ML models are used to simulate ML-enabled learning health systems (LHS). See the first LHS simulation paper published in Nature Scientific Reports. This open dataset is part of the synthetic data repository of the Open LHS project on GitHub: https://github.com/lhs-open/synthetic-data.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/GD5XWE
This dataset contains Synthea synthetic patient data used in building ML models for lung cancer risk prediction. The ML models are used to simulate ML-enabled LHS. This open dataset is part of the synthetic data repository of the Open LHS project on GitHub: https://github.com/lhs-open/synthetic-data. For data source and methods, see the first ML-LHS simulation paper published in Nature Scientific Reports: https://www.nature.com/articles/s41598-022-23011-4.
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.
Relevant computer vision tasks:
The dataset is for academic research use only, since it uses resources with restrictive licenses.
For a detailed description of how the resources are used, we refer to our paper and project page.
Licenses of the resources in detail:
You can use our textureless models (i.e. the obj files) of damaged parcels under CC BY 4.0 (note that this does not apply to the textures).
If you use this resource for scientific research, please consider citing:
@inproceedings{naumannParcel3DShapeReconstruction2023,
author = {Naumann, Alexander and Hertlein, Felix and D\"orr, Laura and Furmans, Kai},
title = {Parcel3D: Shape Reconstruction From Single RGB Images for Applications in Transportation Logistics},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {4402-4412}
}
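The COCO-format annotations mentioned above can be read with standard JSON tooling. A minimal sketch, assuming only the standard COCO schema (images, annotations, categories); the file content here is fabricated for illustration, not taken from the actual dataset:

```python
import json
import io

# Fabricated COCO-style content for illustration; real usage would be
# json.load(open("annotations.json")) with the dataset's own file.
coco_json = io.StringIO(json.dumps({
    "images": [{"id": 1, "file_name": "parcel_0001.png",
                "width": 640, "height": 480}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 2,
                     "bbox": [100, 120, 200, 150]}],  # [x, y, width, height]
    "categories": [{"id": 2, "name": "damaged_parcel"}],
}))

coco = json.load(coco_json)

# Resolve category ids to human-readable names.
cat_names = {c["id"]: c["name"] for c in coco["categories"]}
for ann in coco["annotations"]:
    print(ann["image_id"], cat_names[ann["category_id"]], ann["bbox"])
```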
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: The Bearings with Varying Degradation Behaviors data set is a synthetic data set representing the run-to-failure degradation data of rolling bearings. This data set is designed to facilitate the development and evaluation of diagnostic and prognostic methods in the context of Prognostics and Health Management (PHM). For the generation of the data set, the simulation model presented by Mauthe, Hagmeyer, and Zeiler (2025) was used. The simulation model is publicly available on GitHub.
Simulation Model: Mauthe, Hagmeyer, and Zeiler (2025) introduce a generic simulation model for generating representative run-to-failure data of rolling bearings. It is designed to address challenges in the development of data-driven diagnostic and prognostic methods, such as unbalanced or limited data availability. The model consists of three modular components: the life and fault modeling, the degradation progression simulation, and the vibration signal generation. Each module incorporates random processes to reproduce real-world variations, such as differences in bearing lives and degradation progressions under similar operating conditions. The model simulates vibration signals throughout a bearing's life, reflecting both operating and degradation conditions. As such, the versatile model enables its users to create synthetic data sets of rolling bearings tailored to specific scenarios. A more detailed description of the model can be found in the corresponding paper (see Data Set Citation).
Given Data Scenario and Specification: See the provided description file Bearings_with_Varying_Degradation_Behaviors.pdf
Task: The data set contains training and test data, consisting of run-to-failure data from 28 and 12 simulated bearings, respectively. The objective is to predict the remaining useful life (RUL) of the rolling bearings in the given test data. All runs proceed up to the same failure threshold, meaning that RUL = 0 applies at the last point in time, i.e. the last vibration measurement.
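The RUL labelling convention described above (RUL = 0 at the final measurement) can be sketched as follows; the timestamps are hypothetical and this is not the data set creators' code:

```python
def rul_targets(timestamps):
    """Return the remaining useful life for each timestamp, in the same
    time unit: time-to-failure counts down to 0 at the last measurement."""
    t_fail = timestamps[-1]
    return [t_fail - t for t in timestamps]

# Hypothetical measurement times (e.g. hours) for one run-to-failure sequence.
hours = [0, 10, 20, 30, 40]
print(rul_targets(hours))  # [40, 30, 20, 10, 0]
```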
Data Set Creator: Hochschule Esslingen – University of Applied Sciences, Institute for Technical Reliability and Prognostics (IZP), Robert-Bosch-Straße 1, 73037 Göppingen, Germany
Data Set Citation: Mauthe, F.; Hagmeyer, S.; Zeiler, P. (2025). Holistic simulation model of the temporal degradation of rolling bearings. In E. B. Abrahamsen, T. Aven, F. Bouder, R. Flage, and M. Yloenen (Eds.), Proceedings of the 35th European Safety and Reliability Conference (Accepted). Research Publishing.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore the Synthetic Rock Paper Scissors Dataset featuring a diverse collection of augmented images for training and testing machine learning models.
This is a dataset used to test digital twin-supported deep learning for fault diagnosis. It contains:
- A digital twin model of a robot.
- Synthetic data from the digital twin, used to train a deep learning-based fault diagnosis model.
- A real dataset collected from the physical robot to test sim-to-real performance.
Download the dataset from: https://nextcloud.centralesupelec.fr/s/7AR6aamBZNXcRM8/download
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high-quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks; however, the ability of models to generalise from synthetic data to real-world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exists which, if proven effective, can be exploited. We therefore argue that research in this domain would be hugely useful. In this paper we present SynthCity, an open dataset to help aid research. SynthCity is a 367.9M-point synthetic full-colour Mobile Laser Scanning point cloud. Every point is labelled with one of nine categories. We generate our point cloud in a typical urban/suburban environment using the Blensor plugin for Blender. See our project website http://www.synthcity.xyz or paper https://arxiv.org/abs/1907.04758 for more information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic COVID-19 dataset, including 1,186 chest CT images. These data come from "Deep Learning for COVID-19 chest CT (computed tomography) image analysis". The deep learning model used in the paper is CycleGAN, and a classification experiment is used to test the usability of the synthetic COVID-19 dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the image files from Survey2Survey: a deep learning generative model approach for cross-survey image mapping. Please cite https://arxiv.org/abs/2011.07124 if you use this data in a publication. For more information, contact Brandon Buncher at buncher2(at)illinois.edu
--- Directory structure ---
tutorial.ipynb demonstrates how to load the image files (uploaded here as tarballs). Images were obtained from the SDSS DR16 cutout server (https://skyserver.sdss.org/dr16/en/help/docs/api.aspx) and the DES DR1 cutout server (https://des.ncsa.illinois.edu/desaccess/).
./sdss_train/ and ./des_train/ contain the original SDSS and DES images used to train the neural network (Stripe82).
./sdss_test/ and ./des_test/ contain the original SDSS and DES images used for the validation dataset (Stripe82).
./sdss_ext/ contains images from the external SDSS dataset (SDSS images without a DES counterpart, outside Stripe82).
./cae/ and ./cyclegan/ contain images generated by the CAE and CycleGAN, respectively. Within each, train_decoded/ and test_decoded/ contain reconstructions of the images from the training and test datasets, respectively, and external_decoded/ contains the DES-like reconstructions of SDSS objects from the external dataset (outside Stripe82).
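As a small illustration of working with the tarballs using Python's standard tarfile module (the member names here are hypothetical; tutorial.ipynb in the archive shows the intended workflow):

```python
import io
import tarfile

# Build a tiny in-memory tarball standing in for one of the dataset
# archives; the member name and contents are fabricated for illustration.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"fake image bytes"
    info = tarfile.TarInfo(name="sdss_train/obj_0001.png")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Reading side: list members, as one would with the downloaded tarballs.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # ['sdss_train/obj_0001.png']
```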
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset used in this study is publicly available for research purposes. If you are using this dataset, please cite the following paper, which outlines the complete details of the dataset and the methodology used for its generation:
Amit Karamchandani, Javier Núñez, Luis de-la-Cal, Yenny Moreno, Alberto Mozo, Antonio Pastor, "On the Applicability of Network Digital Twins in Generating Synthetic Data for Heavy Hitter Discrimination," under submission.
This is a synthetic dataset generated to differentiate between benign and malicious heavy hitter (HH) flows within complex network environments. Heavy hitter flows, which include high-volume data transfers, can significantly impact network performance, leading to congestion and degraded quality of service. Distinguishing legitimate heavy hitter activity from malicious Distributed Denial-of-Service (DDoS) traffic is critical for network management and security, yet existing datasets lack the granularity needed to train machine learning models to make this distinction effectively.
To address this, a Network Digital Twin (NDT) approach was utilized to emulate realistic network conditions and traffic patterns, enabling automated generation of labeled data for both benign and malicious HH flows alongside regular traffic.
The feature set includes flow statistics commonly used in network analysis, such as:
Traffic protocol type,
Flow duration (the time between the initial and final packet in both directions),
Total count of payload packets transmitted in both directions,
Cumulative bytes transmitted in both directions,
Time discrepancy between the first packet observations at the source and destination,
Packet and byte transmission rates per second within each interval, and
Total packet and byte counts within each interval in both directions.
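A sketch of how a few of the listed flow statistics could be computed from per-packet records; the record format (timestamp in seconds, payload bytes, direction) is an assumption for illustration, not the authors' pipeline:

```python
# Hypothetical per-packet records for one flow:
# (timestamp_seconds, payload_bytes, direction).
packets = [
    (0.00, 400, "fwd"),
    (0.05, 1200, "bwd"),
    (0.90, 800, "fwd"),
]

# Flow duration: time between the first and last packet, both directions.
duration = packets[-1][0] - packets[0][0]
# Cumulative bytes transmitted in both directions.
total_bytes = sum(size for _, size, _ in packets)
# Count of payload packets in the forward direction.
pkts_fwd = sum(1 for _, _, d in packets if d == "fwd")
# Byte transmission rate per second over the flow's lifetime.
byte_rate = total_bytes / duration if duration else 0.0

print(duration, total_bytes, pkts_fwd, byte_rate)
```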
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of 637 journal papers applying neural networks to various tasks in seismology, spanning January 1988 to January 2022. The dataset mainly includes peer-reviewed papers and does not contain duplicated works. It follows a hierarchical classification of papers based on seismological tasks (i.e. category, sub_category_I, sub_category_II, task, and sub_task). For each paper the following information is provided: 1) first author's last name, 2) publication year, 3) paper's title, 4) journal's name, 5) machine learning method used, 6) the type of neural network used, 7) the name of the neural network architecture, 8) the number of neurons/kernels in each hidden layer, 9) the type of training process (supervised, semi-supervised, etc.), 10) input data into the network, 11) output data, 12) data domain (time, frequency, feature, etc.), 13) the type of data used for training (synthetic or real), 14) the size of the training set, 15) the metrics used to measure performance, 16) performance scores, 17) the baseline method used for evaluation, and 18) a short note summarizing the paper's objective, approach, and significance.
An updated version of the dataset can be found here: https://smousavi05.github.io/dl_seismology/ and here: https://github.com/smousavi05/dl_seismology/tree/main/docs.
A continuously updated glossary of seismological tasks and relevant machine learning techniques and papers is provided here: https://smousavi05.gitbook.io/mlseismology/
This dataset contains the capsicum NIR+RGB dataset used in our paper "deepNIR: Dataset for generating synthetic NIR images and improved fruit detection system using deep learning techniques". Please refer to http://tiny.one/deepNIR for more details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting datasets (iid and ood) used in the evaluation experiments of the paper "Why did AI get this one wrong? - tree-based explanations of machine learning model predictions" by Parimbelli, Buonocore, Nicora, Michalowski, Wilk and Bellazzi.