71 datasets found

pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of...
zenodo.org
bin
Updated Jun 26, 2021
Cite
Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie (2021). pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [Dataset]. http://doi.org/10.5281/zenodo.5031881
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.5031881
Dataset updated
Jun 26, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Synthetic dataset for A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Dataset specification:

MRI images of Vertebral Units labelled based on region

The dataset comprises 10000 pairs of images and labels

Image and label pair number k can be selected by: synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]

Images are 3D of size (9, 64, 64)

Regions are stored as an integer. Mapping is 0: cervical, 1: thoracic, 2: lumbar
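
The access pattern and label mapping above can be sketched as follows. This is only an illustration: it assumes the file has already been loaded into a dict-like object called synthetic_dataset (the loading step is not described here), and the helper names get_pair and shape_of, as well as the tiny stand-in data, are hypothetical.

```python
# Region codes as listed in the dataset specification.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def get_pair(synthetic_dataset, k):
    """Return (image, region_name) for pair number k.

    Hypothetical helper: assumes a dict-like object with 'images' and
    'regions' entries, indexed as in the specification.
    """
    image = synthetic_dataset["images"][k]
    region = int(synthetic_dataset["regions"][k])
    return image, REGION_NAMES[region]


def shape_of(image):
    """Shape of a nested-list or array-like 3D image."""
    return (len(image), len(image[0]), len(image[0][0]))


# Stand-in data with the documented layout (two pairs instead of 10000).
blank = [[[0.0] * 64 for _ in range(64)] for _ in range(9)]
demo = {"images": [blank, blank], "regions": [0, 2]}

image, region_name = get_pair(demo, 1)
print(shape_of(image), region_name)
```

Checking the shape against (9, 64, 64) before training is a cheap way to catch a file that was loaded with the wrong axis order.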

arXiv paper: https://arxiv.org/abs/2106.13199
GitHub code: https://github.com/tcoroller/pGAN/

Abstract:

Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
Data from: Synthetic brain tumor images from GANs and diffusion models
datahub.aida.scilifelab.se
researchdata.se
+1more
Updated Oct 18, 2023
Cite
Muhammad Usman Akbar; Anders Eklund (2023). Synthetic brain tumor images from GANs and diffusion models [Dataset]. http://doi.org/10.23698/aida/synthetic/brgandi
Unique identifier
https://doi.org/10.23698/aida/synthetic/brgandi
Dataset updated
Oct 18, 2023
Dataset provided by
AIDA
Linköping University
AIDA Data Hub
Authors
Muhammad Usman Akbar; Anders Eklund
Description
This dataset is a collection of synthetic images generated by 5 generative models (Progressive GAN, StyleGAN1, StyleGAN2, StyleGAN3, diffusion model) trained on the BraTS 2020 and 2021 datasets [1,2,3,4,5]. The trained generative models are also shared in this dataset. See our recent work [6] for more information, and a comparison of training segmentation networks with real and synthetic images.
Model summary for the vanilla GAN model.
figshare.com
xls
Updated Jun 1, 2023
+ more versions
Cite
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Model summary for the vanilla GAN model. [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0260308.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Model summary for the vanilla GAN model.
Synthetic building performance data
data.niaid.nih.gov
Updated Jul 5, 2021
Cite
Khayatian, Fazel (2021). Synthetic building performance data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4696059
Dataset updated
Jul 5, 2021
Dataset authored and provided by
Khayatian, Fazel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A dataset of uncertainty-infused synthetic building performance profiles as hallucinated by a generative adversarial network (GAN). This dataset includes both the original building performance data, as well as 50 synthetic performance scenarios that are generated by a GAN. Further details on the method and codes can be found here: https://github.com/Khayatian/seer
IIITDMJ_Maize
ieee-dataport.org
Cite
Poornima Thakur, IIITDMJ_Maize [Dataset]. https://ieee-dataport.org/documents/iiitdmjmaize
Authors
Poornima Thakur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
including both sunny and cloudy days.
Synthesis of CT images from digital body phantoms using CycleGAN [dataset]
heidata.uni-heidelberg.de
zip
Updated Feb 23, 2023
+ more versions
Cite
Frank Zöllner (2023). Synthesis of CT images from digital body phantoms using CycleGAN [dataset] [Dataset]. http://doi.org/10.11588/DATA/7NRFYC
Available download formats: zip (53512131857)
Unique identifier
https://doi.org/10.11588/DATA/7NRFYC
Dataset updated
Feb 23, 2023
Dataset provided by
heiDATA
Authors
Frank Zöllner
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset funded by
German Federal Ministry of Education and Research (BMBF)
Description
The potential of medical image analysis with neural networks is limited by the restricted availability of extensive data sets. The incorporation of synthetic training data is one approach to bypass this shortcoming, as synthetic data offer accurate annotations and unlimited data size. We evaluated eleven CycleGAN models for the synthesis of computed tomography (CT) images based on XCAT body phantoms.
Minimum Euclidean distance between real and synthetic data generated by...
plos.figshare.com
xls
Updated May 27, 2025
Cite
Raffaele Marchesi; Nicolo Micheletti; Nicholas I-Hsien Kuo; Sebastiano Barbieri; Giuseppe Jurman; Venet Osmani (2025). Minimum Euclidean distance between real and synthetic data generated by SMOTE, WGAN-GP* and CA-GAN. No method generates exact copies of the real data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013080.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1013080.t002
Dataset updated
May 27, 2025
Dataset provided by
PLOS Computational Biology
Authors
Raffaele Marchesi; Nicolo Micheletti; Nicholas I-Hsien Kuo; Sebastiano Barbieri; Giuseppe Jurman; Venet Osmani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Minimum Euclidean distance between real and synthetic data generated by SMOTE, WGAN-GP* and CA-GAN. No method generates exact copies of the real data.
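
One plausible way to compute such a value is sketched below. It assumes real and synthetic records are numeric feature vectors of equal length; the authors' exact preprocessing is not given here, and the function name and toy records are illustrative only. A result of 0 would indicate that some synthetic record is an exact copy of a real one.

```python
import math


def min_euclidean_distance(real_rows, synthetic_rows):
    """Smallest Euclidean distance between any real and any synthetic row."""
    return min(
        math.dist(r, s)  # Euclidean distance between two points
        for r in real_rows
        for s in synthetic_rows
    )


# Toy records, not taken from the table.
real = [(0.0, 0.0), (1.0, 1.0)]
synthetic = [(3.0, 4.0), (1.0, 2.0)]

d = min_euclidean_distance(real, synthetic)
print(d)
```

The all-pairs loop is quadratic in the number of records, which is fine for a sanity check but may need a nearest-neighbour index on larger tables.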
Synthetic Dental Orthopantomography (OPG) Data Generated by GAN Models
data.mendeley.com
Updated Jun 20, 2025
Cite
Maria Waqas (2025). Synthetic Dental Orthopantomography (OPG) Data Generated by GAN Models [Dataset]. http://doi.org/10.17632/y35z46ccw6.1
Unique identifier
https://doi.org/10.17632/y35z46ccw6.1
Dataset updated
Jun 20, 2025
Authors
Maria Waqas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Synthetic OPG Image Dataset comprises high-resolution, anatomically realistic dental X-ray images generated using custom-trained GAN variants. Based on a diverse clinical dataset from Pakistan, Thailand, and the U.S., this collection includes over 1200 curated synthetic images designed to augment training data for deep learning models in dental imaging.
Topology of the WGANs considered in the analysis of the hyperparameters...
plos.figshare.com
xls
Updated Jun 8, 2023
Cite
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Topology of the WGANs considered in the analysis of the hyperparameters (number of layers and number of neurons). [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0260308.t005
Dataset updated
Jun 8, 2023
Dataset provided by
PLOS ONE
Authors
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For each topology, the table specifies the main change with respect to the GAN considered in Section 5.2, as well as the number of neurons in each layer. G stands for generator, and D stands for discriminator. The hidden layers are indicated as H1, H2, and H3. The input layer is denoted as In and the output layer as Out.
AI in Generative Adversarial Networks Market Market Research Report 2033
researchintelo.com
csv, pdf, pptx
Updated Jul 24, 2025
Cite
Research Intelo (2025). AI in Generative Adversarial Networks Market Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-generative-adversarial-networks-market-market
Available download formats: pdf, csv, pptx
Dataset updated
Jul 24, 2025
Dataset authored and provided by
Research Intelo
License
https://researchintelo.com/privacy-and-policy
Time period covered
2024 - 2033
Area covered
Global
Description
AI in Generative Adversarial Networks (GANs) Market Outlook

According to our latest research, the global AI in Generative Adversarial Networks (GANs) market size reached USD 2.65 billion in 2024, reflecting robust growth driven by rapid advancements in deep learning and artificial intelligence. The market is expected to register a remarkable CAGR of 31.4% from 2025 to 2033, accelerating the adoption of GANs across diverse industries. By 2033, the market is forecasted to achieve a value of USD 32.78 billion, underscoring the transformative impact of GANs in areas such as image and video generation, data augmentation, and synthetic content creation. This trajectory is supported by the increasing demand for highly realistic synthetic data and the expansion of AI-driven applications across enterprise and consumer domains.
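
The relationship between a starting value, a compound annual growth rate and a forecast can be checked with a few lines of arithmetic. The sketch below uses the three figures quoted above; treating 2024 to 2033 as nine compounding periods is an assumption, and a different period count in the report would give somewhat different numbers. The function names are illustrative.

```python
def project(start_value, cagr, years):
    """Compound start_value at an annual growth rate for a number of years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


start_2024 = 2.65     # USD billion, as quoted
forecast_2033 = 32.78  # USD billion, as quoted
years = 2033 - 2024    # assumption: nine compounding periods

projected = project(start_2024, 0.314, years)
rate = implied_cagr(start_2024, forecast_2033, years)
print(round(projected, 2), round(rate, 4))
```

Comparing the projected value with the quoted forecast, and the implied rate with the quoted CAGR, shows how sensitive such headline figures are to the base year and period count.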

A primary growth factor for the AI in Generative Adversarial Networks market is the exponential increase in the availability and complexity of data that organizations must process. GANs, with their unique adversarial training methodology, have proven exceptionally effective for generating realistic synthetic data, which is crucial for industries like healthcare, automotive, and finance where data privacy and scarcity are significant concerns. The ability of GANs to create high-fidelity images, videos, and even text has enabled organizations to enhance their AI models, improve data diversity, and reduce bias, thereby accelerating the adoption of AI-driven solutions. Furthermore, the integration of GANs with cloud-based platforms and the proliferation of open-source GAN frameworks have democratized access to this technology, enabling both large enterprises and SMEs to harness its potential for innovative applications.

Another significant driver for the AI in Generative Adversarial Networks market is the surge in demand for advanced content creation tools in media, entertainment, and marketing. GANs have revolutionized the way digital content is produced by enabling hyper-realistic image and video synthesis, deepfake generation, and automated design. This has not only streamlined creative workflows but also opened new avenues for personalized content, virtual influencers, and immersive experiences in gaming and advertising. The rapid evolution of GAN architectures, such as StyleGAN and CycleGAN, has further enhanced the quality and scalability of generative models, making them indispensable for enterprises seeking to differentiate their digital offerings and engage customers more effectively in a highly competitive landscape.

The ongoing advancements in hardware acceleration and AI infrastructure have also played a pivotal role in propelling the AI in Generative Adversarial Networks market forward. The availability of powerful GPUs, TPUs, and AI-specific chips has significantly reduced the training time and computational costs associated with GANs, making them more accessible for real-time and large-scale applications. Additionally, the growing ecosystem of AI services and consulting has enabled organizations to overcome technical barriers, optimize GAN deployments, and ensure compliance with evolving regulatory standards. As investment in AI research continues to surge, the GANs market is poised for sustained innovation and broader adoption across sectors such as healthcare diagnostics, autonomous vehicles, financial modeling, and beyond.

From a regional perspective, North America continues to dominate the AI in Generative Adversarial Networks market, accounting for the largest share in 2024, driven by its robust R&D ecosystem, strong presence of leading technology companies, and early adoption of AI technologies. Europe follows closely, with significant investments in AI research and regulatory initiatives promoting ethical AI development. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digital transformation, expanding AI talent pool, and increasing government support for AI innovation. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as enterprises in these regions begin to explore the potential of GANs for industry-specific applications.

Component Analysis

The AI in Generative Adversarial Networks market is segmented by component into software, hardware, and services, each playing a vital role in the ecosystem’s development and adoption. Software solutions constitute the largest share of the market in 2024, reflecting the growing demand for ad
GAN based synthesized audio dataset
ieee-dataport.org
Updated May 11, 2020
Cite
Zhenyu Zhang (2020). GAN based synthesized audio dataset [Dataset]. https://ieee-dataport.org/documents/gan-based-synthesized-audio-dataset
Dataset updated
May 11, 2020
Authors
Zhenyu Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TifGAN
Synthetic dataset of hospitalised patients with an acute exacerbation of...
healthdatagateway.org
unknown
Updated Dec 17, 2024
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic dataset of hospitalised patients with an acute exacerbation of asthma [Dataset]. https://healthdatagateway.org/dataset/1015
Available download formats: unknown
Dataset updated
Dec 17, 2024
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
To support respiratory research, a synthetic asthma dataset was generated based on real-world data, originally documenting 381 patients with physician-confirmed asthma who were admitted to secondary care at a single centre in 2019. The dataset is highly detailed, covering demographics, structured physiological data, medication records, and clinical outcomes. The synthetic version extends to 561 patients admitted over a year, offering insights into patient patterns, risk factors, and treatment strategies.

The dataset was created using the Synthetic Data Vault package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
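
The pre-processing and metadata steps described above can be sketched in plain Python. This is an illustration only: the column names and sample rows are hypothetical, the metadata is a plain dictionary loosely modelled on a schema description rather than the package's own metadata class, and the fitting and sampling steps performed by the GAN synthesizer are not shown.

```python
import csv
import io
from datetime import datetime

# Hypothetical extract; the real column names are not given in the description.
RAW_CSV = """patient_id,admission_time,age
00123,2019-03-01 14:30:00,54
00456,2019-07-19 08:05:00,37
"""

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def preprocess(csv_text):
    """Parse datetime columns and keep identifiers as strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "patient_id": row["patient_id"],  # stays a string, keeps leading zeros
            "admission_time": datetime.strptime(row["admission_time"], DATETIME_FORMAT),
            "age": int(row["age"]),
        })
    return rows


# Schema description: field types plus the primary key (illustrative layout).
METADATA = {
    "primary_key": "patient_id",
    "columns": {
        "patient_id": {"sdtype": "id"},
        "admission_time": {"sdtype": "datetime", "datetime_format": DATETIME_FORMAT},
        "age": {"sdtype": "numerical"},
    },
}

rows = preprocess(RAW_CSV)
print(rows[0]["patient_id"], rows[0]["admission_time"].year)
```

Keeping identifiers as strings matters because numeric parsing would silently drop leading zeros and let a synthesizer treat IDs as a quantity to model.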

Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

Available supplementary data: Real world data. Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real-world data upon request.

Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
GAN-Synthesized Augmented Radiology Dataset Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 5, 2025
Cite
Growth Market Reports (2025). GAN-Synthesized Augmented Radiology Dataset Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/gan-synthesized-augmented-radiology-dataset-market
Available download formats: pdf, pptx, csv
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
GAN-Synthesized Augmented Radiology Dataset Market Outlook

According to our latest research, the GAN-Synthesized Augmented Radiology Dataset market size reached USD 412 million in 2024, supported by a robust surge in the adoption of artificial intelligence across healthcare imaging. The market demonstrated a strong CAGR of 25.7% from 2021 to 2024 and is on track to reach a valuation of USD 3.2 billion by 2033. The primary growth factor fueling this expansion is the increasing demand for high-quality, diverse, and annotated radiology datasets to train and validate advanced AI diagnostic models, especially as regulatory requirements for clinical validation intensify globally.

The exponential growth of the GAN-Synthesized Augmented Radiology Dataset market is being driven by the urgent need for large-scale, diverse, and unbiased datasets in medical imaging. Traditional methods of acquiring and annotating radiological images are time-consuming, expensive, and often limited by patient privacy concerns. Generative Adversarial Networks (GANs) have emerged as a transformative technology, enabling the synthesis of high-fidelity, realistic medical images that can augment existing datasets. This not only enhances the statistical power and generalizability of AI models but also helps overcome the challenge of data imbalance, especially for rare diseases and underrepresented demographic groups. As AI-driven diagnostics become integral to clinical workflows, the reliance on GAN-augmented datasets is expected to intensify, further propelling market growth.

Another significant growth driver is the increasing collaboration between radiology departments, AI technology vendors, and academic research institutes. These partnerships are focused on developing standardized protocols for dataset generation, annotation, and validation, leveraging GANs to create synthetic images that closely mimic real-world clinical scenarios. The resulting datasets facilitate the training of AI algorithms for a wide array of applications, including disease detection, anomaly identification, and image segmentation. Additionally, the proliferation of cloud-based platforms and open-source AI frameworks has democratized access to GAN-synthesized datasets, enabling even smaller healthcare organizations and startups to participate in the AI-driven transformation of radiology.

The regulatory landscape is also evolving to support the responsible use of synthetic data in healthcare. Regulatory agencies in North America, Europe, and Asia Pacific are increasingly recognizing the value of GAN-generated datasets for algorithm validation, provided they meet stringent standards for data quality, privacy, and clinical relevance. This regulatory endorsement is encouraging more hospitals, diagnostic centers, and research institutions to adopt GAN-augmented datasets, further accelerating market expansion. Moreover, the ongoing advancements in GAN architectures, such as StyleGAN and CycleGAN, are enhancing the realism and diversity of synthesized images, making them virtually indistinguishable from real patient scans and boosting their acceptance in both clinical and research settings.

From a regional perspective, North America is currently the largest market for GAN-Synthesized Augmented Radiology Datasets, driven by substantial investments in healthcare AI, the presence of leading technology vendors, and proactive regulatory support. Europe follows closely, with a strong emphasis on data privacy and cross-border research collaborations. The Asia Pacific region is witnessing the fastest growth, fueled by rapid digital transformation in healthcare, rising investments in AI infrastructure, and increasing disease burden. Latin America and the Middle East & Africa are also emerging as promising markets, albeit at a slower pace, as healthcare systems in these regions begin to adopt AI-driven radiology solutions.

Dataset Type Analysis

The dataset type segment of the GAN-Synthesized Augmented Radiology Dataset market is pi
Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...
zenodo.org
data.niaid.nih.gov
xz
Updated Oct 13, 2022
Cite
Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
Available download formats: xz
Unique identifier
https://doi.org/10.5281/zenodo.7065064
Dataset updated
Oct 13, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TrueFace is a first dataset of real and synthetic faces that have been processed by social media: the synthetic faces were obtained with the successful StyleGAN generative models, and the images were shared on Facebook, Twitter and Telegram.

Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. These platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
Model summary for the WGAN model.
figshare.com
xls
Updated Jun 4, 2023
Cite
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Model summary for the WGAN model. [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0260308.t003
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Model summary for the WGAN model.
Intrusion_detection_dataset
kaggle.com
Updated Jun 23, 2023
Cite
Hamza Farooq (2023). Intrusion_detection_dataset [Dataset]. https://www.kaggle.com/datasets/ameerhamza123/intrusion-detection-dataset/code
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hamza Farooq
Description
This dataset contains network traffic data collected from a computer network. The network consists of various devices, such as computers, servers, and routers, interconnected to facilitate communication and data exchange. The dataset captures different types of network activity, including normal traffic as well as various anomalies and attacks. It provides a comprehensive view of the network behavior and can be used for studying network security, intrusion detection, and anomaly detection algorithms. Its features include source and destination IP addresses, port numbers, protocol types, packet sizes, and timestamps, among others, enabling detailed analysis of network traffic patterns and characteristics.

The second file in this dataset contains synthetic data generated using a Generative Adversarial Network (GAN). GANs are a type of deep learning model that can learn the underlying patterns and distributions of a given dataset and generate new synthetic samples that resemble the original data. In this case, the GAN was trained on the network traffic data from the first file to learn its characteristics and structure. The generated synthetic data aims to mimic the patterns and behavior observed in real network traffic. It can be used for purposes such as augmenting the original dataset, testing the robustness of machine learning models, or exploring different scenarios in network analysis.
Face Dataset Of People That Don't Exist
kaggle.com
Updated Sep 8, 2023
Cite
BwandoWando (2023). Face Dataset Of People That Don't Exist [Dataset]. http://doi.org/10.34740/kaggle/dsv/6433550
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6433550
Dataset updated
Sep 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

All the images of faces here are generated using https://thispersondoesnotexist.com/


Copyrighting of AI Generated images

Under US copyright law, these images are technically not subject to copyright protection. Only "original works of authorship" are considered. "To qualify as a work of 'authorship' a work must be created by a human being," according to a US Copyright Office's report [PDF].

https://www.theregister.com/2022/08/14/ai_digital_artwork_copyright/

Tagging

I manually tagged all images as best as I could and separated them between the two classes below

Female- 3860 images

Male- 3013 images

Some may pass as either female or male, but I will leave it to you to do the reviewing. I included toddlers and babies under Male/Female

How it works

Each of the faces is totally fake, created using a type of algorithm called a Generative Adversarial Network (GAN).

A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss).

Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.

https://www.youtube.com/watch?v=u8qPvzk0AfY

https://www.youtube.com/watch?v=dCKbRCUyop8

https://www.youtube.com/watch?v=SWoravHhsUU

Github implementation of website

https://github.com/NVlabs/stylegan2

https://github.com/lucidrains/stylegan2-pytorch

https://github.com/lucidrains/lightweight-gan

How I gathered the images

I used a simple Jupyter notebook that looped, invoked the website https://thispersondoesnotexist.com/ on each iteration, and saved all images locally.
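The author's notebook is not included, so the following is only a minimal sketch of what such a loop might look like. It assumes that each request to the site's root URL returns the bytes of one image; the function names, the file naming scheme, the `User-Agent` header and the delay between requests are all hypothetical choices, not details taken from the dataset description.

```python
import pathlib
import time
import urllib.request
from typing import Callable, List

# Assumption: the root URL answers each request with one freshly generated image.
SITE_URL = "https://thispersondoesnotexist.com/"


def fetch_bytes(url: str) -> bytes:
    """Download the raw response body for a single request."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def gather_faces(
    count: int,
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    delay_s: float = 1.0,
) -> List[pathlib.Path]:
    """Request the site `count` times and save each response as a .jpg file."""
    target = pathlib.Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(count):
        path = target / f"face_{i:05d}.jpg"  # hypothetical naming scheme
        path.write_bytes(fetch(SITE_URL))
        saved.append(path)
        if delay_s > 0 and i < count - 1:
            time.sleep(delay_s)  # pause between requests to avoid hammering the server
    return saved
```

Calling `gather_faces(10, "faces")` would attempt ten downloads into a local `faces` folder. The `fetch` parameter exists so the loop can be exercised without network access.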
PRLx-GAN-synthetic-rim
huggingface.co
Updated Jul 30, 2025
Cite
Alexandra G. Roberts (2025). PRLx-GAN-synthetic-rim [Dataset]. https://huggingface.co/datasets/agr78/PRLx-GAN-synthetic-rim
Dataset updated
Jul 30, 2025
Authors
Alexandra G. Roberts
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
PRLx-GAN

Repository for Synthetic Generation and Latent Projection Denoising of Rim Lesions in Multiple Sclerosis published in Synthetic Data at CVPR 2025.

Summary

Paramagnetic rim lesions (PRLs) are a rare but highly prognostic lesion subtype in multiple sclerosis, visible only on susceptibility ($\chi$) contrasts. This work presents a generative framework to:

Synthesize new rim lesion maps that address class imbalance in training data
Enable a novel denoising…

See the full description on the dataset page: https://huggingface.co/datasets/agr78/PRLx-GAN-synthetic-rim.
Synthetic Dataset of Hospital Admissions for an acute Stroke
healthdatagateway.org
unknown
Updated Dec 4, 2024
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic Dataset of Hospital Admissions for an acute Stroke [Dataset]. https://healthdatagateway.org/en/dataset/1003
Available download formats: unknown
Dataset updated
Dec 4, 2024
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
Strokes can be ischaemic or haemorrhagic in nature, leading to debilitating symptoms which are dependent on the location of the stroke in the brain and the severity of the insult. Stroke care is centred around Hyper-acute Stroke Units (HASU), Acute Stroke and Brain Injury Units (ASU/ABIU) and specialist stroke services. Early presentation enables the use of more invasive treatments to clear blood clots, but commonly strokes present late, preventing their use.

This synthetic dataset represents approximately 29,000 stroke patients. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.

The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was then defined to capture the schema, specifying field types and primary keys, which guided the synthesizer in understanding the structure of the data. The GAN synthesizer was fitted to the real data, learning the distributions and dependencies within it. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
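The pre-processing and metadata steps described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the PIONEER pipeline: the column names, the datetime format and the sample rows are invented, and the fitting and sampling steps performed by SDV are deliberately not reproduced here.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: field types plus a primary key, mirroring the kind of
# information the description says the metadata captured.
METADATA = {
    "primary_key": "patient_id",
    "columns": {
        "patient_id": "id",
        "admission_time": "datetime",
        "age": "numerical",
        "stroke_type": "categorical",
    },
}
DATETIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the raw timestamps


def preprocess(rows, metadata):
    """Cast each raw string field according to its declared column type."""
    cleaned = []
    for row in rows:
        record = {}
        for name, kind in metadata["columns"].items():
            raw = row[name]
            if kind == "datetime":
                record[name] = datetime.strptime(raw, DATETIME_FORMAT)
            elif kind == "numerical":
                record[name] = float(raw)
            else:
                record[name] = str(raw)  # identifiers and categories stay strings
        cleaned.append(record)
    return cleaned


# Invented sample rows, for illustration only.
RAW_CSV = """patient_id,admission_time,age,stroke_type
00017,2021-03-04 08:15,71,ischaemic
00018,2021-03-05 22:40,64,haemorrhagic
"""

records = preprocess(csv.DictReader(io.StringIO(RAW_CSV)), METADATA)
```

Keeping identifiers as strings preserves leading zeros (`"00017"` rather than `17`) and stops a synthesizer from treating them as a numeric quantity to be modelled.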

Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute stroke services & specialist care across four hospital sites.

Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.

Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme...
healthdatagateway.org
unknown
Updated Oct 8, 2024
Cite
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme resource [Dataset]. https://healthdatagateway.org/dataset/189
Available download formats: unknown
Dataset updated
Oct 8, 2024
Dataset authored and provided by
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
License
https://www.pioneerdatahub.co.uk/data/data-request-process/
Description
This highly granular synthetic dataset, created as an asset for the HDR UK Medicines programme, includes information on 680 cancer patients over a period of three years. It includes simulated patient-related data, such as demographics & co-morbidities extracted from ICD-10 and SNOMED-CT codes. It also includes serial, structured data pertaining to acute care process (readmissions, survival), primary diagnosis, presenting complaint, physiology readings, blood results (infection, inflammatory markers) and acuity markers such as AVPU Scale and NEWS2 score, together with imaging reports, prescribed & administered treatments including fluids, blood products and procedures, information on outpatient admissions, and survival outcomes following one year post discharge.

The data was generated using a generative adversarial network model (CTGAN). A flat real data table was created by consolidating essential information from various key relational tables (medications, demographics). A synthetic version of the flat table was then generated using a customized script based on the SDV package (N. Patki, 2016) that replicated the real distributions and logical relationships.
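The consolidation step can be pictured as collapsing several relational tables into one row per patient before any synthesizer is fitted. The sketch below is only an illustration: the table contents, the column names and the choice to summarise medications as a count plus a joined string are assumptions, and the customized SDV script itself is not shown.

```python
from collections import defaultdict

# Hypothetical relational tables; all column names and values are invented.
demographics = [
    {"patient_id": "P1", "age": 63, "sex": "F"},
    {"patient_id": "P2", "age": 71, "sex": "M"},
]
medications = [
    {"patient_id": "P1", "drug": "drug_a"},
    {"patient_id": "P1", "drug": "drug_b"},
    {"patient_id": "P2", "drug": "drug_a"},
]


def build_flat_table(demographics, medications):
    """Return one row per patient, with medications collapsed into summary columns."""
    drugs_by_patient = defaultdict(list)
    for med in medications:
        drugs_by_patient[med["patient_id"]].append(med["drug"])

    flat = []
    for person in demographics:
        drugs = drugs_by_patient.get(person["patient_id"], [])
        flat.append(
            {
                **person,
                "n_medications": len(drugs),
                "medications": ";".join(sorted(set(drugs))),
            }
        )
    return flat


flat_table = build_flat_table(demographics, medications)
```

A single flat table of this kind is what a single-table synthesizer expects as input, at the cost of losing some of the one-to-many detail held in the original relational tables.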

Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models, and can provide the real data via application.

Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.


