71 datasets found
  1. pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of...

    • zenodo.org
    bin
    Updated Jun 26, 2021
    Cite
    Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie (2021). pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [Dataset]. http://doi.org/10.5281/zenodo.5031881
    Available download formats: bin
    Dataset updated
    Jun 26, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic dataset for the paper "A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs"

    Dataset specification:

    • MRI images of vertebral units (VUs), labelled by spinal region
    • The dataset comprises 10,000 image-label pairs
    • Image-label pair k can be accessed with synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]
    • Images are 3D, with shape (9, 64, 64)
    • Regions are stored as integers, mapped as 0: cervical, 1: thoracic, 2: lumbar
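
    The access pattern above can be wrapped in a small helper that also decodes the region integer. This is a minimal sketch assuming the downloaded `bin` file has already been loaded into a dict-like object `synthetic_dataset` with `images` and `regions` keys; the listing does not say how to load the file, and the helper name `get_pair` and the zero-filled stand-in data are hypothetical.

```python
from typing import Any, Mapping, Sequence, Tuple

# Region encoding as documented above: 0 = cervical, 1 = thoracic, 2 = lumbar.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def get_pair(synthetic_dataset: Mapping[str, Sequence[Any]], k: int) -> Tuple[Any, str]:
    """Return image k and the name of its region (hypothetical helper)."""
    image = synthetic_dataset["images"][k]
    region_code = int(synthetic_dataset["regions"][k])  # int() also accepts NumPy integers
    return image, REGION_NAMES[region_code]


# Hypothetical stand-in for the loaded file: two zero-filled (9, 64, 64) volumes.
blank = [[[0.0] * 64 for _ in range(64)] for _ in range(9)]
demo = {"images": [blank, blank], "regions": [0, 2]}

image, region = get_pair(demo, 1)
print(len(image), len(image[0]), len(image[0][0]), region)  # 9 64 64 lumbar
```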

    arXiv paper: https://arxiv.org/abs/2106.13199
    GitHub code: https://github.com/tcoroller/pGAN/

    Abstract:

    Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually addressed by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
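
    An AC-GAN conditions the generator on a class label and gives the discriminator an auxiliary classifier head that predicts that label. The sketch below shows one common way to wire the VU-location label into both sides: appending a one-hot region code to the latent noise, and scoring the auxiliary classifier's region prediction with cross-entropy. The latent size of 100 and all function names are illustrative assumptions, not taken from the pGAN code.

```python
import math
import random
from typing import List, Sequence

NUM_REGIONS = 3   # 0 = cervical, 1 = thoracic, 2 = lumbar
LATENT_DIM = 100  # illustrative assumption, not taken from the pGAN code


def one_hot(label: int, num_classes: int = NUM_REGIONS) -> List[float]:
    """Encode a region label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def generator_input(noise: Sequence[float], region: int) -> List[float]:
    """Condition the generator by appending the one-hot region code to the noise."""
    return list(noise) + one_hot(region)


def auxiliary_class_loss(class_logits: Sequence[float], region: int) -> float:
    """Cross-entropy of the discriminator's auxiliary classifier for the true region.

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(class_logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_sum_exp - class_logits[region]


z = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
g_in = generator_input(z, region=1)   # thoracic
print(len(g_in), g_in[-NUM_REGIONS:])  # 103 [0.0, 1.0, 0.0]
print(auxiliary_class_loss([2.0, 0.5, -1.0], region=0))
```

    In a full AC-GAN this classification loss is added to the usual real/fake loss for both networks, so the generator is rewarded for producing images whose region the discriminator can recognise.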

  2. Data from: Synthetic brain tumor images from GANs and diffusion models

    • datahub.aida.scilifelab.se
    • researchdata.se
    Updated Oct 18, 2023
    Cite
    Muhammad Usman Akbar; Anders Eklund (2023). Synthetic brain tumor images from GANs and diffusion models [Dataset]. http://doi.org/10.23698/aida/synthetic/brgandi
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    AIDA
    Linköping University
    AIDA Data Hub
    Authors
    Muhammad Usman Akbar; Anders Eklund
    Description

    This dataset is a collection of synthetic images generated by five generative models (Progressive GAN, StyleGAN1, StyleGAN2, StyleGAN3, and a diffusion model) trained on the BraTS 2020 and 2021 datasets [1-5]. The trained generative models are also shared in this dataset. See our recent work [6] for more information, including a comparison of segmentation networks trained on real and on synthetic images.

  3. Model summary for the vanilla GAN model.

    • figshare.com
    xls
    Updated Jun 1, 2023
    + more versions
    Cite
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Model summary for the vanilla GAN model. [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model summary for the vanilla GAN model.

  4. Synthetic building performance data

    • data.niaid.nih.gov
    Updated Jul 5, 2021
    Cite
    Khayatian, Fazel (2021). Synthetic building performance data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4696059
    Dataset updated
    Jul 5, 2021
    Dataset authored and provided by
    Khayatian, Fazel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A dataset of uncertainty-infused synthetic building performance profiles, as "hallucinated" by a generative adversarial network (GAN). It includes both the original building performance data and 50 synthetic performance scenarios generated by the GAN. Further details on the method and code can be found at https://github.com/Khayatian/seer

  5. IIITDMJ_Maize

    • ieee-dataport.org
    Cite
    Poornima Thakur, IIITDMJ_Maize [Dataset]. https://ieee-dataport.org/documents/iiitdmjmaize
    Authors
    Poornima Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    including both sunny and cloudy days.

  6. Synthesis of CT images from digital body phantoms using CycleGAN [dataset]

    • heidata.uni-heidelberg.de
    zip
    Updated Feb 23, 2023
    + more versions
    Cite
    Frank Zöllner (2023). Synthesis of CT images from digital body phantoms using CycleGAN [dataset] [Dataset]. http://doi.org/10.11588/DATA/7NRFYC
    Available download formats: zip (53,512,131,857 bytes)
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    heiDATA
    Authors
    Frank Zöllner
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Dataset funded by
    German Federal Ministry of Education and Research (BMBF)
    Description

    The potential of medical image analysis with neural networks is limited by the scarcity of large datasets. Incorporating synthetic training data is one way to overcome this shortcoming, as synthetic data offer accurate annotations and unlimited data size. We evaluated eleven CycleGANs for the synthesis of computed tomography (CT) images based on XCAT body phantoms.
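
    CycleGAN learns two mappings, G from domain X to domain Y and F from Y back to X (here, for example, phantom-based images to CT and back), and penalises round trips that fail to return to their input. A minimal sketch of that cycle-consistency term on flat vectors, with toy lambda "generators" standing in for neural networks; the adversarial terms are omitted, and the weight of 10 is an illustrative default rather than a setting reported for this dataset.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Translator = Callable[[Vector], List[float]]


def mean_l1(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Translator, F: Translator,
                           xs: Sequence[Vector], ys: Sequence[Vector],
                           lam: float = 10.0) -> float:
    """lam * (E_x |F(G(x)) - x|_1 + E_y |G(F(y)) - y|_1), estimated on batches."""
    forward = sum(mean_l1(F(G(x)), x) for x in xs) / len(xs)   # X -> Y -> X
    backward = sum(mean_l1(G(F(y)), y) for y in ys) / len(ys)  # Y -> X -> Y
    return lam * (forward + backward)


# Toy "generators": an exact inverse pair gives zero cycle loss.
G = lambda v: [2.0 * t for t in v]
F_good = lambda v: [t / 2.0 for t in v]
F_bad = lambda v: list(v)

xs, ys = [[1.0, 3.0]], [[2.0, 2.0]]
print(cycle_consistency_loss(G, F_good, xs, ys))  # 0.0
print(cycle_consistency_loss(G, F_bad, xs, ys))   # 40.0
```

    The L1 round-trip penalty is what lets CycleGAN train on unpaired images: it does not need a matching CT scan for each phantom, only that translating there and back preserves the input.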

  7. Minimum Euclidean distance between real and synthetic data generated by...

    • plos.figshare.com
    xls
    Updated May 27, 2025
    Cite
    Raffaele Marchesi; Nicolo Micheletti; Nicholas I-Hsien Kuo; Sebastiano Barbieri; Giuseppe Jurman; Venet Osmani (2025). Minimum Euclidean distance between real and synthetic data generated by SMOTE, WGAN-GP* and CA-GAN. No method generates exact copies of the real data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1013080.t002
    Available download formats: xls
    Dataset updated
    May 27, 2025
    Dataset provided by
    PLOS Computational Biology
    Authors
    Raffaele Marchesi; Nicolo Micheletti; Nicholas I-Hsien Kuo; Sebastiano Barbieri; Giuseppe Jurman; Venet Osmani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Minimum Euclidean distance between real and synthetic data generated by SMOTE, WGAN-GP* and CA-GAN. No method generates exact copies of the real data.
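
    The check summarised in this table can be illustrated with a brute-force sketch: for each synthetic record, take the Euclidean distance to its closest real record; a distance of zero would indicate an exact copy. The sketch assumes records are numeric vectors on comparable scales (the table does not say how the authors encoded or scaled features), and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Record = Sequence[float]


def nearest_real_distances(real: Sequence[Record], synthetic: Sequence[Record]) -> List[float]:
    """For each synthetic record, the Euclidean distance to its closest real record."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def has_exact_copy(real: Sequence[Record], synthetic: Sequence[Record], tol: float = 0.0) -> bool:
    """True if any synthetic record lies within `tol` of some real record."""
    return min(nearest_real_distances(real, synthetic)) <= tol


real = [[0.0, 0.0], [3.0, 4.0]]
synthetic = [[1.0, 0.0], [3.0, 5.0]]
print(nearest_real_distances(real, synthetic))  # [1.0, 1.0]
print(has_exact_copy(real, synthetic))          # False
```

    This compares every synthetic record with every real one, so it only suits small samples; larger tables would typically use a nearest-neighbour index instead.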

  8. Synthetic Dental Orthopantomography (OPG) Data Generated by GAN Models

    • data.mendeley.com
    Updated Jun 20, 2025
    Cite
    Maria Waqas (2025). Synthetic Dental Orthopantomography (OPG) Data Generated by GAN Models [Dataset]. http://doi.org/10.17632/y35z46ccw6.1
    Dataset updated
    Jun 20, 2025
    Authors
    Maria Waqas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Synthetic OPG Image Dataset comprises high-resolution, anatomically realistic dental X-ray images generated using custom-trained GAN variants. Based on a diverse clinical dataset from Pakistan, Thailand, and the U.S., this collection includes over 1200 curated synthetic images designed to augment training data for deep learning models in dental imaging.

  9. Topology of the WGANs considered in the analysis of the hyperparameters...

    • plos.figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Topology of the WGANs considered in the analysis of the hyperparameters (number of layers and number of neurons). [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t005
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For each topology, the table specifies the main change with respect to the GAN considered in Section 5.2, as well as the number of neurons in each layer. G stands for generator, and D stands for discriminator. The hidden layers are indicated as H1, H2, and H3. The input layer is denoted as In and the output layer as Out.

  10. AI in Generative Adversarial Networks Market Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Jul 24, 2025
    Cite
    Research Intelo (2025). AI in Generative Adversarial Networks Market Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-generative-adversarial-networks-market-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    AI in Generative Adversarial Networks (GANs) Market Outlook

    According to our latest research, the global AI in Generative Adversarial Networks (GANs) market size reached USD 2.65 billion in 2024, reflecting robust growth driven by rapid advancements in deep learning and artificial intelligence. The market is expected to register a remarkable CAGR of 31.4% from 2025 to 2033, accelerating the adoption of GANs across diverse industries. By 2033, the market is forecasted to achieve a value of USD 32.78 billion, underscoring the transformative impact of GANs in areas such as image and video generation, data augmentation, and synthetic content creation. This trajectory is supported by the increasing demand for highly realistic synthetic data and the expansion of AI-driven applications across enterprise and consumer domains.
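
    The growth figures above follow the standard compound-growth relation, end value = start value × (1 + rate)^years. A minimal sketch for projecting a value forward or recovering the implied rate; the function names are hypothetical and the inputs are the report's own figures in USD billion.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual growth `rate` for `years` periods."""
    return value * (1.0 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` periods."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures quoted above, in USD billion; 2024 -> 2033 spans nine compounding periods.
print(round(project(2.65, 0.314, 9), 2))
print(round(implied_cagr(2.65, 32.78, 9), 4))
```

    Run on these figures, compounding USD 2.65 billion at 31.4% for nine years gives roughly USD 30.9 billion, while reaching USD 32.78 billion implies about 32.2% per year. The report does not state its base year or rounding, so the headline numbers are best read as approximate.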

    A primary growth factor for the AI in Generative Adversarial Networks market is the exponential increase in the availability and complexity of data that organizations must process. GANs, with their unique adversarial training methodology, have proven exceptionally effective for generating realistic synthetic data, which is crucial for industries like healthcare, automotive, and finance where data privacy and scarcity are significant concerns. The ability of GANs to create high-fidelity images, videos, and even text has enabled organizations to enhance their AI models, improve data diversity, and reduce bias, thereby accelerating the adoption of AI-driven solutions. Furthermore, the integration of GANs with cloud-based platforms and the proliferation of open-source GAN frameworks have democratized access to this technology, enabling both large enterprises and SMEs to harness its potential for innovative applications.

    Another significant driver for the AI in Generative Adversarial Networks market is the surge in demand for advanced content creation tools in media, entertainment, and marketing. GANs have revolutionized the way digital content is produced by enabling hyper-realistic image and video synthesis, deepfake generation, and automated design. This has not only streamlined creative workflows but also opened new avenues for personalized content, virtual influencers, and immersive experiences in gaming and advertising. The rapid evolution of GAN architectures, such as StyleGAN and CycleGAN, has further enhanced the quality and scalability of generative models, making them indispensable for enterprises seeking to differentiate their digital offerings and engage customers more effectively in a highly competitive landscape.

    The ongoing advancements in hardware acceleration and AI infrastructure have also played a pivotal role in propelling the AI in Generative Adversarial Networks market forward. The availability of powerful GPUs, TPUs, and AI-specific chips has significantly reduced the training time and computational costs associated with GANs, making them more accessible for real-time and large-scale applications. Additionally, the growing ecosystem of AI services and consulting has enabled organizations to overcome technical barriers, optimize GAN deployments, and ensure compliance with evolving regulatory standards. As investment in AI research continues to surge, the GANs market is poised for sustained innovation and broader adoption across sectors such as healthcare diagnostics, autonomous vehicles, financial modeling, and beyond.

    From a regional perspective, North America continues to dominate the AI in Generative Adversarial Networks market, accounting for the largest share in 2024, driven by its robust R&D ecosystem, strong presence of leading technology companies, and early adoption of AI technologies. Europe follows closely, with significant investments in AI research and regulatory initiatives promoting ethical AI development. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digital transformation, expanding AI talent pool, and increasing government support for AI innovation. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as enterprises in these regions begin to explore the potential of GANs for industry-specific applications.

    Component Analysis

    The AI in Generative Adversarial Networks market is segmented by component into software, hardware, and services, each playing a vital role in the ecosystem’s development and adoption. Software solutions constitute the largest share of the market in 2024, reflecting the growing demand for ad

  11. GAN based synthesized audio dataset

    • ieee-dataport.org
    Updated May 11, 2020
    Cite
    Zhenyu Zhang (2020). GAN based synthesized audio dataset [Dataset]. https://ieee-dataport.org/documents/gan-based-synthesized-audio-dataset
    Dataset updated
    May 11, 2020
    Authors
    Zhenyu Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TifGAN

  12. Synthetic dataset of hospitalised patients with an acute exacerbation of...

    • healthdatagateway.org
    unknown
    Updated Dec 17, 2024
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic dataset of hospitalised patients with an acute exacerbation of asthma [Dataset]. https://healthdatagateway.org/dataset/1015
    Available download formats: unknown
    Dataset updated
    Dec 17, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    To support respiratory research, a synthetic asthma dataset was generated from real-world data originally documenting 381 patients with physician-confirmed asthma who were admitted to secondary care at a single centre in 2019. The dataset is highly detailed, covering demographics, structured physiological data, medication records, and clinical outcomes. The synthetic version extends to 561 patients admitted over a year, offering insights into patient patterns, risk factors, and treatment strategies.

    The dataset was created using the Synthetic Data Vault package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
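
    The pre-processing and metadata steps described above can be sketched in plain Python. The column names (`patient_id`, `admission_date`, `age`) and the date format are hypothetical, since the listing does not publish the schema, and the metadata dict is only loosely modelled on SDV's single-table metadata. Fitting and sampling the GAN synthesizer is omitted because it requires the SDV package itself.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List

# Hypothetical column roles; the real asthma schema is not described in the listing.
PRIMARY_KEY = "patient_id"
DATETIME_COLUMNS = {"admission_date": "%Y-%m-%d"}
NUMERIC_COLUMNS = ["age"]


def preprocess(rows: List[Dict[str, str]]) -> List[dict]:
    """Parse datetimes, keep identifiers as strings, and convert numeric fields."""
    cleaned = []
    for row in rows:
        rec = dict(row)
        rec[PRIMARY_KEY] = str(rec[PRIMARY_KEY]).strip()  # identifiers stay strings
        for col, fmt in DATETIME_COLUMNS.items():
            rec[col] = datetime.strptime(rec[col], fmt)
        for col in NUMERIC_COLUMNS:
            rec[col] = float(rec[col])
        cleaned.append(rec)
    return cleaned


def build_metadata() -> dict:
    """Schema description (field types and primary key), loosely SDV-style."""
    columns = {PRIMARY_KEY: {"sdtype": "id"}}
    for col, fmt in DATETIME_COLUMNS.items():
        columns[col] = {"sdtype": "datetime", "datetime_format": fmt}
    for col in NUMERIC_COLUMNS:
        columns[col] = {"sdtype": "numerical"}
    return {"primary_key": PRIMARY_KEY, "columns": columns}


raw = "patient_id,admission_date,age\n007,2019-03-14,42\n"
records = preprocess(list(csv.DictReader(io.StringIO(raw))))
print(records[0])
print(build_metadata()["columns"]["admission_date"])
```

    Keeping identifiers as strings matters because numeric parsing would silently turn an ID such as 007 into 7 and break joins on the primary key.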

    Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Real world data. Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real-world data upon request.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  13. GAN-Synthesized Augmented Radiology Dataset Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jul 5, 2025
    Cite
    Growth Market Reports (2025). GAN-Synthesized Augmented Radiology Dataset Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/gan-synthesized-augmented-radiology-dataset-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    GAN-Synthesized Augmented Radiology Dataset Market Outlook

    According to our latest research, the GAN-Synthesized Augmented Radiology Dataset market size reached USD 412 million in 2024, supported by a robust surge in the adoption of artificial intelligence across healthcare imaging. The market demonstrated a strong CAGR of 25.7% from 2021 to 2024 and is on track to reach a valuation of USD 3.2 billion by 2033. The primary growth factor fueling this expansion is the increasing demand for high-quality, diverse, and annotated radiology datasets to train and validate advanced AI diagnostic models, especially as regulatory requirements for clinical validation intensify globally.

    The exponential growth of the GAN-Synthesized Augmented Radiology Dataset market is being driven by the urgent need for large-scale, diverse, and unbiased datasets in medical imaging. Traditional methods of acquiring and annotating radiological images are time-consuming, expensive, and often limited by patient privacy concerns. Generative Adversarial Networks (GANs) have emerged as a transformative technology, enabling the synthesis of high-fidelity, realistic medical images that can augment existing datasets. This not only enhances the statistical power and generalizability of AI models but also helps overcome the challenge of data imbalance, especially for rare diseases and underrepresented demographic groups. As AI-driven diagnostics become integral to clinical workflows, the reliance on GAN-augmented datasets is expected to intensify, further propelling market growth.

    Another significant growth driver is the increasing collaboration between radiology departments, AI technology vendors, and academic research institutes. These partnerships are focused on developing standardized protocols for dataset generation, annotation, and validation, leveraging GANs to create synthetic images that closely mimic real-world clinical scenarios. The resulting datasets facilitate the training of AI algorithms for a wide array of applications, including disease detection, anomaly identification, and image segmentation. Additionally, the proliferation of cloud-based platforms and open-source AI frameworks has democratized access to GAN-synthesized datasets, enabling even smaller healthcare organizations and startups to participate in the AI-driven transformation of radiology.

    The regulatory landscape is also evolving to support the responsible use of synthetic data in healthcare. Regulatory agencies in North America, Europe, and Asia Pacific are increasingly recognizing the value of GAN-generated datasets for algorithm validation, provided they meet stringent standards for data quality, privacy, and clinical relevance. This regulatory endorsement is encouraging more hospitals, diagnostic centers, and research institutions to adopt GAN-augmented datasets, further accelerating market expansion. Moreover, the ongoing advancements in GAN architectures, such as StyleGAN and CycleGAN, are enhancing the realism and diversity of synthesized images, making them virtually indistinguishable from real patient scans and boosting their acceptance in both clinical and research settings.

    From a regional perspective, North America is currently the largest market for GAN-Synthesized Augmented Radiology Datasets, driven by substantial investments in healthcare AI, the presence of leading technology vendors, and proactive regulatory support. Europe follows closely, with a strong emphasis on data privacy and cross-border research collaborations. The Asia Pacific region is witnessing the fastest growth, fueled by rapid digital transformation in healthcare, rising investments in AI infrastructure, and increasing disease burden. Latin America and the Middle East & Africa are also emerging as promising markets, albeit at a slower pace, as healthcare systems in these regions begin to adopt AI-driven radiology solutions.

    Dataset Type Analysis

    The dataset type segment of the GAN-Synthesized Augmented Radiology Dataset market is pi

  14. Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • zenodo.org
    • data.niaid.nih.gov
    xz
    Updated Oct 13, 2022
    Cite
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
    Available download formats: xz
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is the first dataset of real and synthetic faces processed by social media: the synthetic faces were generated with the successful StyleGAN generative models, and the images were shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly, though, their social impact has often been exploited for malicious purposes, such as spreading misinformation and manipulating public opinion. With today's technologies, generating highly realistic fakes is within everyone's reach. A major threat comes in particular from synthetically generated faces, which can deceive even the most experienced observer. To counter this fake-news phenomenon, researchers have used artificial intelligence to detect synthetic images by analysing the patterns and artifacts introduced by generative models. However, most online images are repeatedly shared across social media platforms, which process uploaded images with operations (such as compression) that progressively degrade these useful forensic traces and compromise the effectiveness of existing detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.

  15. Model summary for the WGAN model.

    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo (2023). Model summary for the WGAN model. [Dataset]. http://doi.org/10.1371/journal.pone.0260308.t003
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mauro Castelli; Luca Manzoni; Tatiane Espindola; Aleš Popovič; Andrea De Lorenzo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Model summary for the WGAN model.

  16. Intrusion_detection_dataset

    • kaggle.com
    Updated Jun 23, 2023
    Cite
    Hamza Farooq (2023). Intrusion_detection_dataset [Dataset]. https://www.kaggle.com/datasets/ameerhamza123/intrusion-detection-dataset/code
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hamza Farooq
    Description

    This dataset contains network traffic data collected from a computer network. The network consists of various devices, such as computers, servers, and routers, interconnected to facilitate communication and data exchange. The dataset captures different types of network activity, including normal traffic as well as various network anomalies and attacks. It provides a comprehensive view of network behavior and can be used to study network security, intrusion detection, and anomaly detection algorithms. Features include source and destination IP addresses, port numbers, protocol types, packet sizes, and timestamps, enabling detailed analysis of network traffic patterns and characteristics.

    The second file in this dataset contains synthetic data generated with a Generative Adversarial Network (GAN). GANs are deep learning models that learn the underlying patterns and distributions of a dataset and generate new synthetic samples that resemble the original data. Here, the GAN was trained on the network traffic data from the first file to learn the characteristics and structure of that traffic, so the synthetic data in the second file aims to mimic the patterns and behavior observed in real network traffic. It can be used to augment the original dataset, test the robustness of machine learning models, or explore different scenarios in network analysis.

  17. Face Dataset Of People That Don't Exist

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    BwandoWando (2023). Face Dataset Of People That Don't Exist [Dataset]. http://doi.org/10.34740/kaggle/dsv/6433550
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    All the images of faces here are generated using https://thispersondoesnotexist.com/


    Copyright of AI-generated images

    Under US copyright law, these images are technically not subject to copyright protection: only "original works of authorship" qualify. "To qualify as a work of 'authorship' a work must be created by a human being," according to a US Copyright Office report [PDF].

    https://www.theregister.com/2022/08/14/ai_digital_artwork_copyright/

    Tagging

    I manually tagged all images as best I could and separated them into the two classes below:

    • Female: 3,860 images
    • Male: 3,013 images

    Some faces could pass as either female or male; I leave further review to you. Toddlers and babies are included under Male/Female.

    How it works

    Every face is entirely fake, created using a Generative Adversarial Network (GAN).

    A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks compete with each other in a zero-sum game, where one agent's gain is the other agent's loss.

    Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
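    The zero-sum game is usually written as the minimax objective of the original GAN formulation. Here D(x) is the discriminator's estimated probability that x is a real training sample, and G(z) maps random noise z to a synthetic sample; the discriminator tries to maximize V, the generator tries to minimize it. This is the generic objective only, not necessarily the exact loss used by the model behind thispersondoesnotexist.com.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```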

    GitHub implementation of the website

    How I gathered the images

    A simple Jupyter notebook requested https://thispersondoesnotexist.com/ repeatedly in a loop and saved each image locally.
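    A minimal sketch of such a loop, using only the Python standard library. The function names, file naming scheme, User-Agent header, and one-second delay are all hypothetical choices, and it assumes the site's root URL returns a JPEG image directly on each request (adjust `fetch_image` if it does not). The fetcher is passed in as a parameter so the loop can be tested without network access.

```python
import time
import urllib.request
from pathlib import Path
from typing import Callable

URL = "https://thispersondoesnotexist.com/"


def fetch_image(url: str = URL) -> bytes:
    """Fetch one generated face (assumes the URL serves a JPEG directly)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def download_faces(n: int, out_dir: str,
                   fetch: Callable[[], bytes] = fetch_image,
                   delay: float = 1.0) -> list[Path]:
    """Request n images in a loop, saving face_00000.jpg, face_00001.jpg, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(n):
        path = out / f"face_{i:05d}.jpg"
        path.write_bytes(fetch())
        saved.append(path)
        if delay:
            time.sleep(delay)  # pause between requests to avoid hammering the site
    return saved
```

    For example, `download_faces(100, "faces")` would save 100 images into a `faces` folder. Each request yields a new random face, so duplicates are unlikely but not ruled out.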

  18. PRLx-GAN-synthetic-rim

    • huggingface.co
    Updated Jul 30, 2025
    Alexandra G. Roberts (2025). PRLx-GAN-synthetic-rim [Dataset]. https://huggingface.co/datasets/agr78/PRLx-GAN-synthetic-rim
    Dataset updated
    Jul 30, 2025
    Authors
    Alexandra G. Roberts
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    PRLx-GAN

    Repository for "Synthetic Generation and Latent Projection Denoising of Rim Lesions in Multiple Sclerosis", published in Synthetic Data at CVPR 2025.

      Summary
    

    Paramagnetic rim lesions (PRLs) are a rare but highly prognostic lesion subtype in multiple sclerosis, visible only on susceptibility ($\chi$) contrasts. This work presents a generative framework to:

    • Synthesize new rim lesion maps that address class imbalance in training data
    • Enable a novel denoising…

    See the full description on the dataset page: https://huggingface.co/datasets/agr78/PRLx-GAN-synthetic-rim
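    Because PRLs are rare, a training set might hold far more rim-negative than rim-positive lesions. The sketch below makes the class-imbalance point concrete by computing how many synthetic samples per class a generator would need to supply for every class to match the largest one. It is a generic illustration of oversampling with synthetic data, not the PRLx-GAN authors' procedure, and the label names used in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def synthetic_quota(labels: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Number of synthetic samples to generate per class so that every
    class reaches the size of the largest (majority) class."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}
```

    For instance, with 90 hypothetical "rim-" lesions and 10 "rim+" lesions, the function asks for 80 synthetic "rim+" maps and none for the majority class.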

  19. Synthetic Dataset of Hospital Admissions for an acute Stroke

    • healthdatagateway.org
    unknown
    Updated Dec 4, 2024
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic Dataset of Hospital Admissions for an acute Stroke [Dataset]. https://healthdatagateway.org/en/dataset/1003
    Available download formats: unknown
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Strokes can be ischaemic or haemorrhagic in nature, leading to debilitating symptoms that depend on the location of the stroke in the brain and the severity of the insult. Stroke care is centred around Hyper-acute Stroke Units (HASU), Acute Stroke and Brain Injury Units (ASU/ABIU) and specialist stroke services. Early presentation enables the use of more invasive treatments to clear blood clots, but strokes commonly present late, preventing their use.

    This synthetic dataset represents approximately 29,000 stroke patients. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.

    The dataset was created using the Synthetic Data Vault (SDV) package, specifically its GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was then defined to capture the schema, specifying field types and primary keys; this metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning its distributions and dependencies. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
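    The pre-processing and metadata steps above can be sketched in plain Python. This is a hedged illustration, not PIONEER's script: the column names, the timestamp format, and the helper functions `preprocess` and `build_metadata` are hypothetical, and the metadata dict only imitates the shape of SDV's single-table metadata (one `sdtype` per column plus a primary key).

```python
from datetime import datetime
from typing import Dict, Iterable, List, Sequence

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format of the raw extract


def preprocess(rows: Iterable[dict], datetime_cols: Sequence[str],
               id_cols: Sequence[str], fmt: str = FMT) -> List[dict]:
    """Parse datetime columns and cast identifier columns to strings,
    returning new row dicts (the input rows are not modified)."""
    clean = []
    for row in rows:
        r = dict(row)
        for col in datetime_cols:
            if r.get(col):  # leave missing/empty timestamps as they are
                r[col] = datetime.strptime(r[col], fmt)
        for col in id_cols:
            if r.get(col) is not None:
                r[col] = str(r[col])
        clean.append(r)
    return clean


def build_metadata(columns: Sequence[str], datetime_cols: Sequence[str],
                   id_cols: Sequence[str], primary_key: str,
                   categorical_cols: Sequence[str] = (), fmt: str = FMT) -> dict:
    """Describe the schema as a dict shaped like SDV single-table metadata."""
    columns_spec: Dict[str, dict] = {}
    for col in columns:
        if col in id_cols:
            columns_spec[col] = {"sdtype": "id"}
        elif col in datetime_cols:
            columns_spec[col] = {"sdtype": "datetime", "datetime_format": fmt}
        elif col in categorical_cols:
            columns_spec[col] = {"sdtype": "categorical"}
        else:
            columns_spec[col] = {"sdtype": "numerical"}  # assumption: the rest are numeric
    return {
        "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
        "primary_key": primary_key,
        "columns": columns_spec,
    }
```

    With SDV 1.x installed, a dict like this could be loaded with `SingleTableMetadata.load_from_dict(...)`, passed to `CTGANSynthesizer(metadata)`, fitted on a pandas DataFrame with `fit(...)`, and sampled with `sample(num_rows=...)`. The description does not state which SDV version, synthesizer class, or settings were actually used.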

    Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute stroke services & specialist care across four hospital sites.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  20. Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme...

    • healthdatagateway.org
    unknown
    Updated Oct 8, 2024
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme resource [Dataset]. https://healthdatagateway.org/dataset/189
    Available download formats: unknown
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    This highly granular synthetic dataset, created as an asset for the HDR UK Medicines Programme, includes information on 680 cancer patients over a period of three years. It includes simulated patient-related data, such as demographics & co-morbidities extracted from ICD-10 and SNOMED-CT codes; serial, structured data on acute care processes (readmissions, survival), primary diagnosis, presenting complaint, physiology readings, blood results (infection, inflammatory markers) and acuity markers such as the AVPU Scale and NEWS2 score; imaging reports; prescribed & administered treatments, including fluids, blood products and procedures; information on outpatient admissions; and survival outcomes up to one year post-discharge.

    The data was generated using a generative adversarial network model (CTGAN). A flat table of real data was created by consolidating essential information from several key relational tables (medications, demographics). A synthetic version of this flat table was then generated with a customized script based on the SDV package (N. Patki, 2016), which replicated the real distributions and logical relationships.
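    Consolidating relational tables into one flat table, as described above, amounts to joining them on a shared patient key. The sketch below shows one way to do that in plain Python with a left join that yields one row per medication record. The table schemas are not given in the description, so the key `patient_id`, the column names, and the example records are hypothetical, and one-row-per-medication is only one possible flattening choice.

```python
from typing import Dict, List


def flatten(demographics: List[dict], medications: List[dict],
            key: str = "patient_id") -> List[dict]:
    """Left-join medication records onto patient demographics.

    Produces one flat row per medication record; patients with no
    medications still get one row, with medication columns set to None.
    """
    med_cols = sorted({c for med in medications for c in med if c != key})
    meds_by_patient: Dict[object, List[dict]] = {}
    for med in medications:
        meds_by_patient.setdefault(med[key], []).append(med)

    flat = []
    for person in demographics:
        meds = meds_by_patient.get(person[key], [])
        if not meds:
            row = dict(person)
            row.update({c: None for c in med_cols})
            flat.append(row)
        for med in meds:
            row = dict(person)
            row.update({c: med.get(c) for c in med_cols})
            flat.append(row)
    return flat
```

    The resulting list of uniform rows is the kind of single table a single-table synthesizer such as CTGAN expects as input.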

    Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and provide the real-data via application.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie (2021). pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs [Dataset]. http://doi.org/10.5281/zenodo.5031881

pGAN Synthetic Dataset: A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Available download formats: bin
Dataset updated
Jun 26, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jason Plawinski; Hanxi Sun; Sajanth Subramaniam; Amir Jamaludin; Timor Kadir; Aimee Readie; Gregory Ligozio; David Ohlssen; Thibaud Coroller; Mark Baillie
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Synthetic dataset for A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Dataset specification:

  • MRI images of vertebral units (VUs), labelled by spinal region
  • The dataset comprises 10,000 pairs of images and labels
  • Image and label pair number k can be selected by: synthetic_dataset['images'][k] and synthetic_dataset['regions'][k]
  • Images are 3D arrays of shape (9, 64, 64)
  • Regions are stored as integers, mapped as 0: cervical, 1: thoracic, 2: lumbar
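The access pattern and region encoding above can be wrapped in two small helpers. This is a sketch that assumes `synthetic_dataset` has already been loaded into a dict-like object with `'images'` and `'regions'` entries; the format of the `bin` download is not stated here, so check the pGAN repository for the actual loading code. The helper names are hypothetical.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

# Region encoding as stated in the dataset specification.
REGION_NAMES = {0: "cervical", 1: "thoracic", 2: "lumbar"}


def get_pair(synthetic_dataset: Mapping[str, Sequence[Any]], k: int) -> Tuple[Any, str]:
    """Return image k (expected shape (9, 64, 64)) and its decoded region name."""
    image = synthetic_dataset["images"][k]
    region = int(synthetic_dataset["regions"][k])  # int() also accepts NumPy integers
    return image, REGION_NAMES[region]


def region_counts(synthetic_dataset: Mapping[str, Sequence[Any]]) -> Dict[str, int]:
    """Count how many image/label pairs fall in each spinal region."""
    counts = {name: 0 for name in REGION_NAMES.values()}
    for r in synthetic_dataset["regions"]:
        counts[REGION_NAMES[int(r)]] += 1
    return counts
```

Counting regions is a quick sanity check that all three conditioning classes are represented among the 10,000 pairs before using them for training or evaluation.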

arXiv paper: https://arxiv.org/abs/2106.13199
GitHub code: https://github.com/tcoroller/pGAN/

Abstract:

Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually addressed by anonymization, which is a slow and expensive process. An alternative to anonymization is to share a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on a COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
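For reference, the generic auxiliary classifier GAN (Odena et al., 2017) conditions the generator on a class label c and gives the discriminator two outputs: a source prediction S (real or fake) and a class prediction C. In this setting c would be the VU location label (0: cervical, 1: thoracic, 2: lumbar) and X_fake = G(z, c). The objectives below are the standard AC-GAN formulation; the pGAN paper may use a modified version.

```latex
\begin{aligned}
L_S &= \mathbb{E}\big[\log P(S = \text{real} \mid X_{\text{real}})\big]
     + \mathbb{E}\big[\log P(S = \text{fake} \mid X_{\text{fake}})\big] \\
L_C &= \mathbb{E}\big[\log P(C = c \mid X_{\text{real}})\big]
     + \mathbb{E}\big[\log P(C = c \mid X_{\text{fake}})\big] \\
&\text{$D$ is trained to maximize } L_S + L_C,
 \qquad \text{$G$ is trained to maximize } L_C - L_S
\end{aligned}
```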
