44 datasets found
  1. Data_Sheet_1_Generative Adversarial Networks for Augmenting Training Data of...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Piotr Baniukiewicz; E. Josiah Lutton; Sharon Collier; Till Bretschneider (2023). Data_Sheet_1_Generative Adversarial Networks for Augmenting Training Data of Microscopic Cell Images.pdf [Dataset]. http://doi.org/10.3389/fcomp.2019.00010.s001
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Piotr Baniukiewicz; E. Josiah Lutton; Sharon Collier; Till Bretschneider
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generative adversarial networks (GANs) have recently been successfully used to create realistic synthetic microscopy cell images in 2D and predict intermediate cell stages. In the current paper we highlight that GANs can not only be used for creating synthetic cell images optimized for different fluorescent molecular labels, but that by using GANs for augmentation of training data involving scaling or other transformations the inherent length scale of biological structures is retained. In addition, GANs make it possible to create synthetic cells with specific shape features, which can be used, for example, to validate different methods for feature extraction. Here, we apply GANs to create 2D distributions of fluorescent markers for F-actin in the cell cortex of Dictyostelium cells (ABD), a membrane receptor (cAR1), and a cortex-membrane linker protein (TalA). The recent more widespread use of 3D lightsheet microscopy, where obtaining sufficient training data is considerably more difficult than in 2D, creates significant demand for novel approaches to data augmentation. We show that it is possible to directly generate synthetic 3D cell images using GANs, but limitations are excessive training times, dependence on high-quality segmentations of 3D images, and that the number of z-slices cannot be freely adjusted without retraining the network. We demonstrate that in the case of molecular labels that are highly correlated with cell shape, like F-actin in our example, 2D GANs can be used efficiently to create pseudo-3D synthetic cell data from individually generated 2D slices. Because high quality segmented 2D cell data are more readily available, this is an attractive alternative to using less efficient 3D networks.

  2. IIITDMJ_Maize

    • ieee-dataport.org
    Cite
    Poornima Thakur, IIITDMJ_Maize [Dataset]. https://ieee-dataport.org/documents/iiitdmjmaize
    Authors
    Poornima Thakur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    including both sunny and cloudy days.

  3. Synthetic Skin Disease Dataset/Real and Synthetic

    • kaggle.com
    zip
    Updated Aug 7, 2024
    + more versions
    Cite
    DevDope (2024). Synthetic Skin Disease Dataset/Real and Synthetic [Dataset]. https://www.kaggle.com/datasets/devdope/synthetic-skin-disease-datasetreal-and-synthetic
    Available download formats: zip (1259800730 bytes)
    Dataset updated
    Aug 7, 2024
    Authors
    DevDope
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Title

    Skin Disease GAN-Generated and Original Images Lightweight Dataset

    General Description

    This dataset is a collection of skin disease images generated using a Generative Adversarial Network (GAN) approach. Specifically, a GAN was utilized with Stable Diffusion as the generator and a transformer-based discriminator to create realistic images of various skin diseases. The GAN approach enhances the accuracy and realism of the generated images, making this dataset a valuable resource for machine learning and computer vision applications in dermatology.

    Creation Process

    To create this dataset, a series of Low-Rank Adaptations (LoRAs) were generated for each disease category. These LoRAs were trained on the base dataset with 60 epochs and 30,000 steps using OneTrainer. Images were then generated for the following disease categories:

    • Herpes
    • Measles
    • Chickenpox
    • Monkeypox

    Due to the availability of ample public images, Melanoma was excluded from the generation process. The Fooocus API served as the generator within the GAN framework, creating images based on the LoRAs.

    To ensure quality and accuracy, a transformer-based discriminator was employed to verify the generated images, classifying them into the correct disease categories.

    Sources

    The original base dataset used to create this GAN-based dataset includes reputable sources such as:

    • 2019 HAM10000 Challenge
    • Kaggle
    • Google Images
    • Dermnet NZ
    • Bing Images
    • Yandex
    • Hellenic Atlas
    • Dermatological Atlas

    The LoRAs and their recommended weights for generating images are available for download on our CivitAi profile. You can refer to this profile for detailed instructions and access to the LoRAs used in this dataset.

    Dataset Contents

    Generated Images: High-quality images of skin diseases generated via GAN with Stable Diffusion, using transformer-based discrimination for accurate classification.

    Categories

    • Herpes
    • Measles
    • Chickenpox
    • Monkeypox

    Each image corresponds to one of these four categories, providing a reliable set of generated data for training and evaluation. Melanoma was excluded from generation due to the abundance of public data.

    Suggested Use Cases

    This dataset is suitable for:

    • Image Classification and Augmentation Tasks: Training and evaluating models in skin disease classification, with additional augmentation from generated images.
    • Research in Dermatology and GAN Techniques: Investigating the effectiveness of GANs for generating medical images, as well as exploring the use of transformer-based discrimination.
    • Educational Projects in AI and Medicine: Offering insights into image generation for diagnostic purposes, combining GANs and Stable Diffusion with transformers for medical datasets.

    Citation

    Garcia-Espinosa, E., Ruiz-Castilla, J. S., & Garcia-Lamont, F. (2025). Generative AI and Transformers in Advanced Skin Lesion Classification applied on a mobile device. International Journal of Combinatorial Optimization Problems and Informatics, 16(2), 158–175. https://doi.org/10.61467/2007.1558.2025.v16i2.1078

    Espinosa, E.G., Castilla, J.S.R., Lamont, F.G. (2025). Skin Disease Pre-diagnosis with Novel Visual Transformers. In: Figueroa-García, J.C., Hernández, G., Suero Pérez, D.F., Gaona García, E.E. (eds) Applied Computer Sciences in Engineering. WEA 2024. Communications in Computer and Information Science, vol 2222. Springer, Cham. https://doi.org/10.1007/978-3-031-74595-9_10

  4. Generative Adversarial Networks Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Generative Adversarial Networks Market Research Report 2033 [Dataset]. https://dataintelo.com/report/generative-adversarial-networks-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Generative Adversarial Networks (GANs) Market Outlook




    According to our latest research, the global Generative Adversarial Networks (GANs) market size in 2024 stands at USD 2.78 billion, with robust growth expected over the next decade. The market is projected to expand at a CAGR of 32.5% from 2025 to 2033, reaching an estimated value of USD 33.5 billion by the end of the forecast period. The remarkable growth trajectory can be attributed to the increasing adoption of advanced deep learning techniques, rising demand for synthetic data generation, and the proliferation of AI-driven solutions across diverse industries. As per our latest research, key growth drivers include technological advancements, expanding applications across verticals, and the accelerating pace of digital transformation globally.
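The figures quoted above can be sanity-checked with the standard compound-growth formula. The sketch below is only an illustration: the function name `project_value` is hypothetical, and treating 2024 to 2033 as nine compounding periods is an assumption, since the report does not state how it counts periods.

```python
def project_value(base_value: float, cagr: float, years: int) -> float:
    """Compound base_value forward by `years` annual periods at rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


base_2024 = 2.78      # USD billion, market size quoted for 2024
quoted_cagr = 0.325   # 32.5% CAGR quoted in the report
years = 2033 - 2024   # assumption: nine annual compounding periods

projected_2033 = project_value(base_2024, quoted_cagr, years)
print(f"Projected 2033 value: USD {projected_2033:.2f} billion")
```

Under these assumptions the projection comes out at roughly USD 35 billion, slightly above the USD 33.5 billion quoted, so the report may be rounding or counting periods differently.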




    One of the primary growth factors fueling the Generative Adversarial Networks market is the surge in demand for high-quality synthetic data, which has become critical for training robust machine learning models. With the growing concerns around data privacy and the scarcity of labeled datasets, organizations are leveraging GANs to generate realistic synthetic datasets that preserve privacy while maintaining statistical validity. This trend is especially pronounced in sectors such as healthcare, where patient data sensitivity is paramount, and in the automotive industry, where synthetic data aids in developing autonomous vehicle algorithms. The ability of GANs to produce lifelike images, videos, and text data is revolutionizing data augmentation processes, reducing dependency on costly and time-consuming data collection efforts, and accelerating the pace of innovation.




    Another significant driver is the rapid evolution of GAN architectures and their expanding application scope. Recent advancements in GAN technology, including StyleGAN, CycleGAN, and BigGAN, have dramatically improved the fidelity and versatility of generated content. These innovations have unlocked new possibilities in fields such as image generation, video synthesis, drug discovery, and even text-to-image synthesis. Enterprises in media and entertainment are utilizing GANs to create photorealistic visual effects, while pharmaceutical companies are leveraging the technology for accelerated drug molecule design and discovery. The versatility of GANs, coupled with their ability to automate creative processes and generate novel content, is attracting substantial investments from both established players and startups, further propelling market growth.




    The increasing integration of GANs into cloud-based platforms and AI-as-a-Service offerings is also playing a crucial role in market expansion. Cloud deployment models enable organizations of all sizes to access powerful GAN capabilities without substantial upfront infrastructure investments. This democratization of access is particularly beneficial for small and medium enterprises (SMEs), allowing them to harness advanced generative AI for various applications, from marketing and e-commerce to cybersecurity and fraud detection. The scalability and flexibility offered by cloud-based GAN solutions are fostering widespread adoption, while ongoing advancements in hardware accelerators and optimized software frameworks are further lowering barriers to entry.




    From a regional perspective, North America currently dominates the GANs market, driven by the presence of leading technology companies, extensive research and development activities, and a mature ecosystem for AI adoption. However, Asia Pacific is emerging as a high-growth region, fueled by rapid digitalization, increasing investments in AI research, and the proliferation of innovative startups. Europe, with its strong emphasis on data privacy and regulatory compliance, is witnessing growing adoption of GANs for privacy-preserving data generation and synthetic data applications. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by government initiatives, expanding digital infrastructure, and rising awareness about the transformative potential of generative AI technologies.



    Component Analysis




    The Generative Adversarial Networks market by component is segmented into software, hardware, and services. The software segment currently holds the largest share, driven by the rapid evolution of GAN frameworks, toolkits, and libraries that facilitate the development, training, and deployment of generative models. Op

  5. BPCO dataset based GANs for IoMT

    • kaggle.com
    zip
    Updated May 27, 2021
    Cite
    Network security group CNR-IEIIT (2021). BPCO dataset based GANs for IoMT [Dataset]. https://www.kaggle.com/cnrieiit/bpco-dataset-based-gans-for-iomt
    Available download formats: zip (1440638 bytes)
    Dataset updated
    May 27, 2021
    Authors
    Network security group CNR-IEIIT
    Description

    A Generative Adversarial Network (GAN) Technique for Internet of Medical Things Data

    This work creates a dataset for the Internet of Medical Things (IoMT) context, using Generative Adversarial Networks (GANs) to produce synthetic data. In particular, the synthetic dataset is generated by GANs starting from data retrieved by the IoT sensors.

    Getting Started

    To use the dataset, simply download the repository and start working with the xlsx files. More information is available in the following article: Vaccari, I.; Orani, V.; Paglialonga, A.; Cambiaso, E.; Mongelli, M. A Generative Adversarial Network (GAN) Technique for Internet of Medical Things Data. Sensors 2021, 21, 3726.

    If you use this dataset in research work, please cite this article.

    IoMT devices adopted

    The devices adopted to retrieve patient data are:

    • An electrocardiogram (ECG) patch also providing day/night movement monitoring
    • A pulse meter providing oximetry monitoring
    • A weight scale
    • A sphygmomanometer for blood pressure monitoring
    • A spirometer for peak flow and FEV1 parameters

    Data are collected every day for three consecutive months: oxygen, body temperature, heart rate, heart rate master, weight, Body Mass Index, FEV1, PEF, MAP, diastolic blood pressure, systolic blood pressure.

    Repository organization

    The repository is composed of 3 folders:

    • Data
      • Input
        • dataPneulytics.xlsx
      • Output
        • GenData{1-11}.xlsx
    • summary.xlsx

    The Data folder contains two subfolders: Input holds the data retrieved by the IoMT sensors, and Output holds the synthetic datasets generated by the GANs. summary.xlsx reports the comparison between the synthetic and real datasets in terms of Jensen–Shannon (JS) divergence, Fréchet Inception Distance (FID), accuracy and F1 score.
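As an illustration of one of the metrics named above, here is a minimal pure-Python sketch of the Jensen–Shannon divergence between two discrete distributions, for example normalized histograms of a real and a synthetic column. It is not the authors' evaluation code; the function names and the sample histograms are hypothetical, and the base-2 logarithm (which bounds the result between 0 and 1) is an assumption.

```python
import math


def normalize(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; bins with p == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical bin counts over the same bin edges (not taken from the dataset).
real_hist = normalize([12, 30, 41, 17])
synthetic_hist = normalize([10, 33, 38, 19])

print(f"JS divergence: {js_divergence(real_hist, synthetic_hist):.4f}")
```

A value near 0 means the two histograms are nearly identical; a value of 1 means they do not overlap at all.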

    Authors

    • Ivan Vaccari - Concept, implementation, elaboration, paper writer
    • Vanessa Orani - implementation, elaboration
    • Alessia Paglialonga - Supervisor, paper reviewer
    • Enrico Cambiaso - Supervisor, elaboration, paper collaboration
    • Maurizio Mongelli - Machine learning support and contribution
  6. Table_1_Deep learning strategies for addressing issues with small datasets...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Dec 22, 2022
    Cite
    Gnimpieba, Etienne; Do, Tuyen; Gautum, Rishav; Gadhamshetty, Venkataramana; Hasan, Mahmudul; Allen, Cody; Jasthi, Bharat K.; Aryal, Shiva (2022). Table_1_Deep learning strategies for addressing issues with small datasets in 2D materials research: Microbial Corrosion.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000274599
    Dataset updated
    Dec 22, 2022
    Authors
    Gnimpieba, Etienne; Do, Tuyen; Gautum, Rishav; Gadhamshetty, Venkataramana; Hasan, Mahmudul; Allen, Cody; Jasthi, Bharat K.; Aryal, Shiva
    Description

    Protective coatings based on two-dimensional materials such as graphene have gained traction for diverse applications. Their impermeability, inertness, excellent bonding with metals, and amenability to functionalization render them promising coatings against both abiotic and microbiologically influenced corrosion (MIC). Owing to the success of graphene coatings, the whole family of 2D materials, including hexagonal boron nitride and molybdenum disulphide, is being screened to obtain other promising coatings. AI-based data-driven models can accelerate virtual screening of 2D coatings with desirable physical and chemical properties. However, the lack of large experimental datasets renders training of classifiers difficult and often results in over-fitting. Generating large datasets for MIC resistance of 2D coatings is both complex and laborious. Deep learning data augmentation methods can alleviate this issue by generating synthetic electrochemical data that resembles the training data classes. Here, we investigated two different deep generative models, namely variational autoencoder (VAE) and generative adversarial network (GAN), for generating synthetic data for expanding small experimental datasets. Our model experimental system included few-layered graphene over copper surfaces. The synthetic data generated using GAN displayed a greater neural network system performance (83-85% accuracy) than VAE generated synthetic data (78-80% accuracy). However, VAE data performed better (90% accuracy) than GAN data (84%-85% accuracy) when using XGBoost. Finally, we show that synthetic data based on VAE and GAN models can drive machine learning models for developing MIC resistant 2D coatings.

  7. AI in Generative Adversarial Networks Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Jul 24, 2025
    Cite
    Research Intelo (2025). AI in Generative Adversarial Networks Market Research Report 2033 [Dataset]. https://researchintelo.com/report/ai-in-generative-adversarial-networks-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    AI in Generative Adversarial Networks (GANs) Market Outlook



    According to our latest research, the global AI in Generative Adversarial Networks (GANs) market size reached USD 2.65 billion in 2024, reflecting robust growth driven by rapid advancements in deep learning and artificial intelligence. The market is expected to register a remarkable CAGR of 31.4% from 2025 to 2033, accelerating the adoption of GANs across diverse industries. By 2033, the market is forecasted to achieve a value of USD 32.78 billion, underscoring the transformative impact of GANs in areas such as image and video generation, data augmentation, and synthetic content creation. This trajectory is supported by the increasing demand for highly realistic synthetic data and the expansion of AI-driven applications across enterprise and consumer domains.



    A primary growth factor for the AI in Generative Adversarial Networks market is the exponential increase in the availability and complexity of data that organizations must process. GANs, with their unique adversarial training methodology, have proven exceptionally effective for generating realistic synthetic data, which is crucial for industries like healthcare, automotive, and finance where data privacy and scarcity are significant concerns. The ability of GANs to create high-fidelity images, videos, and even text has enabled organizations to enhance their AI models, improve data diversity, and reduce bias, thereby accelerating the adoption of AI-driven solutions. Furthermore, the integration of GANs with cloud-based platforms and the proliferation of open-source GAN frameworks have democratized access to this technology, enabling both large enterprises and SMEs to harness its potential for innovative applications.



    Another significant driver for the AI in Generative Adversarial Networks market is the surge in demand for advanced content creation tools in media, entertainment, and marketing. GANs have revolutionized the way digital content is produced by enabling hyper-realistic image and video synthesis, deepfake generation, and automated design. This has not only streamlined creative workflows but also opened new avenues for personalized content, virtual influencers, and immersive experiences in gaming and advertising. The rapid evolution of GAN architectures, such as StyleGAN and CycleGAN, has further enhanced the quality and scalability of generative models, making them indispensable for enterprises seeking to differentiate their digital offerings and engage customers more effectively in a highly competitive landscape.



    The ongoing advancements in hardware acceleration and AI infrastructure have also played a pivotal role in propelling the AI in Generative Adversarial Networks market forward. The availability of powerful GPUs, TPUs, and AI-specific chips has significantly reduced the training time and computational costs associated with GANs, making them more accessible for real-time and large-scale applications. Additionally, the growing ecosystem of AI services and consulting has enabled organizations to overcome technical barriers, optimize GAN deployments, and ensure compliance with evolving regulatory standards. As investment in AI research continues to surge, the GANs market is poised for sustained innovation and broader adoption across sectors such as healthcare diagnostics, autonomous vehicles, financial modeling, and beyond.



    From a regional perspective, North America continues to dominate the AI in Generative Adversarial Networks market, accounting for the largest share in 2024, driven by its robust R&D ecosystem, strong presence of leading technology companies, and early adoption of AI technologies. Europe follows closely, with significant investments in AI research and regulatory initiatives promoting ethical AI development. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digital transformation, expanding AI talent pool, and increasing government support for AI innovation. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as enterprises in these regions begin to explore the potential of GANs for industry-specific applications.



    Component Analysis



    The AI in Generative Adversarial Networks market is segmented by component into software, hardware, and services, each playing a vital role in the ecosystem’s development and adoption. Software solutions constitute the largest share of the market in 2024, reflecting the growing demand for ad

  8. Generative Adversarial Networks Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 22, 2025
    Cite
    Growth Market Reports (2025). Generative Adversarial Networks Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/generative-adversarial-networks-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Aug 22, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Generative Adversarial Networks Market Outlook



    According to our latest research, the global Generative Adversarial Networks (GANs) market size reached $2.19 billion in 2024, reflecting the rapid adoption of deep learning technologies across industries. The market is registering a robust CAGR of 31.5% and is forecasted to achieve a value of $21.47 billion by 2033. This exponential growth is attributed to the increasing demand for advanced AI-driven content creation, synthetic data generation, and the transformative impact of GANs in sectors such as healthcare, media, and cybersecurity. The expanding ecosystem of AI research and the proliferation of high-performance computing infrastructure are further accelerating the adoption of GANs worldwide.
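Because the paragraph above gives a start value, an end value, and a growth rate without stating the compounding window, it can help to back out the rate implied by the two endpoints. The sketch below does that; `implied_cagr` is a hypothetical helper name, and the candidate period counts (eight or nine years) are assumptions rather than something the report specifies.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


start_2024 = 2.19   # USD billion, quoted 2024 market size
end_2033 = 21.47    # USD billion, quoted 2033 forecast

# Assumption: the forecast window is either eight or nine annual periods.
for years in (8, 9):
    rate = implied_cagr(start_2024, end_2033, years)
    print(f"{years} periods -> implied CAGR {rate:.1%}")
```

Under these assumptions the implied rate is about 33% over eight periods and about 29% over nine, which brackets the quoted 31.5%.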




    A primary growth factor for the Generative Adversarial Networks market is the surging need for synthetic data generation. As organizations increasingly rely on data-intensive machine learning models, GANs have emerged as a pivotal technology to generate realistic, high-quality synthetic datasets that overcome data privacy and scarcity challenges. This is particularly crucial in sectors such as healthcare and finance, where access to diverse, high-fidelity data is often restricted by regulatory requirements. GANs enable the creation of anonymized yet statistically accurate datasets, facilitating model training without compromising sensitive information. Additionally, the growing sophistication of GAN architectures has led to improved output quality, making them indispensable for simulations, product development, and algorithm validation.




    Another significant driver is the integration of GANs into creative and media industries for content generation. GANs are revolutionizing image and video production by automating the creation of hyper-realistic visuals, deepfakes, and special effects, reducing both time and cost. Companies in advertising, gaming, and entertainment leverage GANs to generate novel content, restore old media, and personalize user experiences at scale. With the rise of virtual influencers, digital avatars, and immersive experiences in the metaverse, GANs are becoming foundational tools for brands seeking innovative ways to engage audiences. The continuous advancements in neural network architectures and training algorithms are further enhancing the capabilities of GANs in these creative domains.




    The increasing application of GANs in scientific research and drug discovery is also fueling market expansion. In the pharmaceutical industry, GANs are utilized to design new molecular structures, predict drug efficacy, and optimize clinical trials by generating synthetic patient data. This accelerates the drug development pipeline and reduces R&D costs. Similarly, in cybersecurity, GANs are deployed to simulate cyberattacks and generate adversarial examples, helping organizations bolster their defense mechanisms. The versatility of GANs in addressing complex, real-world problems across diverse sectors underscores their growing importance and widespread adoption in the coming years.




    Regionally, North America continues to dominate the Generative Adversarial Networks market, driven by a robust AI research ecosystem, significant investments from tech giants, and the early adoption of advanced machine learning solutions. However, Asia Pacific is witnessing the fastest growth, propelled by increasing government initiatives, a burgeoning startup landscape, and rapid digital transformation in countries like China, Japan, and South Korea. Europe is also making significant strides, particularly in regulated industries such as healthcare and automotive, where GANs are being leveraged for innovation and compliance. The global expansion of cloud infrastructure and cross-border collaborations in AI research are further contributing to the widespread adoption and growth of the GANs market across all regions.





    Component Analysis



    The Generative Adversarial Networks market is segmented by component into Software, Hardware, and Services

  9. Synthetic dataset of hospitalised patients with an acute exacerbation of...

    • healthdatagateway.org
    unknown
    Updated Jan 5, 2024
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic dataset of hospitalised patients with an acute exacerbation of asthma [Dataset]. https://healthdatagateway.org/dataset/1015
    Available download formats: unknown
    Dataset updated
    Jan 5, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    To support respiratory research, a synthetic asthma dataset was generated based on real-world data, originally documenting 381 patients with physician-confirmed asthma who were admitted to secondary care at a single centre in 2019. The dataset is highly detailed, covering demographics, structured physiological data, medication records, and clinical outcomes. The synthetic version extends to 561 patients admitted over a year, offering insights into patient patterns, risk factors, and treatment strategies.

    The dataset was created using the Synthetic Data Vault package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
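The pre-processing and schema steps described above can be sketched in plain Python. This is not the PIONEER pipeline and it does not call the Synthetic Data Vault package: the column names, sample rows, and the layout of the metadata dictionary are all hypothetical, chosen only to show datetime parsing, identifiers handled as strings, and a declared primary key.

```python
from datetime import datetime


def preprocess(rows, datetime_cols, id_cols):
    """Parse ISO datetime strings and force identifier columns to str."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for col in datetime_cols:
            new_row[col] = datetime.fromisoformat(row[col])
        for col in id_cols:
            new_row[col] = str(row[col])
        cleaned.append(new_row)
    return cleaned


def build_metadata(rows, primary_key, datetime_cols, id_cols):
    """Describe each column's type and check the primary key is unique."""
    columns = {}
    for name, value in rows[0].items():
        if name in id_cols:
            columns[name] = "id"
        elif name in datetime_cols:
            columns[name] = "datetime"
        elif isinstance(value, (int, float)):
            columns[name] = "numerical"
        else:
            columns[name] = "categorical"
    keys = [row[primary_key] for row in rows]
    if len(set(keys)) != len(keys):
        raise ValueError(f"primary key {primary_key!r} is not unique")
    return {"primary_key": primary_key, "columns": columns}


# Hypothetical records; real field names are not given in the description.
raw = [
    {"patient_id": 1001, "admitted_at": "2019-03-04T10:15:00", "age": 54, "sex": "F"},
    {"patient_id": 1002, "admitted_at": "2019-03-05T22:40:00", "age": 31, "sex": "M"},
]
cleaned = preprocess(raw, datetime_cols=["admitted_at"], id_cols=["patient_id"])
metadata = build_metadata(cleaned, "patient_id", ["admitted_at"], ["patient_id"])
print(metadata)
```

A schema of this kind is what a synthesizer would then use to decide how each column is modelled before fitting and sampling.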

    Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Real world data. Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real-world data upon request.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  10. Synthetic Data: Acute Atrial Fibrillation Patient Profiles, Clinical...

    • healthdatagateway.org
    unknown
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158), Synthetic Data: Acute Atrial Fibrillation Patient Profiles, Clinical Insights [Dataset]. https://healthdatagateway.org/en/dataset/1002
    Explore at:
    unknownAvailable download formats
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Atrial fibrillation (AF) is a common abnormal heart rhythm that causes the heart to beat irregularly and often too fast. AF increases the risk of stroke and heart failure. AF primarily affects older adults and individuals with chronic conditions such as heart disease, high blood pressure, or obesity. Additional risk factors include congenital heart disease and cardiomyopathy. AF can be treated by ablation or controlled using medication. The risk of stroke can be reduced using anticoagulants.

    This synthetic AF dataset comprises 24.8k “patients”, including demographics, co-morbidities, presenting symptoms and medical events during hospital stays, coded with ICD-10 and SNOMED-CT.

    Using the Synthetic Data Vault package with a GAN synthesizer, a synthetic dataset was generated from real clinical data. The dataset includes demographic information and hospital admission details. The real data was pre-processed for correct datetime parsing and metadata was defined to capture schema structure, guiding the synthesizer in learning data distributions and relationships. The resulting synthetic dataset closely mirrors the statistical properties of the original, supporting privacy-preserving analysis and model training.

    Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  11. GAN Generated Images for Facial Expression Recognition systems

    • zenodo.org
    bin
    Updated Jul 8, 2025
    Cite
    Simone Porcu; Alessandro Floris; Luigi Atzori (2025). GAN Generated Images for Facial Expression Recognition systems [Dataset]. http://doi.org/10.21227/b7m1-rz14
    Explore at:
    binAvailable download formats
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simone Porcu; Alessandro Floris; Luigi Atzori
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2020
    Description

    Most facial expression recognition (FER) systems rely on machine learning approaches that require large databases (DBs) for effective training. As these are not easily available, a good solution is to augment the DBs with appropriate techniques, which are typically based on either geometric transformation or deep learning based technologies (e.g., Generative Adversarial Networks (GANs)). Whereas the first category of techniques has been fairly adopted in the past, studies that use GAN-based techniques are limited for FER systems. To advance in this respect, we evaluate the impact of the GAN techniques by creating a new DB containing the generated synthetic images.

    The face images contained in the KDEF DB serve as the basis for creating novel synthetic images by combining the facial features of two images (i.e., Candie Kung and Cristina Saralegui) selected from the YouTube-Faces DB. The novel images differ from each other, in particular concerning the eyes, the nose, and the mouth, whose characteristics are taken from the Candie and Cristina images.

    The total number of novel synthetic images generated with the GAN is 980 (70 individuals from KDEF DB x 7 emotions x 2 subjects from YouTube-Faces DB).
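    The count above can be checked with a short enumeration. This is only an illustrative sketch: the individual IDs and emotion names below are placeholder labels, not the file names used in the dataset.

```python
from itertools import product

# Placeholder labels (assumptions for illustration, not the dataset's image IDs).
kdef_individuals = [f"kdef_{i:02d}" for i in range(1, 71)]  # 70 individuals
emotions = ["afraid", "angry", "disgusted", "happy", "neutral", "sad", "surprised"]
youtube_subjects = ["Candie", "Cristina"]  # 2 subjects from YouTube-Faces DB

# One synthetic image per (individual, emotion, subject) combination.
combos = list(product(kdef_individuals, emotions, youtube_subjects))
total = len(combos)

# Number of images per YouTube-Faces subject, i.e. per zip file.
per_subject = {
    s: sum(1 for _, _, subj in combos if subj == s) for s in youtube_subjects
}
print(total, per_subject)
```

    Each of the two zip files therefore corresponds to one value of the last factor in the product.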

    The zip file "GAN_KDEF_Candie" contains the 490 images generated by combining the KDEF images with the Candie Kung image. The zip file "GAN_KDEF_Cristina" contains the 490 images generated by combining the KDEF images with the Cristina Saralegui image. The used image IDs are the same used for the KDEF DB. The synthetic generated images have a resolution of 562x762 pixels.

    If you make use of this dataset, please consider citing the following publication:

    Porcu, S., Floris, A., & Atzori, L. (2020). Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems. Electronics, 9, 1892, doi: 10.3390/electronics9111892, url: https://www.mdpi.com/2079-9292/9/11/1892.

    BibTeX format:

    @article{porcu2020evaluation, title={Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems}, author={Porcu, Simone and Floris, Alessandro and Atzori, Luigi}, journal={Electronics}, volume={9}, pages={108781}, year={2020}, number = {11}, article-number = {1892}, publisher={MDPI}, doi={10.3390/electronics9111892} }

  12. DeepFake ECG

    • kaggle.com
    zip
    Updated May 27, 2021
    Cite
    VT (2021). DeepFake ECG [Dataset]. https://www.kaggle.com/datasets/vlbthambawita/deepfake-ecg/data
    Explore at:
    zip(16548382130 bytes)Available download formats
    Dataset updated
    May 27, 2021
    Authors
    VT
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DeepFake electrocardiograms: the beginning of the end for privacy issues in medicine

    Paper | GitHub | Original-data-source | PyPI

    Content

    Big data is needed to implement personalized medicine, but privacy issues are a prevalent problem for collecting data and sharing them between researchers. A solution is synthetic data generated to represent real datasets carrying similar information. Here, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 12-lead 10-sec electrocardiograms (ECGs). We have developed and compared two methods, namely WaveGAN* and Pulse2Pulse GAN. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to the WaveGAN in producing realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. These synthetic ECGs are fully anonymous and cannot be traced to any individual, hence they may be used freely. The synthetic dataset will be available as open access for researchers at OSF.io and the DeepFake generator will be available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using adversarial neural networks on normal ECGs from two population studies, thereby solving the relevant privacy issues in medical datasets.

    Citation (cite this paper to use this dataset in your research)

    @article{thambawita2021deepfake,
     title={DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine},
     author={Thambawita, Vajira and Isaksen, Jonas L and Hicks, Steven A and Ghouse, Jonas and Ahlberg, Gustav and Linneberg, Allan and Grarup, Niels and Ellervik, Christina and Olesen, Morten Salling and Hansen, Torben and others},
     journal={Scientific reports},
     volume={11},
     number={1},
     pages={1--8},
     year={2021},
     publisher={Nature Publishing Group}
    }
    
  13. Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme...

    • web.prod.hdruk.cloud
    unknown
    Updated Oct 8, 2024
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Immune Checkpoint Inhibitors synthetic data: HDR UK Medicines Programme resource [Dataset]. https://web.prod.hdruk.cloud/dataset/189
    Explore at:
    unknownAvailable download formats
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    This highly granular synthetic dataset, created as an asset for the HDR UK Medicines programme, includes information on 680 cancer patients over a period of three years. It includes simulated patient-related data, such as demographics & co-morbidities extracted from ICD-10 and SNOMED-CT codes. It also includes serial, structured data pertaining to the acute care process (readmissions, survival), primary diagnosis, presenting complaint, physiology readings, blood results (infection, inflammatory markers) and acuity markers such as AVPU Scale, NEWS2 score, imaging reports, prescribed & administered treatments including fluids, blood products, procedures, information on outpatient admissions and survival outcomes one year post discharge.

    The data was generated using a generative adversarial network model (CTGAN). A flat real data table was created by consolidating essential information from various key relational tables (medications, demographics). A synthetic version of the flat table was generated using a customized script based on the SDV package (N. Patki, 2016), that replicated the real distribution and logic relationships.
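    A minimal sketch of the flattening step described above, assuming hypothetical table and column names (patient_id, age, sex, drug). The actual PIONEER schema and the customized SDV-based script are not shown in the source, so this only illustrates the general idea of consolidating relational tables into one row per patient before fitting a synthesizer.

```python
from collections import defaultdict

# Hypothetical relational tables with toy values (not real or synthetic PIONEER data).
demographics = [
    {"patient_id": "P1", "age": 67, "sex": "F"},
    {"patient_id": "P2", "age": 54, "sex": "M"},
]
medications = [
    {"patient_id": "P1", "drug": "drug_a"},
    {"patient_id": "P1", "drug": "drug_b"},
    {"patient_id": "P2", "drug": "drug_a"},
]

def flatten(demographics, medications):
    """Return one row per patient, with medication rows aggregated into columns."""
    drugs = defaultdict(list)
    for row in medications:
        drugs[row["patient_id"]].append(row["drug"])
    flat = []
    for person in demographics:
        pid = person["patient_id"]
        flat.append({
            **person,
            "n_medications": len(drugs[pid]),
            "medications": ";".join(sorted(drugs[pid])),
        })
    return flat

flat_table = flatten(demographics, medications)
for row in flat_table:
    print(row)
```

    How one-to-many tables are aggregated (counts, joined strings, indicator columns) is a design choice that shapes which relationships a tabular synthesizer can learn.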

    Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and provide the real-data via application.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  14. Intrusion_detection_dataset

    • kaggle.com
    zip
    Updated Jun 23, 2023
    Cite
    Hamza Farooq (2023). Intrusion_detection_dataset [Dataset]. https://www.kaggle.com/ameerhamza123/intrusion-detection-dataset
    Explore at:
    zip(371495 bytes)Available download formats
    Dataset updated
    Jun 23, 2023
    Authors
    Hamza Farooq
    Description

    This dataset contains network traffic data collected from a computer network. The network consists of various devices, such as computers, servers, and routers, interconnected to facilitate communication and data exchange. The dataset captures different types of network activity, including normal traffic as well as various anomalies and attacks. It can be used for studying network security, intrusion detection, and anomaly detection algorithms. Features include source and destination IP addresses, port numbers, protocol types, packet sizes, and timestamps, enabling detailed analysis of network traffic patterns and characteristics.

    The second file in this dataset contains synthetic data generated using a Generative Adversarial Network (GAN). GANs are a type of deep learning model that can learn the underlying patterns and distributions of a given dataset and generate new synthetic samples that resemble the original data. In this case, the GAN was trained on the network traffic data from the first file to learn the characteristics and structure of the traffic. The synthetic data in the second file aims to mimic the patterns and behavior observed in real network traffic. It can be used for purposes such as augmenting the original dataset, testing the robustness of machine learning models, or exploring different scenarios in network analysis.

  15. DataSheet_1_Convolutional Neural Net-Based Cassava Storage Root Counting...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    John Atanbori; Maria Elker Montoya-P; Michael Gomez Selvaraj; Andrew P. French; Tony P. Pridmore (2023). DataSheet_1_Convolutional Neural Net-Based Cassava Storage Root Counting Using Real and Synthetic Images.pdf [Dataset]. http://doi.org/10.3389/fpls.2019.01516.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    John Atanbori; Maria Elker Montoya-P; Michael Gomez Selvaraj; Andrew P. French; Tony P. Pridmore
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cassava roots are complex structures comprising several distinct types of root. The number and size of the storage roots are two potential phenotypic traits reflecting crop yield and quality. Counting and measuring the size of cassava storage roots are usually done manually, or semi-automatically by first segmenting cassava root images. However, occlusion of both storage and fibrous roots makes the process both time-consuming and error-prone. While Convolutional Neural Nets have shown performance above the state-of-the-art in many image processing and analysis tasks, there are currently a limited number of Convolutional Neural Net-based methods for counting plant features. This is due to the limited availability of data, annotated by expert plant biologists, which represents all possible measurement outcomes. Existing works in this area either learn a direct image-to-count regressor model by regressing to a count value, or perform a count after segmenting the image. We, however, address the problem using a direct image-to-count prediction model. This is made possible by generating synthetic images, using a conditional Generative Adversarial Network (GAN), to provide training data for missing classes. We automatically form cassava storage root masks for any missing classes using existing ground-truth masks, and input them as a condition to our GAN model to generate synthetic root images. We combine the resulting synthetic images with real images to learn a direct image-to-count prediction model capable of counting the number of storage roots in real cassava images taken from a low cost aeroponic growth system. These models are used to develop a system that counts cassava storage roots in real images. Our system first predicts age group ('young' and 'old' roots; pertinent to our image capture regime) in a given image, and then, based on this prediction, selects an appropriate model to predict the number of storage roots. 
    We achieve 91% accuracy on predicting ages of storage roots, and 86% and 71% overall percentage agreement on counting 'old' and 'young' storage roots respectively. Thus we are able to demonstrate that synthetically generated cassava root images can be used to supplement missing root classes, turning the counting problem into a direct image-to-count prediction task.
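    The two-stage design described in this entry (predict the age group first, then route the image to a group-specific counting model) can be sketched as follows. The classifier and counters here are stand-in functions operating on a toy dictionary and are purely hypothetical; the authors' models are convolutional networks and are not reproduced.

```python
from typing import Callable, Dict

# Toy stand-in for an image; real inputs would be pixel arrays.
Image = Dict[str, float]

def predict_age_group(image: Image) -> str:
    # Hypothetical stage 1: classify as 'young' or 'old' (threshold is arbitrary).
    return "old" if image["mean_root_width"] > 5.0 else "young"

def count_young(image: Image) -> int:
    # Hypothetical stage 2 counter for 'young' roots.
    return round(image["blob_area"] / 10.0)

def count_old(image: Image) -> int:
    # Hypothetical stage 2 counter for 'old' roots.
    return round(image["blob_area"] / 25.0)

# Dispatch table: one counting model per predicted age group.
COUNTERS: Dict[str, Callable[[Image], int]] = {
    "young": count_young,
    "old": count_old,
}

def count_storage_roots(image: Image) -> int:
    group = predict_age_group(image)
    return COUNTERS[group](image)

example = {"mean_root_width": 7.2, "blob_area": 100.0}
n_roots = count_storage_roots(example)
print(predict_age_group(example), n_roots)
```

    Keeping the counters in a dispatch table means an additional age group only needs one more entry, at the cost of making the final count depend on the first-stage prediction being right.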

  16. Synthetic Dataset of Hospital Admissions for an acute Stroke

    • healthdatagateway.org
    unknown
    Updated Dec 4, 2024
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2024). Synthetic Dataset of Hospital Admissions for an acute Stroke [Dataset]. https://healthdatagateway.org/en/dataset/1003
    Explore at:
    unknownAvailable download formats
    Dataset updated
    Dec 4, 2024
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Strokes can be ischaemic or haemorrhagic in nature, leading to debilitating symptoms which are dependent on the location of the stroke in the brain and the severity of the insult. Stroke care is centred around Hyper-acute Stroke Units (HASU), Acute Stroke and Brain Injury Units (ASU/ABIU) and specialist stroke services. Early presentation enables the use of more invasive treatments to clear blood clots, but commonly strokes present late, preventing their use.

    This synthetic dataset represents approximately 29,000 stroke patients. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.

    The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
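    The pre-processing and metadata steps described above might look like the sketch below. It uses only the Python standard library and invented column names (patient_id, admission_time, age). The metadata dictionary is an illustrative schema description, not the SDV metadata format, and the fitting and sampling steps are omitted.

```python
import csv
import io
from datetime import datetime

# Toy input with invented columns and values (not real or synthetic PIONEER data).
RAW = (
    "patient_id,admission_time,age\n"
    "00123,2021-03-04 10:15:00,71\n"
    "00456,2021-03-05 22:40:00,64\n"
)

# Illustrative schema: field types and primary key (not the SDV metadata format).
metadata = {
    "primary_key": "patient_id",
    "columns": {
        "patient_id": {"type": "id"},
        "admission_time": {"type": "datetime", "format": "%Y-%m-%d %H:%M:%S"},
        "age": {"type": "numerical"},
    },
}

def preprocess(raw_text, metadata):
    """Parse each field according to the declared schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        row = {}
        for name, spec in metadata["columns"].items():
            value = record[name]
            if spec["type"] == "datetime":
                row[name] = datetime.strptime(value, spec["format"])
            elif spec["type"] == "numerical":
                row[name] = float(value)
            else:
                # Identifiers stay strings so leading zeros are preserved.
                row[name] = str(value)
        rows.append(row)
    return rows

rows = preprocess(RAW, metadata)

# A primary key must identify each row uniquely.
keys = [r[metadata["primary_key"]] for r in rows]
assert len(keys) == len(set(keys)), "primary key must be unique"
```

    Treating identifiers as strings matters because a numeric parse would turn "00123" into 123 and a synthesizer might then model the key as a continuous quantity.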

    Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute stroke services & specialist care across four hospital sites.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  17. Data_Sheet_1_Synthetic artificial intelligence using generative adversarial...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 22, 2023
    Cite
    Zhaoran Wang; Gilbert Lim; Wei Yan Ng; Tien-En Tan; Jane Lim; Sing Hui Lim; Valencia Foo; Joshua Lim; Laura Gutierrez Sinisterra; Feihui Zheng; Nan Liu; Gavin Siew Wei Tan; Ching-Yu Cheng; Gemmy Chui Ming Cheung; Tien Yin Wong; Daniel Shu Wei Ting (2023). Data_Sheet_1_Synthetic artificial intelligence using generative adversarial network for retinal imaging in detection of age-related macular degeneration.docx [Dataset]. http://doi.org/10.3389/fmed.2023.1184892.s001
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Zhaoran Wang; Gilbert Lim; Wei Yan Ng; Tien-En Tan; Jane Lim; Sing Hui Lim; Valencia Foo; Joshua Lim; Laura Gutierrez Sinisterra; Feihui Zheng; Nan Liu; Gavin Siew Wei Tan; Ching-Yu Cheng; Gemmy Chui Ming Cheung; Tien Yin Wong; Daniel Shu Wei Ting
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Age-related macular degeneration (AMD) is one of the leading causes of vision impairment globally and early detection is crucial to prevent vision loss. However, the screening of AMD is resource dependent and demands experienced healthcare providers. Recently, deep learning (DL) systems have shown the potential for effective detection of various eye diseases from retinal fundus images, but the development of such robust systems requires a large amount of data, which could be limited by the prevalence of the disease and patient privacy. As in the case of AMD, the advanced phenotype is often too scarce for conducting DL analysis, which may be tackled by generating synthetic images using Generative Adversarial Networks (GANs). This study aims to develop GAN-synthesized fundus photos with AMD lesions, and to assess the realness of these images with an objective scale.

    Methods: To build our GAN models, a total of 125,012 fundus photos were used from a real-world non-AMD phenotypical dataset. StyleGAN2 and a human-in-the-loop (HITL) method were then applied to synthesize fundus images with AMD features. To objectively assess the quality of the synthesized images, we proposed a novel realness scale based on the frequency of the broken vessels observed in the fundus photos. Four residents conducted two rounds of gradings on 300 images to distinguish real from synthetic images, based on their subjective impression and the objective scale respectively.

    Results and discussion: The introduction of HITL training increased the percentage of synthetic images with AMD lesions, despite the limited number of AMD images in the initial training dataset. Qualitatively, the synthesized images have been proven to be robust in that our residents had limited ability to distinguish real from synthetic ones, as evidenced by an overall accuracy of 0.66 (95% CI: 0.61–0.66) and Cohen’s kappa of 0.320. For the non-referable AMD classes (no or early AMD), the accuracy was only 0.51. With the objective scale, the overall accuracy improved to 0.72. In conclusion, GAN models built with HITL training are capable of producing realistic-looking fundus images that could fool human experts, while our objective realness scale based on broken vessels can help identify the synthetic fundus photos.
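    The two measures reported in this abstract, accuracy and Cohen's kappa, can be computed as in the following sketch. The labels are made-up toy data, not the study's gradings. Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the marginal label frequencies.

```python
from collections import Counter

def accuracy(truth, pred):
    """Fraction of items where the grader's label matches the true label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def cohens_kappa(truth, pred):
    """Chance-corrected agreement between two label sequences."""
    n = len(truth)
    p_o = accuracy(truth, pred)
    t_counts, p_counts = Counter(truth), Counter(pred)
    labels = set(truth) | set(pred)
    # Expected agreement if both sequences were independent with these marginals.
    p_e = sum(t_counts[c] * p_counts[c] for c in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy gradings: 4 real and 4 synthetic images.
truth = ["real"] * 4 + ["synthetic"] * 4
pred = ["real", "real", "real", "synthetic",
        "synthetic", "synthetic", "real", "real"]

acc = accuracy(truth, pred)
kappa = cohens_kappa(truth, pred)
print(acc, kappa)
```

    On balanced binary labels an accuracy near 0.5 gives a kappa near 0, which is why low values of both indicate that graders could barely tell real from synthetic images.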

  18. Synthetic Dataset of Hospital Admissions for Patients with Type 1 and 2...

    • healthdatagateway.org
    unknown
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158), Synthetic Dataset of Hospital Admissions for Patients with Type 1 and 2 Diabetes [Dataset]. https://healthdatagateway.org/en/dataset/1004
    Explore at:
    unknownAvailable download formats
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Type 1 Diabetes is an autoimmune disease impacting on insulin production. Type 2 Diabetes is caused by insulin resistance. Both are chronic conditions associated with serious complications such as heart disease, kidney failure, vision loss, and neuropathy. In the UK, 10% of the NHS budget is spent on managing diabetes. The demand for care is rising, with an increasing number of acute hospital admissions.

    This highly granular synthetic dataset represents approximately 159,800 diabetes patients acutely admitted between 2004 and 2022. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.

    The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. The real data was read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.

    Geography: This synthetic dataset is based on patient data from the West Midlands. The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity.

    Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.

    Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build different synthetic data to meet bespoke requirements.

    Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.

  19. dataset for fraud detection

    • kaggle.com
    zip
    Updated May 18, 2023
    Cite
    Saket P (2023). dataset for fraud detection [Dataset]. https://www.kaggle.com/datasets/saket03p/ethereum-fraud-detection-aggregated-dataset/discussion
    Explore at:
    zip(1209235 bytes)Available download formats
    Dataset updated
    May 18, 2023
    Authors
    Saket P
    Description

    Context

    The proposed dataset is an aggregation over the existing dataset for Ethereum transaction fraud.

    Shape of the Dataset

    • Number of Attributes: 19 (excluding the Index column)
    • Number of Rows: 19682

    [Figure: Overall FLAG Distribution] https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F11540148%2Fd64d10dbd0ac39b7cdd3a3986c8be823%2FFLAG%20Pie%20Chart%20on%20Aggregation.png?generation=1687540862214771&alt=media

    Overall Distribution of the FLAG Column


    The creation of the dataset was a preliminary task in our Minor Project, whose main aim was to use a cost-sensitive approach that assigns weights to the 2 classes of transactions in order to prioritize reducing the number of missed fraudulent transactions within the ecosystem, out of a larger set of transactions.
    Cost Sensitive Learning Approach to Ethereum Transactions Fraud Detection using Machine Learning
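    One common way to realise a cost-sensitive scheme is to weight each class inversely to its frequency, so that errors on the rarer class cost more. The sketch below uses the "balanced" heuristic n_samples / (n_classes * class_count). This is an assumption for illustration: the weighting actually used in the project is not stated here, and the label counts are toy values.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def weighted_error(labels, predictions, weights):
    """Sum of class weights over misclassified samples."""
    return sum(weights[y] for y, p in zip(labels, predictions) if y != p)

# Toy FLAG column: 8 normal accounts (0) and 2 fraudulent accounts (1).
flags = [0] * 8 + [1] * 2
weights = balanced_class_weights(flags)
print(weights)

# Missing a fraudulent account costs four times as much as a false alarm here.
missed_fraud_cost = weighted_error([1], [0], weights)
false_alarm_cost = weighted_error([0], [1], weights)
```

    A model trained to minimise this weighted error is pushed towards catching the minority (fraud) class, which matches the stated priority of reducing missed fraudulent transactions.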

    Content

    • Index: the index number of a row.
    • FLAG: whether the transaction is fraud or not.
    • Avg min between sent tnx: Average time between sent transactions for account in minutes.
    • Avg min between receive tnx: Average time between received transactions for account in minutes.
    • Time Diff between first and last (Mins): Time difference between the first and last transaction.
    • Sent tnx: Total number of sent normal transactions.
    • Received Tnx: Total number of received normal transactions.
    • Number of Created Contracts: Total Number of created contract transactions.
    • max value received: Maximum value in Ether ever received.
    • avg val received: Average value in Ether ever received.
    • avg val sent: Average value of Ether ever sent.
    • total Ether sent: Total Ether sent for account address.
    • total ether balance: Total Ether Balance following enacted transactions.
    • ERC20 total Ether received: Total ERC20 token received transactions in Ether.
    • ERC20 total ether sent: Total ERC20 token sent transactions in Ether.
    • ERC20 total Ether sent contract: Total ERC20 token transfer to other contracts in Ether.
    • ERC20 uniq sent addr: Number of ERC20 token transactions sent to Unique account addresses.
    • ERC20 uniq rec token name: Number of Unique ERC20 tokens received.
    • ERC20 most sent token type: Most sent token for account via ERC20 transaction.
    • ERC20_most_rec_token_type: Most received token for account via ERC20 transactions.

    Issues with existing dataset:

    • In sectors such as finance and health care, data are always scarce because of user privacy concerns, so the datasets available for classification in these domains are small. This led us to create an enlarged dataset for the same purpose.
    • Such a small dataset causes our classification models to overfit the training set and fail to capture the underlying patterns needed to find fraud in the ecosystem.
      • Additionally, outliers and incomplete samples are dropped during data pre-processing, which shrinks an already small dataset further.

    Methodology

    We generated new synthetic data samples from our already existing dataset by employing the CTGANSynthesizer model, which is well suited to generating samples that are statistically representative of the original tabular dataset, thereby doubling the dataset overall.
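
    The augmentation step itself (append as many synthetic rows as there are real rows) can be sketched without a deep-learning dependency. The sampler below is deliberately NOT CTGAN: it is a much cruder stand-in that resamples every column independently, and the toy rows are hypothetical. It only illustrates where a trained generator would plug in.

```python
import random


def sample_independent_rows(rows, n, seed=0):
    """Draw n synthetic rows by resampling every column independently.

    Stand-in only: this preserves each column's marginal distribution but
    discards relationships between columns, which CTGAN is designed to learn.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    return [{col: rng.choice(rows)[col] for col in columns} for _ in range(n)]


def double_dataset(rows, seed=0):
    """Return the original rows followed by an equal number of synthetic rows."""
    return rows + sample_independent_rows(rows, len(rows), seed)


# Hypothetical toy rows using two of the column names listed under Content.
real = [
    {"Sent tnx": 3, "total Ether sent": 1.5},
    {"Sent tnx": 10, "total Ether sent": 0.2},
    {"Sent tnx": 0, "total Ether sent": 0.0},
]
augmented = double_dataset(real)
print(len(real), len(augmented))
```

    Replacing `sample_independent_rows` with samples drawn from a fitted generative model leaves `double_dataset` unchanged.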

    Tasks Accomplished

    Successfully created the aggregated dataset by doubling the original dataset with synthetic data samples, which have an 85.63% similarity score with the existing dataset.

    Figure: CTGAN Loss Function for the dataset

    Figure: Columns Similarity between Synthetic & Original data
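
    The description does not state how the similarity score was computed. One plausible reading, offered here only as an assumption, is an average of per-column scores: one minus the two-sample Kolmogorov-Smirnov statistic for numeric columns, and one minus the total variation distance for categorical columns. A minimal sketch with hypothetical toy values:

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synth):
    """1 - two-sample Kolmogorov-Smirnov statistic for a numeric column."""
    r, s = sorted(real), sorted(synth)
    # Empirical CDFs only change at observed values, so checking those suffices.
    ks = max(
        abs(bisect_right(r, x) / len(r) - bisect_right(s, x) / len(s))
        for x in set(r) | set(s)
    )
    return 1.0 - ks


def tv_complement(real, synth):
    """1 - total variation distance between category frequencies."""
    fr, fs = Counter(real), Counter(synth)
    tv = 0.5 * sum(
        abs(fr[c] / len(real) - fs[c] / len(synth)) for c in set(fr) | set(fs)
    )
    return 1.0 - tv


def overall_similarity(real_cols, synth_cols, categorical):
    """Average the per-column scores; `categorical` names the non-numeric columns."""
    scores = {}
    for name in real_cols:
        metric = tv_complement if name in categorical else ks_complement
        scores[name] = metric(real_cols[name], synth_cols[name])
    return sum(scores.values()) / len(scores), scores


# Hypothetical toy columns (values and token names are made up for illustration).
real_cols = {
    "Sent tnx": [1, 2, 3, 4],
    "ERC20 most sent token type": ["A", "A", "B", "C"],
}
synth_cols = {
    "Sent tnx": [1, 2, 5, 6],
    "ERC20 most sent token type": ["A", "B", "B", "C"],
}
overall, per_column = overall_similarity(
    real_cols, synth_cols, categorical={"ERC20 most sent token type"}
)
print(per_column)
print(overall)
```

    A score of 1.0 means the two columns have identical empirical distributions; lower values mean the synthetic column drifts further from the original.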

  20. Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    xz
    Updated Oct 13, 2022
    Cite
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
    Explore at:
    Available download formats: xz
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is a first dataset of social-media-processed real and synthetic faces; the synthetic faces were obtained with the successful StyleGAN generative models, and the images were shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly, though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. These platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
