Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative adversarial networks (GANs) have recently been successfully used to create realistic synthetic microscopy cell images in 2D and predict intermediate cell stages. In the current paper we highlight that GANs can not only be used for creating synthetic cell images optimized for different fluorescent molecular labels, but that by using GANs for augmentation of training data involving scaling or other transformations the inherent length scale of biological structures is retained. In addition, GANs make it possible to create synthetic cells with specific shape features, which can be used, for example, to validate different methods for feature extraction. Here, we apply GANs to create 2D distributions of fluorescent markers for F-actin in the cell cortex of Dictyostelium cells (ABD), a membrane receptor (cAR1), and a cortex-membrane linker protein (TalA). The recent more widespread use of 3D lightsheet microscopy, where obtaining sufficient training data is considerably more difficult than in 2D, creates significant demand for novel approaches to data augmentation. We show that it is possible to directly generate synthetic 3D cell images using GANs, but limitations are excessive training times, dependence on high-quality segmentations of 3D images, and that the number of z-slices cannot be freely adjusted without retraining the network. We demonstrate that in the case of molecular labels that are highly correlated with cell shape, like F-actin in our example, 2D GANs can be used efficiently to create pseudo-3D synthetic cell data from individually generated 2D slices. Because high quality segmented 2D cell data are more readily available, this is an attractive alternative to using less efficient 3D networks.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
including both sunny and cloudy days.
Facebook
TwitterAttribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Skin Disease GAN-Generated and Original Images Lightweight Dataset
This dataset is a collection of skin disease images generated using a Generative Adversarial Network (GAN) approach. Specifically, a GAN was utilized with Stable Diffusion as the generator and a transformer-based discriminator to create realistic images of various skin diseases. The GAN approach enhances the accuracy and realism of the generated images, making this dataset a valuable resource for machine learning and computer vision applications in dermatology.
To create this dataset, a series of Low-Rank Adaptations (LoRAs) were generated for each disease category. These LoRAs were trained on the base dataset with 60 epochs and 30,000 steps using OneTrainer. Images were then generated for the following disease categories:
Due to the availability of ample public images, Melanoma was excluded from the generation process. The Fooocus API served as the generator within the GAN framework, creating images based on the LoRAs.
To ensure quality and accuracy, a transformer-based discriminator was employed to verify the generated images, classifying them into the correct disease categories.
The original base dataset used to create this GAN-based dataset includes reputable sources such as:
2019 HAM10000 Challenge - Kaggle - Google Images - Dermnet NZ - Bing Images - Yandex - Hellenic Atlas - Dermatological Atlas The LoRAs and their recommended weights for generating images are available for download on our CivitAi profile. You can refer to this profile for detailed instructions and access to the LoRAs used in this dataset.
Generated Images: High-quality images of skin diseases generated via GAN with Stable Diffusion, using transformer-based discrimination for accurate classification.
This dataset is suitable for:
Garcia-Espinosa, E. ., Ruiz-Castilla, J. S., & Garcia-Lamont, F. (2025). Generative AI and Transformers in Advanced Skin Lesion Classification applied on a mobile device. International Journal of Combinatorial Optimization Problems and Informatics, 16(2), 158–175. https://doi.org/10.61467/2007.1558.2025.v16i2.1078
Espinosa, E.G., Castilla, J.S.R., Lamont, F.G. (2025). Skin Disease Pre-diagnosis with Novel Visual Transformers. In: Figueroa-García, J.C., Hernández, G., Suero Pérez, D.F., Gaona García, E.E. (eds) Applied Computer Sciences in Engineering. WEA 2024. Communications in Computer and Information Science, vol 2222. Springer, Cham. https://doi.org/10.1007/978-3-031-74595-9_10
Facebook
Twitterhttps://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
According to our latest research, the global Generative Adversarial Networks (GANs) market size in 2024 stands at USD 2.78 billion, with robust growth expected over the next decade. The market is projected to expand at a CAGR of 32.5% from 2025 to 2033, reaching an estimated value of USD 33.5 billion by the end of the forecast period. The remarkable growth trajectory can be attributed to the increasing adoption of advanced deep learning techniques, rising demand for synthetic data generation, and the proliferation of AI-driven solutions across diverse industries. As per our latest research, key growth drivers include technological advancements, expanding applications across verticals, and the accelerating pace of digital transformation globally.
One of the primary growth factors fueling the Generative Adversarial Networks market is the surge in demand for high-quality synthetic data, which has become critical for training robust machine learning models. With the growing concerns around data privacy and the scarcity of labeled datasets, organizations are leveraging GANs to generate realistic synthetic datasets that preserve privacy while maintaining statistical validity. This trend is especially pronounced in sectors such as healthcare, where patient data sensitivity is paramount, and in the automotive industry, where synthetic data aids in developing autonomous vehicle algorithms. The ability of GANs to produce lifelike images, videos, and text data is revolutionizing data augmentation processes, reducing dependency on costly and time-consuming data collection efforts, and accelerating the pace of innovation.
Another significant driver is the rapid evolution of GAN architectures and their expanding application scope. Recent advancements in GAN technology, including StyleGAN, CycleGAN, and BigGAN, have dramatically improved the fidelity and versatility of generated content. These innovations have unlocked new possibilities in fields such as image generation, video synthesis, drug discovery, and even text-to-image synthesis. Enterprises in media and entertainment are utilizing GANs to create photorealistic visual effects, while pharmaceutical companies are leveraging the technology for accelerated drug molecule design and discovery. The versatility of GANs, coupled with their ability to automate creative processes and generate novel content, is attracting substantial investments from both established players and startups, further propelling market growth.
The increasing integration of GANs into cloud-based platforms and AI-as-a-Service offerings is also playing a crucial role in market expansion. Cloud deployment models enable organizations of all sizes to access powerful GAN capabilities without substantial upfront infrastructure investments. This democratization of access is particularly beneficial for small and medium enterprises (SMEs), allowing them to harness advanced generative AI for various applications, from marketing and e-commerce to cybersecurity and fraud detection. The scalability and flexibility offered by cloud-based GAN solutions are fostering widespread adoption, while ongoing advancements in hardware accelerators and optimized software frameworks are further lowering barriers to entry.
From a regional perspective, North America currently dominates the GANs market, driven by the presence of leading technology companies, extensive research and development activities, and a mature ecosystem for AI adoption. However, Asia Pacific is emerging as a high-growth region, fueled by rapid digitalization, increasing investments in AI research, and the proliferation of innovative startups. Europe, with its strong emphasis on data privacy and regulatory compliance, is witnessing growing adoption of GANs for privacy-preserving data generation and synthetic data applications. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by government initiatives, expanding digital infrastructure, and rising awareness about the transformative potential of generative AI technologies.
The Generative Adversarial Networks market by component is segmented into software, hardware, and services. The software segment currently holds the largest share, driven by the rapid evolution of GAN frameworks, toolkits, and libraries that facilitate the development, training, and deployment of generative models. Op
Facebook
TwitterThe proposed work aims to create a dataset linked to the Internet of Medical Things (IoMT) context by exploiting an innovative approach to create synthetic dataset by using Generative Adversarial Networks (GANs). In particular, the synthetic dataset is created by using GANs network starting from data retrieved by the IoT sensors.
In order to user the dataset, simply download the repository and start to work with the xlsx files. More information are available at the following page: Vaccari, I.; Orani, V.; Paglialonga, A.; Cambiaso, E.; Mongelli, M. A Generative Adversarial Network (GAN) Technique for Internet of Medical Things Data. Sensors 2021, 21, 3726.
Please if you use this dataset in a research work, please cite this article.
The devices adopted to retrieve patients data are: * An electrocardiogram (ECG) patch also providing day/night movement monitoring, * A pulse meter providing oximetry monitoring, * A weight scale, * A sphygmomanometer for blood pressure monitoring, * A spirometer for peak flow and FEV1 parameters.
Data are collected every day for three consecutive months: oxygen, body temperature, heart rate, heart rate master, weight, Body Mass Index, FEV1, PEF, MAP, diastolic blood pressure, systolic blood pressure.
The repository is composed by 3 folder:
In the Data folder, two folders are presented: in the Input folder, the data retrieved by the IoMT sensors are reported while in the Output folder, the synthetic dataset generated through the GANs is uploaded. In the summary.xslx, results about comparison between the synthetic and real dataset are reported in terms of Jensen–Shannon (JS) divergence, Fréchet Inception Distance (FID), accuracy and F1 score.
Facebook
TwitterProtective coatings based on two dimensional materials such as graphene have gained traction for diverse applications. Their impermeability, inertness, excellent bonding with metals, and amenability to functionalization renders them as promising coatings for both abiotic and microbiologically influenced corrosion (MIC). Owing to the success of graphene coatings, the whole family of 2D materials, including hexagonal boron nitride and molybdenum disulphide are being screened to obtain other promising coatings. AI-based data-driven models can accelerate virtual screening of 2D coatings with desirable physical and chemical properties. However, lack of large experimental datasets renders training of classifiers difficult and often results in over-fitting. Generate large datasets for MIC resistance of 2D coatings is both complex and laborious. Deep learning data augmentation methods can alleviate this issue by generating synthetic electrochemical data that resembles the training data classes. Here, we investigated two different deep generative models, namely variation autoencoder (VAE) and generative adversarial network (GAN) for generating synthetic data for expanding small experimental datasets. Our model experimental system included few layered graphene over copper surfaces. The synthetic data generated using GAN displayed a greater neural network system performance (83-85% accuracy) than VAE generated synthetic data (78-80% accuracy). However, VAE data performed better (90% accuracy) than GAN data (84%-85% accuracy) when using XGBoost. Finally, we show that synthetic data based on VAE and GAN models can drive machine learning models for developing MIC resistant 2D coatings.
Facebook
Twitterhttps://researchintelo.com/privacy-and-policyhttps://researchintelo.com/privacy-and-policy
According to our latest research, the global AI in Generative Adversarial Networks (GANs) market size reached USD 2.65 billion in 2024, reflecting robust growth driven by rapid advancements in deep learning and artificial intelligence. The market is expected to register a remarkable CAGR of 31.4% from 2025 to 2033, accelerating the adoption of GANs across diverse industries. By 2033, the market is forecasted to achieve a value of USD 32.78 billion, underscoring the transformative impact of GANs in areas such as image and video generation, data augmentation, and synthetic content creation. This trajectory is supported by the increasing demand for highly realistic synthetic data and the expansion of AI-driven applications across enterprise and consumer domains.
A primary growth factor for the AI in Generative Adversarial Networks market is the exponential increase in the availability and complexity of data that organizations must process. GANs, with their unique adversarial training methodology, have proven exceptionally effective for generating realistic synthetic data, which is crucial for industries like healthcare, automotive, and finance where data privacy and scarcity are significant concerns. The ability of GANs to create high-fidelity images, videos, and even text has enabled organizations to enhance their AI models, improve data diversity, and reduce bias, thereby accelerating the adoption of AI-driven solutions. Furthermore, the integration of GANs with cloud-based platforms and the proliferation of open-source GAN frameworks have democratized access to this technology, enabling both large enterprises and SMEs to harness its potential for innovative applications.
Another significant driver for the AI in Generative Adversarial Networks market is the surge in demand for advanced content creation tools in media, entertainment, and marketing. GANs have revolutionized the way digital content is produced by enabling hyper-realistic image and video synthesis, deepfake generation, and automated design. This has not only streamlined creative workflows but also opened new avenues for personalized content, virtual influencers, and immersive experiences in gaming and advertising. The rapid evolution of GAN architectures, such as StyleGAN and CycleGAN, has further enhanced the quality and scalability of generative models, making them indispensable for enterprises seeking to differentiate their digital offerings and engage customers more effectively in a highly competitive landscape.
The ongoing advancements in hardware acceleration and AI infrastructure have also played a pivotal role in propelling the AI in Generative Adversarial Networks market forward. The availability of powerful GPUs, TPUs, and AI-specific chips has significantly reduced the training time and computational costs associated with GANs, making them more accessible for real-time and large-scale applications. Additionally, the growing ecosystem of AI services and consulting has enabled organizations to overcome technical barriers, optimize GAN deployments, and ensure compliance with evolving regulatory standards. As investment in AI research continues to surge, the GANs market is poised for sustained innovation and broader adoption across sectors such as healthcare diagnostics, autonomous vehicles, financial modeling, and beyond.
From a regional perspective, North America continues to dominate the AI in Generative Adversarial Networks market, accounting for the largest share in 2024, driven by its robust R&D ecosystem, strong presence of leading technology companies, and early adoption of AI technologies. Europe follows closely, with significant investments in AI research and regulatory initiatives promoting ethical AI development. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digital transformation, expanding AI talent pool, and increasing government support for AI innovation. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as enterprises in these regions begin to explore the potential of GANs for industry-specific applications.
The AI in Generative Adversarial Networks market is segmented by component into software, hardware, and services, each playing a vital role in the ecosystem’s development and adoption. Software solutions constitute the largest share of the market in 2024, reflecting the growing demand for ad
Facebook
Twitter
According to our latest research, the global Generative Adversarial Networks (GANs) market size reached $2.19 billion in 2024, reflecting the rapid adoption of deep learning technologies across industries. The market is registering a robust CAGR of 31.5% and is forecasted to achieve a value of $21.47 billion by 2033. This exponential growth is attributed to the increasing demand for advanced AI-driven content creation, synthetic data generation, and the transformative impact of GANs in sectors such as healthcare, media, and cybersecurity. The expanding ecosystem of AI research and the proliferation of high-performance computing infrastructure are further accelerating the adoption of GANs worldwide.
A primary growth factor for the Generative Adversarial Networks market is the surging need for synthetic data generation. As organizations increasingly rely on data-intensive machine learning models, GANs have emerged as a pivotal technology to generate realistic, high-quality synthetic datasets that overcome data privacy and scarcity challenges. This is particularly crucial in sectors such as healthcare and finance, where access to diverse, high-fidelity data is often restricted by regulatory requirements. GANs enable the creation of anonymized yet statistically accurate datasets, facilitating model training without compromising sensitive information. Additionally, the growing sophistication of GAN architectures has led to improved output quality, making them indispensable for simulations, product development, and algorithm validation.
Another significant driver is the integration of GANs into creative and media industries for content generation. GANs are revolutionizing image and video production by automating the creation of hyper-realistic visuals, deepfakes, and special effects, reducing both time and cost. Companies in advertising, gaming, and entertainment leverage GANs to generate novel content, restore old media, and personalize user experiences at scale. With the rise of virtual influencers, digital avatars, and immersive experiences in the metaverse, GANs are becoming foundational tools for brands seeking innovative ways to engage audiences. The continuous advancements in neural network architectures and training algorithms are further enhancing the capabilities of GANs in these creative domains.
The increasing application of GANs in scientific research and drug discovery is also fueling market expansion. In the pharmaceutical industry, GANs are utilized to design new molecular structures, predict drug efficacy, and optimize clinical trials by generating synthetic patient data. This accelerates the drug development pipeline and reduces R&D costs. Similarly, in cybersecurity, GANs are deployed to simulate cyberattacks and generate adversarial examples, helping organizations bolster their defense mechanisms. The versatility of GANs in addressing complex, real-world problems across diverse sectors underscores their growing importance and widespread adoption in the coming years.
Regionally, North America continues to dominate the Generative Adversarial Networks market, driven by a robust AI research ecosystem, significant investments from tech giants, and the early adoption of advanced machine learning solutions. However, Asia Pacific is witnessing the fastest growth, propelled by increasing government initiatives, a burgeoning startup landscape, and rapid digital transformation in countries like China, Japan, and South Korea. Europe is also making significant strides, particularly in regulated industries such as healthcare and automotive, where GANs are being leveraged for innovation and compliance. The global expansion of cloud infrastructure and cross-border collaborations in AI research are further contributing to the widespread adoption and growth of the GANs market across all regions.
The Generative Adversarial Networks market is segmented by component into Software, Hardware, and Services</b&g
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
To support respiratory research, a synthetic asthma dataset was generated based on a real-world data, originally documenting 381 patients with physician-confirmed asthma who were admitted to secondary care at a single centre in 2019. The dataset is highly detailed, covering demographics, structured physiological data, medication records, and clinical outcomes. The synthetic version extends to 561 patients admitted over a year, offering insights into patient patterns, risk factors, and treatment strategies.
The dataset was created using the Synthetic Data Vault package, specifically employing the GAN synthesizer. Real data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Real world data. Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real-world data upon request.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
Atrial fibrillation (AF) is a common abnormal heart rhythm that causes the heart to beat irregularly and often too fast. AF increases the risk of stroke and heart failure. AF primarily affects older adults and individuals with chronic conditions such as heart disease, high blood pressure, or obesity. Additional factors include congenital heart disease, and cardiomyopathy. AF can be treated by ablation or controlled using medication. The risk of stroke can be reduced using anti-coagulants.
This synthetic AF dataset comprises of 24.8k “patients” including demographics, co-morbidities, presenting symptoms and medical events during hospital stays, coded with ICD-10 and SNOMED-CT.
Using the Synthetic Data Vault package with a GAN synthesizer, a synthetic dataset was generated from real clinical data. The dataset includes demographic information and hospital admission details. The real data was pre-processed for correct datetime parsing and metadata was defined to capture schema structure, guiding the synthesizer in learning data distributions and relationships. The resulting synthetic dataset closely mirrors the statistical properties of the original, supporting privacy-preserving analysis and model training.
Geography: The West Midlands has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most facial expression recognition (FER) systems rely on machine learning approaches that require large databases (DBs) for effective training. As these are not easily available, a good solution is to augment the DBs with appropriate techniques, which are typically based on either geometric transformation or deep learning based technologies (e.g., Generative Adversarial Networks (GANs)). Whereas the first category of techniques has been fairly adopted in the past, studies that use GAN-based techniques are limited for FER systems. To advance in this respect, we evaluate the impact of the GAN techniques by creating a new DB containing the generated synthetic images.
The face images contained in the KDEF DB serve as the basis for creating novel synthetic images by combining the facial features of two images (i.e., Candie Kung and Cristina Saralegui) selected from the YouTube-Faces DB. The novel images differ from each other, in particular concerning the eyes, the nose, and the mouth, whose characteristics are taken from the Candie and Cristina images.
The total number of novel synthetic images generated with the GAN is 980 (70 individuals from KDEF DB x 7 emotions x 2 subjects from YouTube-Faces DB).
The zip file "GAN_KDEF_Candie" contains the 490 images generated by combining the KDEF images with the Candie Kung image. The zip file "GAN_KDEF_Cristina" contains the 490 images generated by combining the KDEF images with the Cristina Saralegui image. The used image IDs are the same used for the KDEF DB. The synthetic generated images have a resolution of 562x762 pixels.
If you make use of this dataset, please consider citing the following publication:
Porcu, S., Floris, A., & Atzori, L. (2020). Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems. Electronics, 9, 1892, doi: 10.3390/electronics9111892, url: https://www.mdpi.com/2079-9292/9/11/1892.
BibTex format:
@article{porcu2020evaluation, title={Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems}, author={Porcu, Simone and Floris, Alessandro and Atzori, Luigi}, journal={Electronics}, volume={9}, pages={108781}, year={2020}, number = {11}, article-number = {1892}, publisher={MDPI}, doi={10.3390/electronics9111892} }
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Big data is needed to implement personalized medicine, but privacy issues are a prevalent problem for collecting data and sharing them between researchers. A solution is synthetic data generated to represent real dataset carrying similar information. Here, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 12-lead 10-sec electrocardiograms (ECGs). We have developed and compare two methods, namely WaveGAN* and Pulse2Pulse GAN. We trained the GANs with 7,233 real normal ECG to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to the WaveGAN to produce realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. These synthetic ECGs are fully anonymous and cannot be referred to any individual, hence they may be used freely. The synthetic dataset will be available as open access for researchers at OSF.io and the DeepFake generator available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using adversarial neural networks on normal ECGs from two population studies, i.e., there by solving the relevant privacy issues in medical datasets.
@article{thambawita2021deepfake,
title={DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine},
author={Thambawita, Vajira and Isaksen, Jonas L and Hicks, Steven A and Ghouse, Jonas and Ahlberg, Gustav and Linneberg, Allan and Grarup, Niels and Ellervik, Christina and Olesen, Morten Salling and Hansen, Torben and others},
journal={Scientific reports},
volume={11},
number={1},
pages={1--8},
year={2021},
publisher={Nature Publishing Group}
}
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
This highly granular synthetic dataset created as an asset for the HDR UK Medicines programme includes information on 680 cancer patients over a period of three years. Includes simulated patient-related data, such as demographics & co-morbidities extracted from ICD-10 and SNOMED-CT codes. Serial, structured data pertaining to acute care process (readmissions, survival), primary diagnosis, presenting complaint, physiology readings, blood results (infection, inflammatory markers) and acuity markers such as AVPU Scale, NEWS2 score, imaging reports, prescribed & administered treatments including fluids, blood products, procedures, information on outpatient admissions and survival outcomes following one-year post discharge.
The data was generated using a generative adversarial network model (CTGAN). A flat real data table was created by consolidating essential information from various key relational tables (medications, demographics). A synthetic version of the flat table was generated using a customized script based on the SDV package (N. Patki, 2016), that replicated the real distribution and logic relationships.
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and provide the real-data via application.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Facebook
TwitterThis dataset contains network traffic data collected from a computer network. The network consists of various devices, such as computers, servers, and routers, interconnected to facilitate communication and data exchange. The dataset captures different types of network activities, including normal network traffic as well as various network anomalies and attacks. It provides a comprehensive view of the network behavior and can be used for studying network security, intrusion detection, and anomaly detection algorithms. The dataset includes features such as source and destination IP addresses, port numbers, protocol types, packet sizes, and timestamps, enabling detailed analysis of network traffic patterns and characteristics and so on... The second file in this dataset contains synthetic data that has been generated using a Generative Adversarial Network (GAN). GANs are a type of deep learning model that can learn the underlying patterns and distributions of a given dataset and generate new synthetic samples that resemble the original data. In this case, the GAN has been trained on the network traffic data from the first file to learn the characteristics and structure of the network traffic. The generated synthetic data in the second file aims to mimic the patterns and behavior observed in real network traffic. This synthetic data can be used for various purposes, such as augmenting the original dataset, testing the robustness of machine learning models, or exploring different scenarios in network analysis.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cassava roots are complex structures comprising several distinct types of root. The number and size of the storage roots are two potential phenotypic traits reflecting crop yield and quality. Counting and measuring the size of cassava storage roots are usually done manually, or semi-automatically by first segmenting cassava root images. However, occlusion of both storage and fibrous roots makes the process both time-consuming and error-prone. While Convolutional Neural Nets have shown performance above the state-of-the-art in many image processing and analysis tasks, there are currently a limited number of Convolutional Neural Net-based methods for counting plant features. This is due to the limited availability of data, annotated by expert plant biologists, which represents all possible measurement outcomes. Existing works in this area either learn a direct image-to-count regressor model by regressing to a count value, or perform a count after segmenting the image. We, however, address the problem using a direct image-to-count prediction model. This is made possible by generating synthetic images, using a conditional Generative Adversarial Network (GAN), to provide training data for missing classes. We automatically form cassava storage root masks for any missing classes using existing ground-truth masks, and input them as a condition to our GAN model to generate synthetic root images. We combine the resulting synthetic images with real images to learn a direct image-to-count prediction model capable of counting the number of storage roots in real cassava images taken from a low cost aeroponic growth system. These models are used to develop a system that counts cassava storage roots in real images. Our system first predicts age group ('young' and 'old' roots; pertinent to our image capture regime) in a given image, and then, based on this prediction, selects an appropriate model to predict the number of storage roots. We achieve 91% accuracy on predicting ages of storage roots, and 86% and 71% overall percentage agreement on counting 'old' and 'young' storage roots respectively. Thus we are able to demonstrate that synthetically generated cassava root images can be used to supplement missing root classes, turning the counting problem into a direct image-to-count prediction task.
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
Strokes can be ischaemic or haemorrhagic in nature, leading to debilitating symptoms which are dependent on the location of the stroke in the brain and the severity of the insult. Stroke care is centred around Hyper-acute Stroke Units (HASU), Acute Stroke and Brain Injury Units (ASU/ABIU) and specialist stroke services. Early presentation enables the use of more invasive treatments to clear blood clots, but commonly strokes present late, preventing their use.
This synthetic dataset represents approximately 29,000 stroke patients. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.
The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. Real. data was first read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute stroke services & specialist care across four hospital sites.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntroductionAge-related macular degeneration (AMD) is one of the leading causes of vision impairment globally and early detection is crucial to prevent vision loss. However, the screening of AMD is resource dependent and demands experienced healthcare providers. Recently, deep learning (DL) systems have shown the potential for effective detection of various eye diseases from retinal fundus images, but the development of such robust systems requires a large amount of datasets, which could be limited by prevalence of the disease and privacy of patient. As in the case of AMD, the advanced phenotype is often scarce for conducting DL analysis, which may be tackled via generating synthetic images using Generative Adversarial Networks (GANs). This study aims to develop GAN-synthesized fundus photos with AMD lesions, and to assess the realness of these images with an objective scale.MethodsTo build our GAN models, a total of 125,012 fundus photos were used from a real-world non-AMD phenotypical dataset. StyleGAN2 and human-in-the-loop (HITL) method were then applied to synthesize fundus images with AMD features. To objectively assess the quality of the synthesized images, we proposed a novel realness scale based on the frequency of the broken vessels observed in the fundus photos. Four residents conducted two rounds of gradings on 300 images to distinguish real from synthetic images, based on their subjective impression and the objective scale respectively.Results and discussionThe introduction of HITL training increased the percentage of synthetic images with AMD lesions, despite the limited number of AMD images in the initial training dataset. Qualitatively, the synthesized images have been proven to be robust in that our residents had limited ability to distinguish real from synthetic ones, as evidenced by an overall accuracy of 0.66 (95% CI: 0.61–0.66) and Cohen’s kappa of 0.320. For the non-referable AMD classes (no or early AMD), the accuracy was only 0.51. With the objective scale, the overall accuracy improved to 0.72. In conclusion, GAN models built with HITL training are capable of producing realistic-looking fundus images that could fool human experts, while our objective realness scale based on broken vessels can help identifying the synthetic fundus photos.
Facebook
Twitterhttps://www.pioneerdatahub.co.uk/data/data-request-process/https://www.pioneerdatahub.co.uk/data/data-request-process/
Type 1 Diabetes is an autoimmune disease impacting on insulin production. Type 2 Diabetes is caused by insulin resistance. Both are chronic conditions associated with serious complications such as heart disease, kidney failure, vision loss, and neuropathy. In the UK, 10% of the NHS budget is spent on managing diabetes. The demand for care is rising, with an increasing number of acute hospital admissions.
This highly granular synthetic dataset represents approximately 159,800 diabetes patients acutely admitted between 2004 and 2022. Data includes demography, socioeconomic status, co-morbidities, “time stamped” serial acuity, physiology and treatments, investigations (structured and unstructured data), hospital care processes, and outcomes.
The dataset was created using the Synthetic Data Vault (SDV) package, specifically employing the GAN synthesizer. The real data was read and pre-processed, ensuring datetime columns were correctly parsed and identifiers were handled as strings. Metadata was defined to capture the schema, specifying field types and primary keys. This metadata guided the synthesizer in understanding the structure of the data. The GAN synthesizer was then fitted to the real data, learning the distributions and dependencies within. After fitting, the synthesizer generated synthetic data that mirrors the statistical properties and relationships of the original dataset.
Geography: This synthetic dataset is based on patient data from the West Midlands. The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build different synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Facebook
TwitterThe dataset proposed is an aggregation over the already exisiting dataset for ethereum transactions fraud.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F11540148%2Fd64d10dbd0ac39b7cdd3a3986c8be823%2FFLAG%20Pie%20Chart%20on%20Aggregation.png?generation=1687540862214771&alt=media" alt="Overall FLAG Distribution">
Overall Distribution of the FLAG Column
The creation of the dataset was a preliminary task in our Minor Project where our main aim was to use a Cost Sensitive based approach to assign weights to the 2 classes of transactions inorder to prioritize the reduce of the cases of missing fraudulent transactions occurring within the ecosystem, from a larger set of transactions
Cost Sensitive Learning Approach to Ethereum Transactions Fraud Detection using Machine Learning
We've considered generating new synthetic data samples from our already exisitng dataset by employing the CTGANSynthsizer Model which is very suitable for generating samples statistically representative of the original tabular dataset; thereby doubling the datset at an overall level.
Successfully created the aggregated dataset by doubling the original dataset with synthetic data samples, which have 85.63% similarity score with the existing dataset.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F11540148%2F4b4ff58f17f2283604bf5e85ab80ebe3%2FLoss%20Plot.png?generation=1687539313296286&alt=media" alt="CTGAN Loss Function for the dataset">
CTGAN Loss Function for the dataset
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F11540148%2F8c8ecbd8312914c7facc641b8cad406e%2FColumns%20Similarity.png?generation=1687539791786124&alt=media" alt="Columns Similarity between Synthetic & Original data">
Columns Similarity between Synthetic & Original data
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TrueFace is a first dataset of social media processed real and synthetic faces, obtained by the successful StyleGAN generative models, and shared on Facebook, Twitter and Telegram.
Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To contrast this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. Said platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative adversarial networks (GANs) have recently been successfully used to create realistic synthetic microscopy cell images in 2D and predict intermediate cell stages. In the current paper we highlight that GANs can not only be used for creating synthetic cell images optimized for different fluorescent molecular labels, but that by using GANs for augmentation of training data involving scaling or other transformations the inherent length scale of biological structures is retained. In addition, GANs make it possible to create synthetic cells with specific shape features, which can be used, for example, to validate different methods for feature extraction. Here, we apply GANs to create 2D distributions of fluorescent markers for F-actin in the cell cortex of Dictyostelium cells (ABD), a membrane receptor (cAR1), and a cortex-membrane linker protein (TalA). The recent more widespread use of 3D lightsheet microscopy, where obtaining sufficient training data is considerably more difficult than in 2D, creates significant demand for novel approaches to data augmentation. We show that it is possible to directly generate synthetic 3D cell images using GANs, but limitations are excessive training times, dependence on high-quality segmentations of 3D images, and that the number of z-slices cannot be freely adjusted without retraining the network. We demonstrate that in the case of molecular labels that are highly correlated with cell shape, like F-actin in our example, 2D GANs can be used efficiently to create pseudo-3D synthetic cell data from individually generated 2D slices. Because high quality segmented 2D cell data are more readily available, this is an attractive alternative to using less efficient 3D networks.