Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic dataset for "A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs"
Dataset specification:
Arxiv paper: https://arxiv.org/abs/2106.13199
Github code: https://github.com/tcoroller/pGAN/
Abstract:
Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually addressed by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
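For readers unfamiliar with ac-GAN conditioning, the sketch below shows the general pattern of feeding a class label (here, the VU location) into the generator alongside the noise vector. It is a minimal PyTorch illustration; the dimensions, layer sizes and names are assumptions, not the pGAN implementation:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3   # VU locations: cervical, thoracic, lumbar
LATENT_DIM = 128  # illustrative choice, not from the paper

class ConditionalGenerator(nn.Module):
    """Generator that concatenates a noise vector with a learned
    class embedding: the standard ac-GAN conditioning pattern."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_CLASSES, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, 64 * 64),  # flat 64x64 grayscale patch
            nn.Tanh(),
        )

    def forward(self, z, labels):
        x = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(x).view(-1, 1, 64, 64)

# In a full ac-GAN the discriminator also outputs class logits, so its
# loss combines an adversarial term with an auxiliary classification term.
z = torch.randn(4, LATENT_DIM)
labels = torch.randint(0, NUM_CLASSES, (4,))
fake = ConditionalGenerator()(z, labels)  # -> (4, 1, 64, 64)
```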
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Synthetic OPG Image Dataset comprises high-resolution, anatomically realistic dental X-ray images generated using custom-trained GAN variants. Based on a diverse clinical dataset from Pakistan, Thailand, and the U.S., this collection includes over 1200 curated synthetic images designed to augment training data for deep learning models in dental imaging.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Protective coatings based on two-dimensional materials such as graphene have gained traction for diverse applications. Their impermeability, inertness, excellent bonding with metals, and amenability to functionalization render them promising coatings against both abiotic and microbiologically influenced corrosion (MIC). Owing to the success of graphene coatings, the whole family of 2D materials, including hexagonal boron nitride and molybdenum disulphide, is being screened to obtain other promising coatings. AI-based data-driven models can accelerate virtual screening of 2D coatings with desirable physical and chemical properties. However, the lack of large experimental datasets makes training classifiers difficult and often results in over-fitting. Generating large datasets for MIC resistance of 2D coatings is both complex and laborious. Deep learning data augmentation methods can alleviate this issue by generating synthetic electrochemical data that resembles the training data classes. Here, we investigated two different deep generative models, namely the variational autoencoder (VAE) and the generative adversarial network (GAN), for generating synthetic data to expand small experimental datasets. Our model experimental system consisted of few-layer graphene over copper surfaces. The synthetic data generated using the GAN yielded higher neural network performance (83-85% accuracy) than the VAE-generated synthetic data (78-80% accuracy). However, the VAE data performed better (90% accuracy) than the GAN data (84-85% accuracy) when using XGBoost. Finally, we show that synthetic data based on VAE and GAN models can drive machine learning models for developing MIC-resistant 2D coatings.
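The XGBoost comparison described above follows a train-on-augmented, test-on-real protocol; a minimal sketch with random placeholder arrays standing in for the electrochemical features (all shapes and settings are illustrative assumptions):

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Placeholders for real and generator-produced electrochemical features
X_real, y_real = rng.normal(size=(100, 16)), rng.integers(0, 2, 100)
X_syn,  y_syn  = rng.normal(size=(400, 16)), rng.integers(0, 2, 400)

# Hold out real samples for testing; train on the remaining real + synthetic
X_test, y_test = X_real[80:], y_real[80:]
X_train = np.vstack([X_real[:80], X_syn])
y_train = np.concatenate([y_real[:80], y_syn])

clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_train, y_train)
print("accuracy on held-out real data:",
      accuracy_score(y_test, clf.predict(X_test)))
```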
This dataset contains network traffic data collected from a computer network. The network consists of various devices, such as computers, servers, and routers, interconnected to facilitate communication and data exchange. The dataset captures different types of network activity, including normal traffic as well as various network anomalies and attacks. It provides a comprehensive view of network behavior and can be used for studying network security, intrusion detection, and anomaly detection algorithms. The dataset includes features such as source and destination IP addresses, port numbers, protocol types, packet sizes, and timestamps, enabling detailed analysis of network traffic patterns and characteristics.

The second file in this dataset contains synthetic data generated using a Generative Adversarial Network (GAN). GANs are a type of deep learning model that can learn the underlying patterns and distributions of a given dataset and generate new synthetic samples that resemble the original data. In this case, the GAN has been trained on the network traffic data from the first file to learn the characteristics and structure of the traffic. The generated synthetic data in the second file aims to mimic the patterns and behavior observed in real network traffic. This synthetic data can be used for various purposes, such as augmenting the original dataset, testing the robustness of machine learning models, or exploring different scenarios in network analysis.
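The description does not name the GAN architecture used; for tabular flow records like these, one common open-source choice is CTGAN. A minimal sketch assuming the `ctgan` package, with placeholder column names and values:

```python
import pandas as pd
from ctgan import CTGAN  # pip install ctgan; one common tabular-GAN choice

# Placeholder flow records shaped like the features the dataset describes
flows = pd.DataFrame({
    "src_ip":   ["10.0.0.1", "10.0.0.2"] * 50,
    "dst_port": [80, 443] * 50,
    "protocol": ["tcp", "udp"] * 50,
    "pkt_size": [512, 1400] * 50,
})

model = CTGAN(epochs=10)
# Categorical columns must be declared so the GAN models them discretely
model.fit(flows, discrete_columns=["src_ip", "dst_port", "protocol"])
synthetic_flows = model.sample(1000)  # new rows mimicking real traffic
print(synthetic_flows.head())
```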
https://researchintelo.com/privacy-and-policy
According to our latest research, the global AI in Generative Adversarial Networks (GANs) market size reached USD 2.65 billion in 2024, reflecting robust growth driven by rapid advancements in deep learning and artificial intelligence. The market is expected to register a remarkable CAGR of 31.4% from 2025 to 2033, accelerating the adoption of GANs across diverse industries. By 2033, the market is forecasted to achieve a value of USD 32.78 billion, underscoring the transformative impact of GANs in areas such as image and video generation, data augmentation, and synthetic content creation. This trajectory is supported by the increasing demand for highly realistic synthetic data and the expansion of AI-driven applications across enterprise and consumer domains.
A primary growth factor for the AI in Generative Adversarial Networks market is the exponential increase in the availability and complexity of data that organizations must process. GANs, with their unique adversarial training methodology, have proven exceptionally effective for generating realistic synthetic data, which is crucial for industries like healthcare, automotive, and finance where data privacy and scarcity are significant concerns. The ability of GANs to create high-fidelity images, videos, and even text has enabled organizations to enhance their AI models, improve data diversity, and reduce bias, thereby accelerating the adoption of AI-driven solutions. Furthermore, the integration of GANs with cloud-based platforms and the proliferation of open-source GAN frameworks have democratized access to this technology, enabling both large enterprises and SMEs to harness its potential for innovative applications.
Another significant driver for the AI in Generative Adversarial Networks market is the surge in demand for advanced content creation tools in media, entertainment, and marketing. GANs have revolutionized the way digital content is produced by enabling hyper-realistic image and video synthesis, deepfake generation, and automated design. This has not only streamlined creative workflows but also opened new avenues for personalized content, virtual influencers, and immersive experiences in gaming and advertising. The rapid evolution of GAN architectures, such as StyleGAN and CycleGAN, has further enhanced the quality and scalability of generative models, making them indispensable for enterprises seeking to differentiate their digital offerings and engage customers more effectively in a highly competitive landscape.
The ongoing advancements in hardware acceleration and AI infrastructure have also played a pivotal role in propelling the AI in Generative Adversarial Networks market forward. The availability of powerful GPUs, TPUs, and AI-specific chips has significantly reduced the training time and computational costs associated with GANs, making them more accessible for real-time and large-scale applications. Additionally, the growing ecosystem of AI services and consulting has enabled organizations to overcome technical barriers, optimize GAN deployments, and ensure compliance with evolving regulatory standards. As investment in AI research continues to surge, the GANs market is poised for sustained innovation and broader adoption across sectors such as healthcare diagnostics, autonomous vehicles, financial modeling, and beyond.
From a regional perspective, North America continues to dominate the AI in Generative Adversarial Networks market, accounting for the largest share in 2024, driven by its robust R&D ecosystem, strong presence of leading technology companies, and early adoption of AI technologies. Europe follows closely, with significant investments in AI research and regulatory initiatives promoting ethical AI development. The Asia Pacific region is emerging as a high-growth market, fueled by rapid digital transformation, expanding AI talent pool, and increasing government support for AI innovation. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as enterprises in these regions begin to explore the potential of GANs for industry-specific applications.
The AI in Generative Adversarial Networks market is segmented by component into software, hardware, and services, each playing a vital role in the ecosystem’s development and adoption. Software solutions constitute the largest share of the market in 2024, reflecting the growing demand for ad
https://creativecommons.org/publicdomain/zero/1.0/
All the images of faces here are generated using https://thispersondoesnotexist.com/
Under US copyright law, these images are technically not subject to copyright protection. Only "original works of authorship" are considered. "To qualify as a work of 'authorship' a work must be created by a human being," according to a US Copyright Office report [PDF].
https://www.theregister.com/2022/08/14/ai_digital_artwork_copyright/
I manually tagged all images as best I could and sorted them into the two classes below.
Some may pass as either female or male, but I will leave the reviewing to you. Toddlers and babies are included under Male/Female.
Each of the faces is completely fake, created using an algorithm called a Generative Adversarial Network (GAN).
A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in a game (in the form of a zero-sum game, where one agent's gain is another agent's loss).
Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning.
Just a simple Jupyter notebook that repeatedly invoked the website https://thispersondoesnotexist.com/, saving each returned image locally.
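The notebook itself is not reproduced here, but such a download loop is only a few lines; a hedged sketch (the count, filenames and pacing are arbitrary choices):

```python
import time
import requests

# thispersondoesnotexist.com serves a fresh GAN-generated face per request
URL = "https://thispersondoesnotexist.com/"

for i in range(10):  # arbitrary count for illustration
    resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    with open(f"face_{i:04d}.jpg", "wb") as f:
        f.write(resp.content)
    time.sleep(1)  # be polite; each hit generates a new image
```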
According to our latest research, the GAN-Synthesized Augmented Radiology Dataset market size reached USD 412 million in 2024, supported by a robust surge in the adoption of artificial intelligence across healthcare imaging. The market demonstrated a strong CAGR of 25.7% from 2021 to 2024 and is on track to reach a valuation of USD 3.2 billion by 2033. The primary growth factor fueling this expansion is the increasing demand for high-quality, diverse, and annotated radiology datasets to train and validate advanced AI diagnostic models, especially as regulatory requirements for clinical validation intensify globally.
The exponential growth of the GAN-Synthesized Augmented Radiology Dataset market is being driven by the urgent need for large-scale, diverse, and unbiased datasets in medical imaging. Traditional methods of acquiring and annotating radiological images are time-consuming, expensive, and often limited by patient privacy concerns. Generative Adversarial Networks (GANs) have emerged as a transformative technology, enabling the synthesis of high-fidelity, realistic medical images that can augment existing datasets. This not only enhances the statistical power and generalizability of AI models but also helps overcome the challenge of data imbalance, especially for rare diseases and underrepresented demographic groups. As AI-driven diagnostics become integral to clinical workflows, the reliance on GAN-augmented datasets is expected to intensify, further propelling market growth.
Another significant growth driver is the increasing collaboration between radiology departments, AI technology vendors, and academic research institutes. These partnerships are focused on developing standardized protocols for dataset generation, annotation, and validation, leveraging GANs to create synthetic images that closely mimic real-world clinical scenarios. The resulting datasets facilitate the training of AI algorithms for a wide array of applications, including disease detection, anomaly identification, and image segmentation. Additionally, the proliferation of cloud-based platforms and open-source AI frameworks has democratized access to GAN-synthesized datasets, enabling even smaller healthcare organizations and startups to participate in the AI-driven transformation of radiology.
The regulatory landscape is also evolving to support the responsible use of synthetic data in healthcare. Regulatory agencies in North America, Europe, and Asia Pacific are increasingly recognizing the value of GAN-generated datasets for algorithm validation, provided they meet stringent standards for data quality, privacy, and clinical relevance. This regulatory endorsement is encouraging more hospitals, diagnostic centers, and research institutions to adopt GAN-augmented datasets, further accelerating market expansion. Moreover, the ongoing advancements in GAN architectures, such as StyleGAN and CycleGAN, are enhancing the realism and diversity of synthesized images, making them virtually indistinguishable from real patient scans and boosting their acceptance in both clinical and research settings.
From a regional perspective, North America is currently the largest market for GAN-Synthesized Augmented Radiology Datasets, driven by substantial investments in healthcare AI, the presence of leading technology vendors, and proactive regulatory support. Europe follows closely, with a strong emphasis on data privacy and cross-border research collaborations. The Asia Pacific region is witnessing the fastest growth, fueled by rapid digital transformation in healthcare, rising investments in AI infrastructure, and increasing disease burden. Latin America and the Middle East & Africa are also emerging as promising markets, albeit at a slower pace, as healthcare systems in these regions begin to adopt AI-driven radiology solutions.
The dataset type segment of the GAN-Synthesized Augmented Radiology Dataset market is pi
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TrueFace is the first dataset of real and synthetic faces as processed by social media platforms. The synthetic faces were obtained from the successful StyleGAN generative models, and all images were shared on Facebook, Twitter and Telegram.
Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly, though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the ability to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. These platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.
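The sharing-induced degradation described above can be approximated offline by repeated lossy re-encoding; a minimal Pillow sketch (the quality setting and number of hops are illustrative, not the platforms' actual pipelines):

```python
from io import BytesIO
from PIL import Image

def reshare(img: Image.Image, quality: int = 70) -> Image.Image:
    """Simulate one upload/download cycle as a lossy JPEG re-encode."""
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    out = Image.open(buf)
    out.load()  # force decode before the buffer goes out of scope
    return out

img = Image.open("face.png")  # a StyleGAN face; path is illustrative
for _ in range(3):            # three sharing hops, e.g. across platforms
    img = reshare(img)
img.save("face_shared.jpg")
```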
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most facial expression recognition (FER) systems rely on machine learning approaches that require large databases (DBs) for effective training. As these are not easily available, a good solution is to augment the DBs with appropriate techniques, which are typically based on either geometric transformations or deep learning technologies (e.g., Generative Adversarial Networks (GANs)). Whereas the first category of techniques has been fairly widely adopted in the past, studies using GAN-based techniques for FER systems are limited. To advance in this respect, we evaluate the impact of GAN techniques by creating a new DB containing the generated synthetic images.
The face images contained in the KDEF DB serve as the basis for creating novel synthetic images by combining their facial features with those of two images (of Candie Kung and Cristina Saralegui) selected from the YouTube-Faces DB. The novel images differ from each other in particular concerning the eyes, the nose, and the mouth, whose characteristics are taken from the Candie and Cristina images.
The total number of novel synthetic images generated with the GAN is 980 (70 individuals from KDEF DB x 7 emotions x 2 subjects from YouTube-Faces DB).
The zip file "GAN_KDEF_Candie" contains the 490 images generated by combining the KDEF images with the Candie Kung image. The zip file "GAN_KDEF_Cristina" contains the 490 images generated by combining the KDEF images with the Cristina Saralegui image. The image IDs are the same as those used in the KDEF DB. The synthetically generated images have a resolution of 562x762 pixels.
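As a quick sanity check after unzipping, the stated counts (490 + 490 = 980) can be verified with a short script; the folder names below simply mirror the zip names and may need adjusting:

```python
from pathlib import Path

# Folder names follow the zip names given above; adjust if they differ
for folder in ["GAN_KDEF_Candie", "GAN_KDEF_Cristina"]:
    images = list(Path(folder).glob("*.*"))
    print(folder, len(images))  # expected: 490 each, 980 in total
```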
If you make use of this dataset, please consider citing the following publication:
Porcu, S., Floris, A., & Atzori, L. (2020). Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems. Electronics, 9, 1892, doi: 10.3390/electronics9111892, url: https://www.mdpi.com/2079-9292/9/11/1892.
BibTex format:
@article{porcu2020evaluation,
  title = {Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems},
  author = {Porcu, Simone and Floris, Alessandro and Atzori, Luigi},
  journal = {Electronics},
  volume = {9},
  number = {11},
  article-number = {1892},
  pages = {1892},
  year = {2020},
  publisher = {MDPI},
  doi = {10.3390/electronics9111892}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cassava roots are complex structures comprising several distinct types of root. The number and size of the storage roots are two potential phenotypic traits reflecting crop yield and quality. Counting and measuring the size of cassava storage roots are usually done manually, or semi-automatically by first segmenting cassava root images. However, occlusion of both storage and fibrous roots makes the process both time-consuming and error-prone. While Convolutional Neural Nets have shown performance above the state-of-the-art in many image processing and analysis tasks, there are currently a limited number of Convolutional Neural Net-based methods for counting plant features. This is due to the limited availability of data, annotated by expert plant biologists, which represents all possible measurement outcomes. Existing works in this area either learn a direct image-to-count regressor model by regressing to a count value, or perform a count after segmenting the image. We, however, address the problem using a direct image-to-count prediction model. This is made possible by generating synthetic images, using a conditional Generative Adversarial Network (GAN), to provide training data for missing classes. We automatically form cassava storage root masks for any missing classes using existing ground-truth masks, and input them as a condition to our GAN model to generate synthetic root images. We combine the resulting synthetic images with real images to learn a direct image-to-count prediction model capable of counting the number of storage roots in real cassava images taken from a low cost aeroponic growth system. These models are used to develop a system that counts cassava storage roots in real images. Our system first predicts age group ('young' and 'old' roots; pertinent to our image capture regime) in a given image, and then, based on this prediction, selects an appropriate model to predict the number of storage roots. We achieve 91% accuracy on predicting ages of storage roots, and 86% and 71% overall percentage agreement on counting 'old' and 'young' storage roots respectively. Thus we are able to demonstrate that synthetically generated cassava root images can be used to supplement missing root classes, turning the counting problem into a direct image-to-count prediction task.
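The "direct image-to-count prediction model" described above maps an image straight to a scalar count rather than segmenting first. A minimal PyTorch sketch of that pattern (layer sizes and the loss are illustrative assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class RootCounter(nn.Module):
    """CNN regressor: root image in, predicted storage-root count out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # single scalar count

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(1)

model = RootCounter()
imgs = torch.randn(8, 3, 128, 128)           # batch of root images
counts = torch.randint(0, 12, (8,)).float()  # ground-truth counts
loss = nn.functional.mse_loss(model(imgs), counts)  # regress to counts
loss.backward()
```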
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please cite the paper when using the dataset in a research study; refer to the paper link or the BibTeX provided below.
This repository contains comprehensive datasets for soil classification and recognition research. The Original Dataset comprises soil images sourced from various online repositories, which have been meticulously cleaned and preprocessed to ensure data quality and consistency. To enhance the dataset's size and diversity, we employed Generative Adversarial Networks (GANs), specifically the CycleGAN architecture, to generate synthetic soil images. This augmented collection is referred to as the CyAUG Dataset. Both datasets are specifically designed to advance research in soil classification and recognition using state-of-the-art deep learning methodologies.
This dataset was curated as part of the research study titled "An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations" by Farhan Sheth, Priya Mathur, Amit Kumar Gupta, and Sandeep Chaurasia, published in Engineering Applications of Artificial Intelligence.
The application produced by this research is available at:
Note: If you are using any part of this project (dataset, code, or application), please cite the work as mentioned in the Citation section below.
Both datasets consist of images of 7 different soil types.
The Soil Classification Dataset is structured to facilitate the classification of various soil types based on images. The dataset includes images of the following soil types:
The dataset is organized into folders, each named after a specific soil type, containing images of that soil type. The images vary in resolution and quality, providing a diverse set of examples for training and testing classification models.
If you are using any of the derived datasets, please cite the following paper:
@article{SHETH2025111425,
title = {An advanced artificial intelligence framework integrating ensembled convolutional neural networks and Vision Transformers for precise soil classification with adaptive fuzzy logic-based crop recommendations},
journal = {Engineering Applications of Artificial Intelligence},
volume = {158},
pages = {111425},
year = {2025},
issn = {0952-1976},
doi = {https://doi.org/10.1016/j.engappai.2025.111425},
url = {https://www.sciencedirect.com/science/article/pii/S0952197625014277},
author = {Farhan Sheth and Priya Mathur and Amit Kumar Gupta and Sandeep Chaurasia},
keywords = {Soil classification, Crop recommendation, Vision transformers, Convolutional neural network, Transfer learning, Fuzzy logic}
}
This refers to the GAN-generated dataset for the paper "CSI4Free: GAN-Augmented mmWave CSI for Improved Pose Classification". Abstract: In recent years, Joint Communication and Sensing (JC&S) has demonstrated significant success, particularly in utilizing sub-6 GHz frequencies with commercial-off-the-shelf (COTS) Wi-Fi devices for applications such as localization, gesture recognition, and pose classification. Deep learning and the existence of large public datasets have been pivotal in achieving such results. However, at mmWave frequencies (30-300 GHz), which have shown potential for more accurate sensing performance, there is a noticeable lack of research in the domain of COTS Wi-Fi sensing. Challenges such as limited research hardware, the absence of large datasets, limited functionality in COTS hardware, and the complexities of data collection present obstacles to a comprehensive exploration of this field. In this work, we aim to address these challenges by developing a method that can generate synthetic mmWave channel state information (CSI) samples. In particular, we use a generative adversarial network (GAN) on an existing dataset to generate 30,000 additional CSI samples. The augmented samples exhibit a remarkable degree of consistency with the original data, as indicated by the notably high GAN-train and GAN-test scores. Furthermore, we integrate the augmented samples in training a pose classification model. We observe that the augmented samples complement the real data and improve the generalization of the classification model.
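GAN-train and GAN-test are standard augmentation-quality scores: GAN-train is the accuracy of a classifier trained on synthetic samples and tested on real ones, and GAN-test is the reverse. A minimal sketch with placeholder CSI feature arrays and a simple stand-in classifier (not the authors' model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def gan_train_test(X_real, y_real, X_syn, y_syn):
    """GAN-train: fit on synthetic CSI, score on real.
    GAN-test: fit on real CSI, score on synthetic."""
    clf_syn = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
    clf_real = LogisticRegression(max_iter=1000).fit(X_real, y_real)
    return (accuracy_score(y_real, clf_syn.predict(X_real)),
            accuracy_score(y_syn, clf_real.predict(X_syn)))

rng = np.random.default_rng(0)  # placeholder CSI feature vectors
X_r, y_r = rng.normal(size=(200, 64)), rng.integers(0, 4, 200)
X_s, y_s = rng.normal(size=(200, 64)), rng.integers(0, 4, 200)
print(gan_train_test(X_r, y_r, X_s, y_s))  # (GAN-train, GAN-test)
```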
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Monthly nighttime lights (NTL) can clearly depict an area's prevailing intra-year socio-economic dynamics. The Earth Observation Group at the Colorado School of Mines provides monthly NTL products from the Day Night Band (DNB) sensor on board the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite (April 2012 onwards) and from the Operational Linescan System (OLS) sensor on board the Defense Meteorological Satellite Program (DMSP) satellites (April 1992 onwards). In the current study, an attempt has been made to generate synthetic monthly VIIRS-like products for 1992-2012, using a deep learning-based image translation network. Initially, the defects of the 216 monthly DMSP images (1992-2013) were corrected to remove geometric errors, background noise, and radiometric errors. Correction of the monthly VIIRS imagery to remove background noise and ephemeral lights was done using low and high thresholds. Improved DMSP and corrected VIIRS images from April 2012 to December 2013 were used in a conditional generative adversarial network (cGAN), along with Land Use Land Cover as auxiliary input, to generate VIIRS-like imagery for 1992-2012. The modelled imagery was aggregated annually and showed an R2 of 0.94 with the results of other annual-scale VIIRS-like imagery products of India, an R2 of 0.85 with respect to GDP, and an R2 of 0.69 with respect to population. Regression analysis of the generated VIIRS-like products against the actual VIIRS images for the years 2012 and 2013 over India indicated a good approximation, with R2 of 0.64 and 0.67 respectively, while the spatial density relation depicted an under-estimation of the brightness values by the model at extremely high radiance values, with R2 of 0.56 and 0.53 respectively. Qualitative analysis was also performed at both national and state scales. Visual analysis over 1992-2013 confirms a gradual increase in the brightness of the lights, indicating that the cGAN model images closely represent the actual pattern followed by the nighttime lights. Finally, a synthetically generated monthly VIIRS-like product is delivered to the research community, which will be useful for studying changes in socio-economic dynamics over time.
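The cGAN setup described above conditions the translation on an auxiliary LULC layer; the usual way to do this in pix2pix-style models is to stack it with the input as extra channels. A minimal PyTorch sketch of that conditioning (layers and sizes are illustrative, not the study's network):

```python
import torch
import torch.nn as nn

class TranslationGenerator(nn.Module):
    """Image-to-image generator whose input stacks the DMSP band with an
    LULC map as extra channels (pix2pix-style conditioning)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),  # DMSP + LULC channels
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),             # VIIRS-like output band
        )

    def forward(self, dmsp, lulc):
        return self.net(torch.cat([dmsp, lulc], dim=1))

dmsp = torch.randn(1, 1, 256, 256)   # corrected DMSP tile
lulc = torch.randn(1, 1, 256, 256)   # auxiliary land-use/land-cover tile
viirs_like = TranslationGenerator()(dmsp, lulc)  # -> (1, 1, 256, 256)
```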
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative models based on latent variables, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have attracted considerable interest due to their impressive performance in many fields. However, much data, such as natural images, usually does not populate the ambient Euclidean space but instead resides on a lower-dimensional manifold. Thus an inappropriate choice of the latent dimension fails to uncover the structure of the data, possibly resulting in mismatched latent representations and poor generative quality. To address these problems, we propose a novel framework called the latent Wasserstein GAN (LWGAN), which fuses the Wasserstein auto-encoder and the Wasserstein GAN so that the intrinsic dimension of the data manifold can be adaptively learned by a modified informative latent distribution. We prove that there exist an encoder network and a generator network such that the intrinsic dimension of the learned encoding distribution is equal to the dimension of the data manifold. We theoretically establish that our estimated intrinsic dimension is a consistent estimate of the true dimension of the data manifold. Meanwhile, we provide an upper bound on the generalization error of LWGAN, implying that we force the synthetic data distribution to be similar to the real data distribution from a population perspective. Comprehensive empirical experiments verify our framework and show that LWGAN is able to identify the correct intrinsic dimension under several scenarios, and simultaneously generate high-quality synthetic data by sampling from the learned latent distribution. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
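For context, the standard Wasserstein GAN objective that LWGAN builds on (textbook formulation, with the critic D ranging over 1-Lipschitz functions; not a formula quoted from the paper):

```latex
\min_{G}\;\max_{D\,:\,\lVert D\rVert_{L}\le 1}\;
\mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[D(x)\right]
\;-\;\mathbb{E}_{z\sim p_{z}}\!\left[D\!\left(G(z)\right)\right]
```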
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
cigFacies is a dataset created by the Computational Interpretation Group (CIG) for AI-based automatic seismic facies classification in 3-D seismic data. Hui Gao, Xinming Wu, Xiaoming Sun and Mingcai Hou are the main contributors to the dataset.
These are the benchmark skeletonization datasets of seismic facies, guided by a knowledge graph of seismic facies and constructed using three different strategies (field seismic data, synthetic data and GAN-based generation).
Below is a brief description of the datasets:
1) The "benchmark skeletonization datasets" file consists of 5 classes of seismic facies.
2) The "parallel_class", "clinoform_class", "fill_class", "hummocky_class" and "chaotic_class" folders contain 2000, 1500, 1500, 1500 and 1500 stratigraphic skeletonization samples respectively, each constructed from field seismic data, synthetic data and GAN-based generation.
The source codes for constructing the benchmark dataset of seismic facies and deep learning for seismic facies classification have been uploaded to Github and are freely available at cigFaciesNet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Mineral Photos dataset is a vast collection of over 39,000 images of mineral specimens across 15 distinct mineral categories. It is primarily designed for use in machine learning and computer vision tasks, particularly for mineral classification and image generation. Each category contains images of the minerals in various forms and lighting conditions, making it a comprehensive resource for training models to recognize and generate mineral images.
Key Features: Total Images: Over 39,000 high-quality photographs.
Mineral Categories: The dataset includes images from the following 15 mineral categories:
Purpose: The dataset is perfect for training models in mineral classification and image generation tasks.
Use Case: Ideal for machine learning, image recognition, and deep learning applications in the classification and generation of images related to mineral ores.
Use Cases: Mineral Classification: Building models that can automatically classify mineral ores based on image data.
Image Generation: Using the dataset for generating synthetic images of minerals, which can be useful for training data augmentation or GAN-based projects.
Computer Vision: Training deep learning models for object recognition and classification in the field of mineralogy.
This dataset offers a valuable resource for those working on image-based machine learning models related to mineral identification, image synthesis, and visual pattern recognition in mineralogy.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This repository contains a Pix2Pix deep learning model to generate synthetic PECAM-1 staining from brightfield images. The model was trained on an initial dataset of 484 paired brightfield and fluorescent microscopy images, which was computationally augmented by a factor of 6 to enhance model performance. The training was conducted over 245 epochs with a patch size of 512x512, a batch size of 1, and a vanilla GAN loss function. The final model was selected based on quality metric scores and visual comparison to ground truth images, achieving an average SSIM score of 0.273 and an LPIPS score of 0.360 on the test dataset.
Model: Pix2Pix for generating synthetic PECAM staining from brightfield images
Training Dataset:
Original Dataset: 484 paired brightfield and fluorescent microscopy images
Augmented Dataset: Expanded to 2,904 paired images through computational augmentation
Microscope: Nikon Eclipse Ti2-E, brightfield/fluorescence microscope with a 20x objective
Data Type: Brightfield and fluorescent microscopy images
File Format: TIFF (.tif), 16-bit
Image Size: 1024 x 1022 pixels (Pixel size: 650 nm)
Training Parameters:
Epochs: 245
Patch Size: 512 x 512 pixels
Batch Size: 1
Loss Function: Vanilla GAN loss function
Model Performance:
SSIM Score: 0.273
LPIPS Score: 0.360
Model Selection: The best model was chosen based on quality metric scores and visual inspection compared to ground truth images.
Model Training: Conducted using ZeroCostDL4Mic (https://github.com/HenriquesLab/ZeroCostDL4Mic/wiki)
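For reference, the two reported metrics can be reproduced with standard libraries; a minimal sketch, assuming the scikit-image and lpips packages and using random placeholder arrays in place of an actual predicted/ground-truth image pair:

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import structural_similarity as ssim

pred = np.random.rand(512, 512).astype(np.float32)   # synthetic PECAM patch
truth = np.random.rand(512, 512).astype(np.float32)  # ground-truth stain

# SSIM on the raw grayscale arrays (intensities here span [0, 1])
print("SSIM:", ssim(truth, pred, data_range=1.0))

# LPIPS expects NCHW tensors in [-1, 1], replicated to 3 channels
to_t = lambda a: torch.from_numpy(a).mul(2).sub(1).repeat(3, 1, 1)[None]
loss_fn = lpips.LPIPS(net="alex")
print("LPIPS:", loss_fn(to_t(pred), to_t(truth)).item())
```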