Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This is a set of synthetic overhead imagery of wind turbines that was created with CityEngine. Corresponding labels provide the class, x and y coordinates, and height and width (YOLOv3 format) of the ground-truth bounding box for each wind turbine in the images. Labels are named after their images (e.g. image.png has the label image.txt).

Use: This dataset is meant to supplement the training set of an object detection model for overhead images of wind turbines. Adding it to the training set can potentially improve performance when the model is used on real overhead images of wind turbines.

Why: This dataset was created to examine the utility of adding synthetic imagery to the training set of an object detection model to improve performance on rare objects. Wind turbines are both few in number and sparsely distributed, which makes acquiring real data very costly. Synthetic imagery addresses this issue by automating the generation of new training data. Synthetic imagery can also be applied to cross-domain testing, where a model lacks training data for a particular region and consequently struggles when used on that region.

Method: Background images were selected from NAIP imagery available on Earth OnDemand, randomly selected from these geographies: forest, farmland, grasslands, water, urban/suburban, mountains, and deserts. No consideration was given to whether the background images would seem realistic, because we wanted to see if this would help the model detect wind turbines regardless of their context (which would help when using the model on novel geographies). A script then selected backgrounds at random, uniformly generated 3D models of large wind turbines over each image, and positioned the virtual camera to save four 608x608 pixel images. This process was repeated with the same random seed, but with no background image and the wind turbines colored black. Finally, these black-and-white images were converted into ground-truth labels by grouping the black pixels in the images.
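For illustration, the final labelling step (grouping black pixels in the mask renders into YOLO-format bounding boxes) can be sketched as follows. This is a minimal sketch, not the authors' script; it assumes each mask is a black-on-white image and uses SciPy's connected-component labelling. The file name is hypothetical.

```python
# Minimal sketch (not the authors' script) of deriving YOLOv3-format labels
# from black-on-white mask renders. Assumes each mask is a 608x608 image in
# which turbine pixels are black and everything else is white.
import numpy as np
from PIL import Image
from scipy import ndimage

def mask_to_yolo_labels(mask_path, class_id=0, threshold=128):
    mask = np.array(Image.open(mask_path).convert("L"))
    h, w = mask.shape
    # Group connected black pixels; each connected component is one turbine.
    components, n = ndimage.label(mask < threshold)
    lines = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(components == i)
        x_min, x_max = xs.min(), xs.max()
        y_min, y_max = ys.min(), ys.max()
        # YOLO format: class, normalized center x/y, normalized width/height.
        cx = (x_min + x_max) / 2 / w
        cy = (y_min + y_max) / 2 / h
        bw = (x_max - x_min + 1) / w
        bh = (y_max - y_min + 1) / h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return "\n".join(lines)

print(mask_to_yolo_labels("image_mask.png"))  # hypothetical file name
```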
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artificial Intelligence-based image generation has recently seen remarkable advancements, largely driven by deep learning techniques, such as Generative Adversarial Networks (GANs). With the influx and development of generative models, so too have biometric re-identification models and presentation attack detection models seen a surge in discriminative performance. However, despite the impressive photo-realism of generated samples and the additive value to the data augmentation pipeline, the role and usage of machine learning models have received intense scrutiny and criticism, especially in the context of biometrics, often being labeled as untrustworthy. Problems that have garnered attention in modern machine learning include: humans' and machines' shared inability to verify the authenticity of (biometric) data, the inadvertent leaking of private biometric data through the image synthesis process, and racial bias in facial recognition algorithms. Given the arrival of these unwanted side effects, public trust has been shaken in the blind use and ubiquity of machine learning.
However, in tandem with the advancement of generative AI, there are research efforts to re-establish trust in generative and discriminative machine learning models. Explainability methods based on aggregate model salience maps can elucidate the inner workings of a detection model, establishing trust in a post hoc manner. The CYBORG training strategy, originally proposed by Boyd, attempts to actively build trust into discriminative models by incorporating human salience into the training process.
In doing so, CYBORG-trained machine learning models behave more similarly to human annotators and generalize well to unseen types of synthetic data. Work in this dissertation also attempts to renew trust in generative models by training generative models on synthetic data, in order to avoid the identity leakage that occurs in models trained on authentic data. In this way, the privacy of individuals whose biometric data was seen during training is not compromised through the image synthesis procedure. Future development of privacy-aware image generation techniques will hopefully achieve the same degree of biometric utility in generative models with added guarantees of trustworthiness.
https://spdx.org/licenses/
TiCaM Synthetic Images: A Time-of-Flight In-Car Cabin Monitoring Dataset is a time-of-flight dataset of car in-cabin images providing means to test extensive car cabin monitoring systems based on deep learning methods. The authors provide a synthetic image dataset of car cabin images similar to the real dataset, leveraging advanced simulation software's capability to generate abundant data with little effort. This can be used to test domain adaptation between synthetic and real data for select classes. For both datasets the authors provide ground truth annotations for 2D and 3D object detection, as well as for instance segmentation.
100% synthetic. Based on model-released photos. Can be used for any purpose except those that violate the law. Worldwide. Different backgrounds: colored, transparent, photographic. Diversity: ethnicity, demographics, facial expressions, and poses.
Model
A Hugging Face unconditional image generation diffusion model was used for training. [1] Unconditional image generation models are not conditioned on text or images during training; they only generate images that resemble the training data distribution. The model starts from a seed that generates a random noise vector, which it then denoises into an output image similar to the images used for training. The training script initializes a UNet2DModel and uses it to train the model. [2] The training loop adds noise to the images, predicts the noise residual, calculates the loss, saves checkpoints at specified steps, and saves the generated models.

Training Dataset
The RANZCR CLiP dataset was used to train the model. [3] This dataset was created by The Royal Australian and New Zealand College of Radiologists (RANZCR), a not-for-profit professional organisation for clinical radiologists and radiation oncologists. The dataset was labelled against a set of definitions to ensure consistency. The normal category includes lines that were appropriately positioned and did not require repositioning. The borderline category includes lines that would ideally require some repositioning but would in most cases still function adequately in their current position. The abnormal category includes lines that required immediate repositioning. 30,000 images were used during training, all 512x512 pixels in size.

Computational Information
Training has been conducted on RTX 6000 cards with 24GB of graphics memory. A checkpoint was saved after each epoch, with 220 checkpoints generated so far. Each checkpoint takes up 1GB of memory, and each epoch takes around 6 hours to generate. Training runs on PyTorch via the Hugging Face diffusers training script, along with additional libraries for data preprocessing and visualization.

References
1. https://huggingface.co/docs/diffusers/en/training/unconditional_training#unconditional-image-generation
2. https://github.com/huggingface/diffusers/blob/096f84b05f9514fae9f185cbec0a4d38fbad9919/examples/unconditional_image_generation/train_unconditional.py#L356
3. https://www.kaggle.com/competitions/ranzcr-clip-catheter-line-classification/data
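For orientation, the training loop described above (add noise, predict the noise residual, compute the loss) can be condensed into the following sketch. It is illustrative only, with placeholder hyperparameters; the full implementation is the diffusers script in reference [2].

```python
# Condensed sketch of the unconditional diffusion training step described above
# (illustrative only; see reference [2] for the full diffusers training script).
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

model = UNet2DModel(sample_size=512, in_channels=3, out_channels=3)
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder lr

def training_step(clean_images):  # clean_images: (B, 3, 512, 512) tensor in [-1, 1]
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (clean_images.shape[0],)
    )
    # Add noise to the images according to the noise schedule.
    noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
    # Predict the noise residual and compute the loss.
    noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```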
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetic dataset for A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs
Dataset specification:
Arxiv paper: https://arxiv.org/abs/2106.13199
Github code: https://github.com/tcoroller/pGAN/
Abstract:
Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that behaves similarly to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on the COSENTYX Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.
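As a rough illustration of class-conditional generation of the kind used here (images conditioned on VU location), the sketch below shows a minimal conditional generator in PyTorch. Layer sizes and architecture are hypothetical placeholders, not the pGAN model; see the repository above for the actual implementation.

```python
# Minimal illustration of a class-conditional generator in PyTorch.
# Hypothetical layer sizes; the pGAN repository defines the real architecture.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=128, n_classes=3, img_size=64):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, latent_dim)  # cervical/thoracic/lumbar
        self.net = nn.Sequential(
            nn.Linear(latent_dim * 2, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, img_size * img_size),
            nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, z, labels):
        # Concatenate noise with an embedding of the VU location label.
        x = torch.cat([z, self.label_emb(labels)], dim=1)
        return self.net(x).view(-1, 1, self.img_size, self.img_size)

z = torch.randn(4, 128)
labels = torch.tensor([0, 1, 2, 0])  # VU location classes
fake_mris = ConditionalGenerator()(z, labels)  # (4, 1, 64, 64)
```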
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A ray-tracing-based image generation methodology to render realistic images of particle image velocimetry (PIV) and background-oriented schlieren (BOS) experiments in the presence of density/refractive-index gradients.
https://scoop.market.us/privacy-policy
As per the latest insights from Market.us, the Global Synthetic Data Generation Market is set to reach USD 6,637.98 million by 2034, expanding at a CAGR of 35.7% from 2025 to 2034. The market, valued at USD 313.50 million in 2024, is witnessing rapid growth due to rising demand for high-quality, privacy-compliant, and AI-driven data solutions.
North America dominated in 2024, securing over 35% of the market, with revenues surpassing USD 109.7 million. The region's leadership is fueled by strong investments in artificial intelligence, machine learning, and data security across industries such as healthcare, finance, and autonomous systems. With increasing reliance on synthetic data to enhance AI model training and reduce data privacy risks, the market is poised for significant expansion in the coming years.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: This dataset contains images of 192 different scene categories, with both AI-generated and real-world images for each class. It is designed for research and benchmarking in computer vision, deep learning, and AI-generated image detection.
Key Features:
- 192 Scene Classes: Includes diverse environments like forests, cities, beaches, deserts, and more.
- AI-Generated vs. Real Images: Each class contains images generated by AI models as well as real-world photographs.
- High-Quality Images: The dataset ensures a variety of resolutions and sources to improve model generalization.
- Perfect for Research: Ideal for training models in AI-generated image detection, scene classification, and image authenticity verification.

Potential Use Cases:
- AI-generated vs. real image classification
- Scene recognition and segmentation
- Training deep learning models for synthetic image detection
- Analyzing AI image generation trends
https://www.verifiedmarketresearch.com/privacy-policy/
Synthetic Data Generation Market size was valued at USD 0.4 Billion in 2024 and is projected to reach USD 9.3 Billion by 2032, growing at a CAGR of 46.5% from 2026 to 2032.
The Synthetic Data Generation Market is driven by the rising demand for AI and machine learning, where high-quality, privacy-compliant data is crucial for model training. Businesses seek synthetic data to overcome real-data limitations, ensuring security, diversity, and scalability without regulatory concerns. Industries like healthcare, finance, and autonomous vehicles increasingly adopt synthetic data to enhance AI accuracy while complying with stringent privacy laws.
Additionally, cost efficiency and faster data availability fuel market growth, reducing dependency on expensive, time-consuming real-world data collection. Advancements in generative AI, deep learning, and simulation technologies further accelerate adoption, enabling realistic synthetic datasets for robust AI model development.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains synthetic images of paper and plastic cups. The ImageClassesCombined folder contains annotated images of all classes combined. The annotations are in the COCO format. There is also a sample test_image.jpg, but you can use your own image or split the data if you prefer. Foreground images are taken from free stock image sites like unsplash.com, pexels.com, and pixabay.com. Cover Photo Designed by brgfx / Freepik
I want to create a dataset that could be used for image classification in different settings. The dataset can be used to train a CNN model for image detection and segmentation tasks in domains like agriculture, recycling, and many more.
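As a starting point, the COCO-format annotations in the ImageClassesCombined folder can be read with pycocotools roughly as follows. The annotation file path is a hypothetical placeholder; adjust it to the actual file in the download.

```python
# Minimal sketch of reading the COCO-format annotations described above with
# pycocotools. The annotation file path is hypothetical.
from pycocotools.coco import COCO

coco = COCO("ImageClassesCombined/annotations.json")  # hypothetical path

# List the cup categories (e.g. paper cup, plastic cup).
categories = coco.loadCats(coco.getCatIds())
print([c["name"] for c in categories])

# Iterate over images and their bounding-box annotations.
for img_id in coco.getImgIds():
    img_info = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    for ann in anns:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        print(img_info["file_name"], ann["category_id"], (x, y, w, h))
```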
https://www.datainsightsmarket.com/privacy-policy
The synthetic media software market is experiencing rapid growth, driven by increasing demand for realistic and engaging digital content across various sectors. The market, estimated at $2.5 billion in 2025, is projected to expand significantly over the forecast period (2025-2033), fueled by a Compound Annual Growth Rate (CAGR) of 25%. This robust growth is primarily attributed to advancements in artificial intelligence (AI), particularly in areas like deep learning and natural language processing, enabling the creation of increasingly sophisticated synthetic videos, images, and audio. Key drivers include the rising adoption of synthetic media in advertising and marketing, entertainment, e-learning, and virtual training simulations. The market's segmentation reflects the diverse applications of synthetic media, encompassing solutions for video generation, audio synthesis, and image creation. Companies like Synthesia, ChatGPT, and Jasper are leading the innovation, offering comprehensive platforms and specialized tools to cater to the evolving needs of businesses and individuals. The ease of use and cost-effectiveness of these platforms are further contributing to market expansion.

However, challenges remain. Ethical concerns surrounding the potential misuse of synthetic media, including the creation of deepfakes and misinformation, pose a significant restraint on market growth. Furthermore, the high initial investment required for software development and maintenance, coupled with the need for specialized skills to operate these technologies effectively, present barriers to entry for smaller players. To mitigate these challenges, the industry is focusing on developing robust verification technologies and promoting responsible AI practices. Regulation and industry self-governance are also becoming increasingly crucial to ensure the ethical and responsible use of synthetic media software. Despite these hurdles, the long-term growth prospects remain positive, with continuous advancements in AI technology poised to unlock new applications and drive further market expansion in the coming years.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Furniture Synthetic Dataset
A curated dataset of furniture images in four categories, with detailed attribute annotations. Designed for training and evaluating small Vision Language Models (VLMs) in extracting structured information from furniture images.
Dataset Description
Overview
Total Images: 10,000
Training Set: 9,000 images (generated)
Test Set: 1,000 images (real photographs)
Image Generation: Stable Diffusion 3.5 Medium
Test Set Annotation: Qwen2 VL… See the full description on the dataset page: https://huggingface.co/datasets/filnow/furniture-synthetic-dataset.
https://www.datainsightsmarket.com/privacy-policy
The text-to-image generation market is experiencing explosive growth, driven by advancements in artificial intelligence, particularly deep learning models like diffusion models and GANs. The market, currently valued at an estimated $2 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 35% from 2025 to 2033, reaching an estimated $20 billion by 2033. This rapid expansion is fueled by increasing adoption across diverse sectors. The advertising industry leverages text-to-image generators for creating unique and engaging visuals for campaigns, while the art community explores new creative avenues. Other applications include game development, e-commerce product visualization, and educational content creation. Mobile terminal applications are currently leading the market share, given the widespread accessibility of smartphones, but PC terminal usage is expected to experience significant growth as computational power continues to increase, making advanced image generation accessible to a wider user base. Key players like OpenAI, Google, and Stability AI are spearheading innovation, driving competitive landscape dynamics and fostering rapid technological advancement.

However, several restraints affect market growth. High computational costs associated with training and running sophisticated models can limit accessibility for smaller companies and individuals. Ethical concerns surrounding copyright infringement and potential misuse of the technology, such as generating deepfakes or biased imagery, also pose challenges. Furthermore, ensuring data privacy and addressing potential biases within training datasets remain critical considerations.

Segment-wise growth varies, with the advertising segment currently dominating due to its significant budget allocation towards innovative marketing strategies. Regional growth is primarily driven by North America and Europe, which house major technology hubs and possess high early adoption rates. However, significant growth is expected in the Asia-Pacific region, fueled by burgeoning technological advancements and rising internet penetration. Overcoming these restraints and addressing ethical considerations will be crucial for sustained and responsible growth in the text-to-image generation market.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
FLUXSynID: A Synthetic Face Dataset with Document and Live Images

FLUXSynID is a high-resolution synthetic identity dataset containing 14,889 unique synthetic identities, each represented through a document-style image and three live capture variants. Identities are generated using the FLUX.1 [dev] diffusion model, guided by user-defined identity attributes such as gender, age, region of origin, and various other identity features. The dataset was created to support biometric research, including face recognition and morphing attack detection.

File Structure
Each identity has a dedicated folder (named as a 12-digit hex string, e.g., 000e23cdce23) containing the following 5 files:
- 000e23cdce23_f.json: metadata including sampled identity attributes, prompt, generation seed, etc. (_f = female; _m = male; _nb = non-binary)
- 000e23cdce23_f_doc.png: document-style frontal image
- 000e23cdce23_f_live_0_e_d1.jpg: live image generated with LivePortrait (_e = expression and pose)
- 000e23cdce23_f_live_0_a_d1.jpg: live image via Arc2Face (_a = arc2face)
- 000e23cdce23_f_live_0_p_d1.jpg: live image via PuLID (_p = pulid)
All document and LivePortrait/PuLID images are 1024x1024. Arc2Face images are 512x512 due to original model constraints.

Attribute Sampling and Prompting
The attributes/ directory contains all information about how identity attributes were sampled:
- A set of .txt files (e.g., ages.txt, eye_shape.txt, body_type.txt): each lists the possible values for one attribute class, along with their respective sampling probabilities.
- file_probabilities.json: defines the inclusion probability for each attribute class (i.e., how likely a class such as "eye shape" is to be included in a given prompt).
- attribute_clashes.json: specifies rules for resolving semantically conflicting attributes. Each clash defines a primary attribute (to be kept) and secondary attributes (to be discarded when the clash occurs).
Prompts are generated automatically using the Qwen2.5 large language model, based on the selected attributes, and used to condition FLUX.1 [dev] during image generation.

Live Image Generation
Each synthetic identity has three live image-style variants:
- LivePortrait: expression/pose changes via keypoint-based retargeting
- Arc2Face: natural variation using identity embeddings (no prompt required)
- PuLID: identity-aware generation using prompt, embedding, and edge-conditioning with a customized FLUX.1 [dev] diffusion model
These approaches provide both controlled and naturalistic identity-consistent variation.

Filtering and Quality Control
Included are 9 supplementary text files listing filtered subsets of identities. For instance, the file similarity_filtering_adaface_thr_0.333987832069397_fmr_0.0001.txt contains identities retained after filtering out overly similar faces using the AdaFace FRS under the specified threshold and false match rate (FMR).

Usage and Licensing
This dataset is licensed under the Creative Commons Attribution Non Commercial 4.0 International (CC BY-NC 4.0) license. You are free to use, share, and adapt the dataset for non-commercial purposes, provided that appropriate credit is given. The images in this dataset were generated using the FLUX.1 [dev] model by Black Forest Labs, which is made available under their Non-Commercial License. While this dataset does not include or distribute the model or its weights, the images were produced using that model. Users are responsible for ensuring that their use of the images complies with the FLUX.1 [dev] license, including any restrictions it imposes.
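To make the attribute-sampling description above concrete, the sketch below shows one way such sampling could be implemented. The file formats are assumptions (tab-separated value/probability lines and a JSON map of inclusion probabilities), not the dataset's exact implementation, and clash resolution via attribute_clashes.json is omitted.

```python
# Illustrative sketch of the attribute-sampling scheme described above.
# Assumed formats: each attributes/*.txt line is "value<TAB>probability";
# file_probabilities.json maps attribute class names to inclusion probabilities.
import json
import random
from pathlib import Path

def sample_identity_attributes(attr_dir="attributes", seed=None):
    rng = random.Random(seed)
    attr_dir = Path(attr_dir)
    inclusion = json.loads((attr_dir / "file_probabilities.json").read_text())

    attributes = {}
    for attr_file in sorted(attr_dir.glob("*.txt")):
        attr_class = attr_file.stem  # e.g. "ages", "eye_shape", "body_type"
        # Randomly decide whether this attribute class appears in the prompt.
        if rng.random() > inclusion.get(attr_class, 1.0):
            continue
        values, weights = [], []
        for line in attr_file.read_text().splitlines():
            if not line.strip():
                continue
            value, prob = line.rsplit("\t", 1)
            values.append(value)
            weights.append(float(prob))
        attributes[attr_class] = rng.choices(values, weights=weights, k=1)[0]
    return attributes  # clash resolution via attribute_clashes.json would follow

print(sample_identity_attributes(seed=42))
```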
Acknowledgments The FLUXSynID dataset was developed under the EINSTEIN project. The EINSTEIN project is funded by the European Union (EU) under G.A. no. 101121280 and UKRI Funding Service under IFS reference 10093453. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect the views of the EU/Executive Agency or UKRI. Neither the EU nor the granting authority nor UKRI can be held responsible for them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data-set is supplementary material related to the generation of synthetic images of a corridor in the University of Melbourne, Australia, from a building information model (BIM). It was generated to check the ability of deep learning algorithms to learn the task of indoor localisation from synthetic images when being tested on real images.

The following is the naming convention used for the data-sets; the brackets show the number of images in each data-set.

REAL DATA
Real ---------------------> Real images (949 images)
Gradmag-Real -------> Gradmag of real data (949 images)

SYNTHETIC DATA
Syn-Car ----------------> Cartoonish images (2500 images)
Syn-pho-real ----------> Synthetic photo-realistic images (2500 images)
Syn-pho-real-tex -----> Synthetic photo-realistic textured (2500 images)
Syn-Edge --------------> Edge render images (2500 images)
Gradmag-Syn-Car ---> Gradmag of Cartoonish images (2500 images)

Each folder contains the images and their respective groundtruth poses in the following format: [ImageName X Y Z w p q r].

To generate the synthetic data-set, we define a trajectory in the 3D indoor model. The points in the trajectory serve as the ground truth poses of the synthetic images. The height of the trajectory was kept in the range of 1.5-1.8 m from the floor, which is the usual height of holding a camera in hand. Artificial point light sources were placed to illuminate the corridor (except for the Edge render images). The length of the trajectory was approximately 30 m. A virtual camera was moved along the trajectory to render four different sets of synthetic images in Blender*. The intrinsic parameters of the virtual camera were kept identical to the real camera (VGA resolution, focal length of 3.5 mm, no distortion modeled). We rendered images along the trajectory at 0.05 m intervals and ±10° tilt.

The main difference between the cartoonish (Syn-Car) and photo-realistic images (Syn-pho-real) is the model of rendering. Photo-realistic rendering is a physics-based model that traces the path of light rays in the scene, which is similar to the real world, whereas the cartoonish rendering only roughly traces the path of light rays. The photo-realistic textured images (Syn-pho-real-tex) were rendered by adding repeating synthetic textures to the 3D indoor model, such as the textures of brick, carpet and wooden ceiling. The realism of the photo-realistic rendering comes at the cost of rendering times; however, the rendering times of the photo-realistic data-sets were considerably reduced with the help of a GPU. Note that the naming convention used for the data-sets (e.g. Cartoonish) follows Blender terminology.

An additional data-set (Gradmag-Syn-Car) was derived from the cartoonish images by taking the edge gradient magnitude of the images and suppressing weak edges below a threshold. The edge rendered images (Syn-Edge) were generated by rendering only the edges of the 3D indoor model, without taking into account the lighting conditions. This data-set is similar to the Gradmag-Syn-Car data-set; however, it does not contain the effect of illumination of the scene, such as reflections and shadows.

*Blender is an open-source 3D computer graphics software and finds its applications in video games, animated films, simulation and visual art. For more information please visit: http://www.blender.org

Please cite the papers if you use the data-set:
1) Acharya, D., Khoshelham, K., and Winter, S., 2019. BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. ISPRS Journal of Photogrammetry and Remote Sensing, 150: 245-258.
2) Acharya, D., Singha Roy, S., Khoshelham, K. and Winter, S., 2019. Modelling uncertainty of single image indoor localisation using a 3D model and deep learning. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, IV-2/W5, pages 247-254.
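For convenience, the ground-truth pose files in the [ImageName X Y Z w p q r] format described above can be parsed with a few lines of Python. This is a minimal sketch under the assumption of whitespace-delimited lines; the file name used here is hypothetical.

```python
# Minimal sketch of parsing the ground-truth pose files described above.
# Assumes a whitespace-delimited text file with one line per image in the
# format: ImageName X Y Z w p q r (position plus orientation quaternion).
from dataclasses import dataclass

@dataclass
class Pose:
    image: str
    position: tuple      # (X, Y, Z)
    quaternion: tuple    # (w, p, q, r)

def load_poses(path="Syn-pho-real/poses.txt"):  # hypothetical file name
    poses = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 8:
                continue  # skip headers or malformed lines
            name, *vals = parts
            x, y, z, w, p, q, r = map(float, vals)
            poses.append(Pose(name, (x, y, z), (w, p, q, r)))
    return poses

for pose in load_poses()[:3]:
    print(pose.image, pose.position, pose.quaternion)
```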
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
AIS-4SD (AI Summit - 4 Stable Diffusion models) is a collection of 4,000 images generated using a set of Stability AI text-to-image diffusion models.
This dataset was developed as part of a collaborative project between PEReN and VIGINUM for the AI Summit held in Paris in February 2025. This open-source project aims at assessing the performance of generated-image detectors and their robustness to different models and transformations. The code is free and open source, and contributions to connect additional detectors are also welcome.
Official repository: https://code.peren.gouv.fr/open-source/ai-action-summit/generated-image-detection.
This dataset can be used to assess the performance of detection models, and in particular their robustness to successive updates of the generation model.
1,000 images were generated with each of four different versions of the Stability AI text-to-image diffusion models. For each model, we generated:
Model | Number of images |
---|---|
stabilityai/stable-diffusion-xl-base-1.0 | 500 faces + 500 other |
stabilityai/stable-diffusion-2-1 | 500 faces + 500 other |
stabilityai/stable-diffusion-3-medium-diffusers | 500 faces + 500 other |
stabilityai/stable-diffusion-3.5-large | 500 faces + 500 other |
The scripts used to generate these images can be found in our open-source repository (see this specific file). After setting up the project, you can run:
$ poetry run python scripts/generate_images.py
With minor updates to these scripts, you can extend this dataset to your specific needs.
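As a point of reference, generating images with one of the listed checkpoints can be sketched with the diffusers library as below. This is an illustrative sketch, not the project's scripts/generate_images.py; the prompt, seed, and output file name are placeholders.

```python
# Rough sketch of generating images with one of the model versions listed above,
# using the diffusers library. Prompt and output path are placeholders; the
# project's scripts/generate_images.py defines the actual generation settings.
import torch
from diffusers import DiffusionPipeline

model_id = "stabilityai/stable-diffusion-2-1"  # any of the listed checkpoints
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

generator = torch.Generator("cuda").manual_seed(0)  # reproducible generation
image = pipe(
    "a portrait photo of a person, natural lighting",  # placeholder prompt
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("sd21_sample_0000.png")  # placeholder output file name
```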
One zip file with the following structure, each directory containing the associated 500 images:
AIS-4SD/
├── generation_metadata.csv
├── StableDiffusion-2.1-faces-20250203-1448
├── StableDiffusion-2.1-other-20250203-1548
├── StableDiffusion-3.5-faces-20250203-1012
├── StableDiffusion-3.5-other-20250203-1603
├── StableDiffusion-3-faces-20250203-1545
├── StableDiffusion-3-other-20250203-1433
├── StableDiffusion-XL-faces-20250203-0924
└── StableDiffusion-XL-other-20250203-1727
The metadata for the generated images are provided in generation_metadata.csv.
The project is under ongoing development. A preliminary blog post can be found here: https://www.peren.gouv.fr/en/perenlab/2025-02-11_ai_summit/.
https://www.archivemarketresearch.com/privacy-policy
The text-to-image generator market is experiencing explosive growth, driven by advancements in artificial intelligence, particularly in deep learning and diffusion models. The market, estimated at $2 billion in 2025, is projected to witness a robust Compound Annual Growth Rate (CAGR) of 35% from 2025 to 2033. This significant expansion is fueled by increasing adoption across diverse sectors, including advertising, art creation, and various other applications. The accessibility of powerful generative models through cloud-based platforms and APIs is lowering the barrier to entry for both individual artists and large corporations, fostering innovation and wider market penetration. Key players like OpenAI, Google, and Stability AI are at the forefront of this revolution, constantly releasing improved models and expanding their service offerings. The market is further segmented by application type (advertising, art, and others) and terminal type (mobile and PC). While the initial adoption is heavily skewed toward North America and Europe, rapid growth is anticipated in regions like Asia Pacific and the Middle East & Africa as awareness and internet penetration increase.

The restraints to market growth primarily involve concerns around ethical implications, such as potential misuse for creating deepfakes or copyright infringement issues. However, ongoing developments in watermarking technologies and responsible AI practices are actively addressing these challenges. The future of the market hinges on further technological advancements, including improving the realism and controllability of generated images, expanding the range of supported styles and applications, and successfully navigating the legal and ethical complexities inherent in this rapidly evolving technology.

This rapid expansion suggests significant investment opportunities, particularly in research and development, platform development, and the provision of related services and tools. The market is expected to mature over the next decade, but maintaining its impressive growth trajectory requires continuous innovation and responsible development.
Comparison of ELO scores in the Artificial Analysis Image Arena (a relative metric of image generation quality; higher is better), by model.