72 datasets found
  1. TiCaM: Synthetic Images Dataset

    • datasetninja.com
    Updated May 23, 2021
    + more versions
    Cite
    Jigyasa Katrolia; Jason Raphael Rambach; Bruno Mirbach (2021). TiCaM: Synthetic Images Dataset [Dataset]. https://datasetninja.com/ticam-synthetic-images
    Dataset updated
    May 23, 2021
    Dataset provided by
    Dataset Ninja
    Authors
    Jigyasa Katrolia; Jason Raphael Rambach; Bruno Mirbach
    License

    https://spdx.org/licenses/

    Description

    TiCaM Synthetic Images: A Time-of-Flight In-Car Cabin Monitoring Dataset is a time-of-flight dataset of car in-cabin images that provides a means to test extensive car-cabin monitoring systems based on deep-learning methods. The authors provide a synthetic image dataset of car cabin images similar to the real dataset, leveraging advanced simulation software's capability to generate abundant data with little effort. This can be used to test domain adaptation between synthetic and real data for select classes. For both datasets the authors provide ground-truth annotations for 2D and 3D object detection, as well as for instance segmentation.

  2. Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034

    • scoop.market.us
    Updated Mar 18, 2025
    + more versions
    Cite
    Market.us Scoop (2025). Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034 [Dataset]. https://scoop.market.us/synthetic-data-generation-market-news/
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market.us Scoop
    License

    https://scoop.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Synthetic Data Generation Market Size

    As per the latest insights from Market.us, the Global Synthetic Data Generation Market is set to reach USD 6,637.98 million by 2034, expanding at a CAGR of 35.7% from 2025 to 2034. The market, valued at USD 313.50 million in 2024, is witnessing rapid growth due to rising demand for high-quality, privacy-compliant, and AI-driven data solutions.
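    As a quick sanity check (ours, not from the report), compounding the USD 313.50 million 2024 base at the stated 35.7% CAGR over the ten years to 2034 reproduces the headline figure:

      # Hedged sanity check: assumes the 35.7% CAGR compounds annually on the 2024 base.
      base_2024 = 313.50                     # USD million
      cagr = 0.357
      projection_2034 = base_2024 * (1 + cagr) ** 10
      print(round(projection_2034, 2))       # ~6638.1, consistent with USD 6,637.98 Mn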

    North America dominated in 2024, securing over 35% of the market, with revenues surpassing USD 109.7 million. The region’s leadership is fueled by strong investments in artificial intelligence, machine learning, and data security across industries such as healthcare, finance, and autonomous systems. With increasing reliance on synthetic data to enhance AI model training and reduce data privacy risks, the market is poised for significant expansion in the coming years.

    Image: Synthetic Data Generation Market Size (https://market.us/wp-content/uploads/2025/03/Synthetic-Data-Generation-Market-Size.png)
  3. Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035...

    • rootsanalysis.com
    Updated Oct 1, 2024
    Cite
    Roots Analysis (2024). Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035 [Dataset]. https://www.rootsanalysis.com/synthetic-data-generation-market
    Dataset updated
    Oct 1, 2024
    Authors
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.html

    Time period covered
    2021 - 2031
    Area covered
    Global
    Description

    The global synthetic data market size is projected to grow from USD 0.4 billion in the current year to USD 19.22 billion by 2035, representing a CAGR of 42.14% over the forecast period to 2035.

  4. Data from: Generation of synthetic whole-slide image tiles of tumours from...

    • search-dev.test.dataone.org
    • search.dataone.org
    • +2more
    Updated Apr 12, 2024
    Cite
    Francisco Carrillo-Perez; Marija Pizurica; Yuanning Zheng; Tarak Nath Nandi; Ravi Madduri; Jeanne Shen; Olivier Gevaert (2024). Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models [Dataset]. http://doi.org/10.5061/dryad.6djh9w174
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Francisco Carrillo-Perez; Marija Pizurica; Yuanning Zheng; Tarak Nath Nandi; Ravi Madduri; Jeanne Shen; Olivier Gevaert
    Time period covered
    Jan 1, 2023
    Description

    Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single-modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient’s gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascad...

    RNA-CDM Generated One Million Synthetic Images

    https://doi.org/10.5061/dryad.6djh9w174

    One million synthetic digital pathology images were generated using the RNA-CDM model presented in the paper "RNA-to-image multi-cancer synthesis using cascaded diffusion models".

    Description of the data and file structure

    There are ten different h5 files per cancer type (TCGA-CESC, TCGA-COAD, TCGA-KIRP, TCGA-GBM, TCGA-LUAD). Each h5 file contains 20,000 images. The keys are the tile numbers, ranging from 0-20,000 in the first file up to 180,000-200,000 in the last file. The tiles are saved as numpy arrays.
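    A minimal reading sketch for one of these files, assuming h5py is available; the filename below is hypothetical, and the keys and array layout follow the description above:

      import h5py
      import numpy as np

      # Hypothetical filename; the dataset ships ten h5 files per cancer type.
      with h5py.File("TCGA-LUAD_tiles_part0.h5", "r") as f:
          tile_keys = sorted(f.keys(), key=int)    # keys are tile numbers stored as strings
          first_tile = np.array(f[tile_keys[0]])   # each tile is stored as a numpy array
          print(len(tile_keys), first_tile.shape, first_tile.dtype)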

    Code/Software

    The code used to generate this data is available under an academic license at https://rna-cdm.stanford.edu.

    Manuscript citation

    Carrillo-Perez, F., Pizurica, M., Zheng, Y. et al. Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models...

  5. Synset Boulevard: Synthetic image dataset for Vehicle Make and Model...

    • gimi9.com
    Updated Dec 15, 2024
    + more versions
    Cite
    (2024). Synset Boulevard: Synthetic image dataset for Vehicle Make and Model Recognition (VMMR) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_725679870677258240
    Dataset updated
    Dec 15, 2024
    Description

    The Synset Boulevard dataset contains a total of 259,200 synthetically generated images of cars from a frontal traffic-camera perspective, annotated with vehicle make, model and year of construction for machine learning (ML) methods in the task of vehicle make and model recognition (VMMR). The dataset covers 162 vehicle models from 43 brands with 200 images each, organised into 8 sub-datasets so that different imaging qualities can be investigated. In addition to the classification annotations, the dataset also contains label images for semantic segmentation, as well as information on image and scene properties and on vehicle color.

    The dataset was presented in May 2024 by Anne Sielemann, Stefan Wolf, Masoud Roschani, Jens Ziehn and Jürgen Beyerer in the publication: Sielemann, A., Wolf, S., Roschani, M., Ziehn, J. and Beyerer, J. (2024). Synset Boulevard: A Synthetic Image Dataset for VMMR. In 2024 IEEE International Conference on Robotics and Automation (ICRA). The model information is based on the ADAC online database (www.adac.de/rund-ums-fahrzeug/autokatalog/marken-modelle). The data was generated using the simulation environment OCTANE (www.octane.org), which uses the Cycles ray tracer of the Blender project. The dataset's website provides detailed information on the generation process and model assumptions; the dataset is therefore also intended to be used for the suitability analysis of simulated, synthetic datasets.

    The dataset was developed as part of the Fraunhofer PREPARE program in the "ML4Safety" project with the funding code PREPARE 40-02702, and was funded by the "Invest BW" funding program of the Ministry of Economic Affairs, Labour and Tourism as part of the "FeinSyn" research project.

  6. Synthetic Data Generation Market Size, Share, Trend Analysis by 2033

    • emergenresearch.com
    pdf
    Updated Oct 8, 2024
    Cite
    Emergen Research (2024). Synthetic Data Generation Market Size, Share, Trend Analysis by 2033 [Dataset]. https://www.emergenresearch.com/industry-report/synthetic-data-generation-market
    Available download formats: pdf
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    Emergen Research
    License

    https://www.emergenresearch.com/purpose-of-privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    The Synthetic Data Generation Market size is expected to reach a valuation of USD 36.09 Billion in 2033, growing at a CAGR of 39.45%. The research report classifies the market by share, trend and demand, and segments it by Data Type, Modeling Type, Offering, Application, End Use and Regional Outlook.

  7. Synthetic river flow videos dataset

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Aug 3, 2022
    + more versions
    Cite
    Guillaume Bodart (2022). Synthetic river flow videos dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6257391
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    Guillaume Bodart
    Magali Jodeau
    Jérôme Le Coz
    Alexandre Hauet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    Synthetic river flow videos for evaluating image-based velocimetry methods

    Year : 2022

    Authors : G.Bodart (guillaume.bodart@inrae.fr), J.Le Coz (jerome.lecoz@inrae.fr), M.Jodeau (magali.jodeau@edf.fr), A.Hauet (alexandre.hauet@edf.fr)

    This file describes the data attached to the article:

    -> 00_article_cases

    This folder contains the data used in the case studies: synthetic videos + reference files.

    - 00_reference_velocities
      -> Reference velocities interpolated on a regular grid. Data are given in conventional units, i.e. m/s and m.
    
    
    - 01_XX
      -> Data of the first case study
    
    
    - 02_XX
      -> Data of the second case study
    

    -> 01_dev

    This folder contains the Python libraries and Mantaflow modified source code used in the paper. The libraries are provided as is. Feel free to contact us for support or guidelines.

    - lspiv
      -> Python library used to extract, process and display results of LSPIV analysis carried out with Fudaa-LSPIV
    
    
    - mantaflow-modified
      -> Modified version of Mantaflow described in the article. Installation instructions can be found at http://mantaflow.com
    
    
    - syri
      -> Python library used to extract, process and display fluid simulations carried out on Mantaflow and Blender. (Requires the lspiv library)
    

    -> 02_dataset

    This folder contains synthetic videos generated with the method described in the article. The fluid simulation parameters, and thus the reference velocities, are the same as those presented in the article.

    • The videos can be used freely. Please consider citing the corresponding paper.
  8. Raw Synthetic Particle Image Dataset (RSPID)

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 2, 2023
    Cite
    Michel Machado; Michel Machado; Douglas Rocha; Douglas Rocha (2023). Raw Synthetic Particle Image Dataset (RSPID) [Dataset]. http://doi.org/10.5281/zenodo.7832205
    Available download formats: zip
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michel Machado; Michel Machado; Douglas Rocha; Douglas Rocha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic Particle Image Velocimetry (PIV) data generated by the PIV Image Generator software, a tool that generates synthetic PIV images for validating and benchmarking PIV and optical-flow methods in tracer-based imaging for fluid mechanics (Mendes et al., 2020).

    This data was generated with the following parameters:

    • image width: 665 pixels;
    • image height: 630 pixels;
    • bit depth: 8 bits;
    • particle radius: 1, 2, 3, 4 pixels;
    • particle density: 15, 17, 20, 23, 25, 32 particles;
    • delta x factor: 0.05, 0.1, 0.15, 0.2, 0.25 %;
    • noise level: 1, 5, 10, 15;
    • out-of-plane standard deviation: 0.01, 0.025, 0.05;
    • flows: rankine uniform, rankine vortex, parabolic, stagnation, shear, decaying vortex.
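    As a small illustration (not the authors' generator code), the reported parameter grid can be enumerated directly, which is handy for indexing or batching the image set:

      from itertools import product

      # Parameter values as reported above; the enumeration itself is illustrative.
      particle_radius  = [1, 2, 3, 4]                    # pixels
      particle_density = [15, 17, 20, 23, 25, 32]        # particles
      delta_x_factor   = [0.05, 0.10, 0.15, 0.20, 0.25]  # %
      noise_level      = [1, 5, 10, 15]
      out_of_plane_std = [0.01, 0.025, 0.05]
      flows = ["rankine uniform", "rankine vortex", "parabolic",
               "stagnation", "shear", "decaying vortex"]

      grid = list(product(particle_radius, particle_density, delta_x_factor,
                          noise_level, out_of_plane_std, flows))
      print(len(grid))  # 4 * 6 * 5 * 4 * 3 * 6 = 8640 parameter combinations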

  9. Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • zenodo.org
    • data.niaid.nih.gov
    xz
    Updated Oct 13, 2022
    Cite
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
    Available download formats: xz
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is a first dataset of real and synthetic faces as processed by social networks: the synthetic faces were obtained with the successful StyleGAN generative models, and all images were shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake-news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. Said platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.

  10. 2D high-resolution synthetic MR images of Alzheimer's patients and healthy...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 13, 2023
    Cite
    Diciotti, Stefano (2023). 2D high-resolution synthetic MR images of Alzheimer's patients and healthy subjects using PACGAN [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8276785
    Dataset updated
    Dec 13, 2023
    Dataset provided by
    Lai, Matteo
    Marzi, Chiara
    Diciotti, Stefano
    Citi, Luca
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset encompasses a NIfTI file containing a collection of 500 images, each capturing the central axial slice of a synthetic brain MRI.

    Accompanying this file is a CSV file that holds the corresponding label for each image:

    Label 0: Healthy Controls (HC)

    Label 1: Alzheimer's Disease (AD)
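    A minimal loading sketch, assuming nibabel and pandas; the filenames below are hypothetical, and the label column is assumed to be the last column of the CSV:

      import nibabel as nib
      import pandas as pd

      volume = nib.load("pacgan_synthetic_slices.nii").get_fdata()  # hypothetical filename
      labels = pd.read_csv("pacgan_labels.csv")                     # hypothetical filename

      print(volume.shape)                        # one central axial slice per image, 500 in total
      print(labels.iloc[:, -1].value_counts())   # assumed label column: 0 = HC, 1 = AD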

    Each image within this dataset has been generated by PACGAN (Progressive Auxiliary Classifier Generative Adversarial Network), a framework designed and implemented by the AI for Medicine Research Group at the University of Bologna.

    PACGAN is a generative adversarial network trained to generate high-resolution images belonging to different classes. In our work, we trained this framework on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which contains brain MRI images of AD patients and HC.

    The implementation of the training algorithm can be found within our GitHub repository, with Docker containerization.

    For further exploration, the pre-trained models are available within the Code Ocean capsule. These models can facilitate the generation of synthetic images for both classes and also aid in classifying new brain MRI images.

  11. GAN-based Synthetic VIIRS-like Image Generation over India

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 25, 2023
    Cite
    S. K. Srivastav (2023). GAN-based Synthetic VIIRS-like Image Generation over India [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7854533
    Dataset updated
    May 25, 2023
    Dataset provided by
    S. K. Srivastav
    Prasun Kumar Gupta
    Mehak Jindal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Monthly nighttime lights (NTL) can clearly depict an area's prevailing intra-year socio-economic dynamics. The Earth Observation Group at the Colorado School of Mines provides monthly NTL products from the Day Night Band (DNB) sensor on board the Visible and Infrared Imaging Suite (VIIRS) satellite (April 2012 onwards) and from the Operational Linescan System (OLS) sensor onboard the Defense Meteorological Satellite Program (DMSP) satellites (April 1992 onwards). In the current study, an attempt has been made to generate synthetic monthly VIIRS-like products for 1992-2012 using a deep learning-based image translation network. Initially, the defects of the 216 monthly DMSP images (1992-2013) were corrected to remove geometric errors, background noise, and radiometric errors. Correction of monthly VIIRS imagery to remove background noise and ephemeral lights was done using low and high thresholds. Improved DMSP and corrected VIIRS images from April 2012 - December 2013 are used in a conditional generative adversarial network (cGAN), along with Land Use Land Cover as auxiliary input, to generate VIIRS-like imagery for 1992-2012.

    The modelled imagery was aggregated annually and showed an R2 of 0.94 against other annual-scale VIIRS-like imagery products of India, an R2 of 0.85 w.r.t. GDP and an R2 of 0.69 w.r.t. population. Regression analysis of the generated VIIRS-like products against the actual VIIRS images for 2012 and 2013 over India indicated a good approximation, with R2 of 0.64 and 0.67 respectively, while the spatial density relation depicted an under-estimation of the brightness values by the model at extremely high radiance values, with R2 of 0.56 and 0.53 respectively. Qualitative analysis was also performed at both national and state scales. Visual analysis over 1992-2013 confirms a gradual increase in the brightness of the lights, indicating that the cGAN model images closely follow the actual pattern of the nighttime lights. Finally, a synthetically generated monthly VIIRS-like product is delivered to the research community, which will be useful for studying changes in socio-economic dynamics over time.

  12. Iris_Database

    • huggingface.co
    Updated Mar 18, 2025
    Cite
    Iris_Database [Dataset]. https://huggingface.co/datasets/fatdove/Iris_Database
    Dataset updated
    Mar 18, 2025
    Authors
    fatdove
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Synthetic Iris Image Dataset

      Overview
    

    This repository contains a dataset of synthetic colored iris images generated using diffusion models, based on our paper "Synthetic Iris Image Generation Using Diffusion Networks." The dataset comprises 17,695 high-quality synthetic iris images designed to be biometrically unique from the training data while maintaining realistic iris pigmentation distributions. This repository provides about 10,000 filtered iris images with the… See the full description on the dataset page: https://huggingface.co/datasets/fatdove/Iris_Database.
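    A minimal loading sketch with the standard Hugging Face datasets loader; it assumes the repository resolves with the generic loader and exposes a default train split:

      from datasets import load_dataset

      iris = load_dataset("fatdove/Iris_Database", split="train")  # assumed split name
      print(iris)
      print(iris[0])  # typically an {"image": ..., ...} record for image datasets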

  13. Global Synthetic Data Tool Market Research Report: By Type (Image...

    • wiseguyreports.com
    Updated Aug 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Synthetic Data Tool Market Research Report: By Type (Image Generation, Text Generation, Audio Generation, Time-Series Generation, User-Generated Data Marketplace), By Application (Computer Vision, Natural Language Processing, Predictive Analytics, Healthcare, Retail), By Deployment Mode (Cloud-Based, On-Premise), By Organization Size (Small and Medium Enterprises (SMEs), Large Enterprises) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/synthetic-data-tool-market
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 8, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 7.98 (USD Billion)
    MARKET SIZE 2024: 9.55 (USD Billion)
    MARKET SIZE 2032: 40.0 (USD Billion)
    SEGMENTS COVERED: Type, Application, Deployment Mode, Organization Size, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Growing Demand for Data Privacy and Security; Advancement in Artificial Intelligence (AI) and Machine Learning (ML); Increasing Need for Faster and More Efficient Data Generation; Growing Adoption of Synthetic Data in Various Industries; Government Regulations and Compliance
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: MostlyAI, Gretel.ai, H2O.ai, Scale AI, UNchart, Anomali, Replica, Big Syntho, Owkin, DataGenix, Synthesized, Verisart, Datumize, Deci, Datasaur
    MARKET FORECAST PERIOD: 2025 - 2032
    KEY MARKET OPPORTUNITIES: Data privacy compliance; Improved data availability; Enhanced data quality; Reduced data bias; Cost-effective
    COMPOUND ANNUAL GROWTH RATE (CAGR): 19.61% (2025 - 2032)
  14. Synthetic Fruit Object Detection Dataset - raw

    • public.roboflow.com
    zip
    Updated Aug 11, 2021
    + more versions
    Cite
    Brad Dwyer (2021). Synthetic Fruit Object Detection Dataset - raw [Dataset]. https://public.roboflow.com/object-detection/synthetic-fruit/1
    Available download formats: zip
    Dataset updated
    Aug 11, 2021
    Dataset authored and provided by
    Brad Dwyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of Fruits
    Description

    About this dataset

    This dataset contains 6,000 example images generated with the process described in Roboflow's How to Create a Synthetic Dataset tutorial.

    The images are composed of a background (randomly selected from Google's Open Images dataset) and a number of fruits (from Horea94's Fruit Classification Dataset) superimposed on top with a random orientation, scale, and color transformation. All images are 416x550 to simulate a smartphone aspect ratio.
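    A rough sketch of that compositing recipe (not Roboflow's tutorial code) using PIL; the paths and scale range are illustrative assumptions, the colour jitter is omitted, and bounding-box bookkeeping is left out for brevity:

      import random
      from PIL import Image

      def composite(background_path, fruit_paths, size=(416, 550)):
          """Paste randomly rotated and scaled fruit cut-outs onto a resized background."""
          canvas = Image.open(background_path).convert("RGB").resize(size)
          for path in fruit_paths:
              fruit = Image.open(path).convert("RGBA")
              scale = random.uniform(0.3, 1.0)  # illustrative scale range
              fruit = fruit.resize((max(1, int(fruit.width * scale)),
                                    max(1, int(fruit.height * scale))))
              fruit = fruit.rotate(random.uniform(0, 360), expand=True)  # random orientation
              x = random.randint(0, max(0, size[0] - fruit.width))
              y = random.randint(0, max(0, size[1] - fruit.height))
              canvas.paste(fruit, (x, y), fruit)  # alpha-composite the cut-out
          return canvas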

    To generate your own images, follow our tutorial or download the code.

    Example image: https://blog.roboflow.ai/content/images/2020/04/synthetic-fruit-examples.jpg

  15. Dataset for publication: Usefulness of synthetic datasets for diatom...

    • dorel.univ-lorraine.fr
    • zenodo.org
    • +1more
    bin, jpeg +4
    Updated Jul 21, 2023
    + more versions
    Cite
    Université de Lorraine (2023). Dataset for publication: Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach [Dataset]. http://doi.org/10.12763/UADENQ
    Available download formats: text/x-python(652), bin(456), tsv(1716), text/x-python(1957), text/x-python(4882), text/x-python(3391), text/x-python(12356), jpeg(7239), text/x-python(8545), zip(50188610), bin(1530), text/markdown(2269)
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    University of Lorraine (http://www.univ-lorraine.fr/)
    License

    Licence Ouverte / Open Licence 2.0: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
    License information was derived automatically

    Description

    This repository contains the dataset and the code used to generate the synthetic dataset described in the paper "Usefulness of synthetic datasets for diatom automatic detection using a deep-learning approach".

    Dataset: the dataset consists of two components: individual diatom images extracted from publicly available diatom atlases [1,2,3] and individual debris images.
    - Individual diatom images: the repository currently contains 166 diatom species, totalling 9,230 images. These images were automatically extracted from the atlases using PDF scraping, then cleaned and verified by diatom taxonomists. The subfolders within each diatom species indicate the origin of the images: RA [1], IDF [2], BRG [3]. Additional diatom species and images will be added to the repository regularly.
    - Individual debris images: the debris images were extracted from real microscopy images. The repository contains 600 debris objects.

    Code: contains the code used to generate synthetic microscopy images. For details on how to use the code, refer to the README file in synthetic_data_generator/.

  16. MatSim Dataset and benchmark for one-shot visual materials and textures...

    • zenodo.org
    • data.niaid.nih.gov
    pdf, zip
    Updated Jul 15, 2024
    Cite
    Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik; Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik (2024). MatSim Dataset and benchmark for one-shot visual materials and textures recognition [Dataset]. http://doi.org/10.5281/zenodo.7390166
    Available download formats: zip, pdf
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik; Manuel S. Drehwald; Sagi Eppel; Jolina Li; Han Hao; Alan Aspuru-Guzik
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The MatSim Dataset and benchmark

    Synthetic dataset and real images benchmark for visual similarity recognition of materials and textures.

    MatSim: a synthetic dataset, a benchmark, and a method for computer vision-based recognition of similarities and transitions between materials and textures focusing on identifying any material under any conditions using one or a few examples (one-shot learning).

    Based on the paper: One-shot recognition of any material anywhere using contrastive learning with physics-based rendering

    Benchmark_MATSIM.zip: contains the benchmark made of real-world images, as described in the paper.



    MatSim_object_train_split_1,2,3.zip: contains a subset of the synthetic dataset: CGI images of materials on random objects, as described in the paper.

    MatSim_Vessels_Train_1,2,3.zip: contains a subset of the synthetic dataset: CGI images of materials inside transparent containers, as described in the paper.

    *Note: these are subsets of the dataset; the full dataset can be found at:
    https://e1.pcloud.link/publink/show?code=kZIiSQZCYU5M4HOvnQykql9jxF4h0KiC5MX

    or
    https://icedrive.net/s/A13FWzZ8V2aP9T4ufGQ1N3fBZxDF

    Code:

    Up-to-date code for generating the dataset, reading and evaluation scripts, and trained nets can be found at this URL: https://github.com/sagieppel/MatSim-Dataset-Generator-Scripts-And-Neural-net

    Dataset Generation Scripts.zip: contains the Blender (3.1) Python scripts used for generating the dataset; this code might be old, and up-to-date code can be found at the URL above.
    Net_Code_And_Trained_Model.zip: contains reference neural-net code, including loaders, trained models, and evaluator scripts that can be used to read and train with the synthetic dataset or test the model with the benchmark. Note that the code in the ZIP file is not up to date and contains some bugs; for the latest version of this code, see the URL above.

    Further documentation can be found inside the zip files or in the paper.

  17. SynthCity Dataset - Trajectory

    • rdr.ucl.ac.uk
    txt
    Updated Sep 11, 2019
    + more versions
    Cite
    David Griffiths; Jan Boehm (2019). SynthCity Dataset - Trajectory [Dataset]. http://doi.org/10.5522/04/8851790.v2
    Available download formats: txt
    Dataset updated
    Sep 11, 2019
    Dataset provided by
    University College London
    Authors
    David Griffiths; Jan Boehm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With deep learning becoming a more prominent approach for automatic classification of three-dimensional point cloud data, a key bottleneck is the amount of high quality training data, especially when compared to that available for two-dimensional images. One potential solution is the use of synthetic data for pre-training networks, however the ability for models to generalise from synthetic data to real world data has been poorly studied for point clouds. Despite this, a huge wealth of 3D virtual environments exist, which if proved effective can be exploited. We therefore argue that research in this domain would be hugely useful. In this paper we present SynthCity an open dataset to help aid research. SynthCity is a 367.9M point synthetic full colour Mobile Laser Scanning point cloud. Every point is labelled from one of nine categories. We generate our point cloud in a typical Urban/Suburban environment using the Blensor plugin for Blender. See our project website http://www.synthcity.xyz or paper https://arxiv.org/abs/1907.04758 for more information.

  18. (capsicum) deepNIR: Dataset for generating synthetic NIR images

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Mar 10, 2022
    Cite
    Inkyu Sa; Inkyu Sa; Jong Yoon Lim; Ho Seok Ahn; Jong Yoon Lim; Ho Seok Ahn (2022). (capsicum) deepNIR: Dataset for generating synthetic NIR images [Dataset]. http://doi.org/10.5281/zenodo.6340415
    Available download formats: application/gzip
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Inkyu Sa; Inkyu Sa; Jong Yoon Lim; Ho Seok Ahn; Jong Yoon Lim; Ho Seok Ahn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the capsicum NIR+RGB data used in our paper, deepNIR: Dataset for generating synthetic NIR images and improved fruit detection system using deep learning techniques.

    Please refer to http://tiny.one/deepNIR for more detail.

  19. Tango Spacecraft Dataset for Region of Interest Estimation and Semantic...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 23, 2023
    + more versions
    Cite
    Bechini Michele; Lunghi Paolo; Lavagna Michèle (2023). Tango Spacecraft Dataset for Region of Interest Estimation and Semantic Segmentation [Dataset]. http://doi.org/10.5281/zenodo.6507864
    Available download formats: zip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bechini Michele; Lunghi Paolo; Lavagna Michèle
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Reference Paper:

    M. Bechini, M. Lavagna, P. Lunghi, Dataset generation and validation for spacecraft pose estimation via monocular images processing, Acta Astronautica 204 (2023) 358–369

    M. Bechini, P. Lunghi, M. Lavagna. "Spacecraft Pose Estimation via Monocular Image Processing: Dataset Generation and Validation". In 9th European Conference for Aeronautics and Aerospace Sciences (EUCASS)

    General Description:

    The "Tango Spacecraft Dataset for Region of Interest Estimation and Semantic Segmentation" dataset here published should be used for Region of Interest (ROI) and/or semantic segmentation tasks. It is split into 30002 train images and 3002 test images representing the Tango spacecraft from Prisma mission, being the largest publicly available dataset of synthetic space-borne noise-free images tailored to ROI extraction and Semantic Segmentation tasks (up to our knowledge). The label of each image gives, for the Bounding Box annotations, the filename of the image, the ROI top-left corner (minimum x, minimum y) in pixels, the ROI bottom-right corner (maximum x, maximum y) in pixels, and the center point of the ROI in pixels. The annotation are taken in image reference frame with the origin located at the top-left corner of the image, positive x rightward and positive y downward. Concerning the Semantic Segmentation, RGB masks are provided. Each RGB mask correspond to a single image in both train and test dataset. The RGB images are such that the R channel corresponds to the spacecraft, the G channel corresponds to the Earth (if present), and the B channel corresponds to the background (deep space). Per each channel the pixels have non-zero value only in correspondence of the object that they represent (Tango, Earth, Deep Space). More information on the dataset split and on the label format are reported below.

    Images Information:

    The dataset comprises 30002 synthetic grayscale images of the Tango spacecraft from the Prisma mission that serve as the train set, while the test set is formed by 3002 synthetic grayscale images of the Tango spacecraft, in PNG format. About 1/6 of the images in both the train and the test set have a non-black background, obtained by rendering an Earth-like model in the ray-tracing process used to define the images. The images are noise-free to increase the flexibility of the dataset. The illumination direction of the spacecraft in the scene is uniformly distributed in 3D space, in agreement with the Sun position constraints.


    Labels Information:

    Labels for the bounding-box extraction are provided here in separate JSON files. The files are formatted, per image, as in the following example:

    • filename : tango_img_1 # name of the image to which the data are referred
    • rol_tl : [x, y] # ROI top-left corner (minimum x, minimum y) in pixels
    • roi_br : [x, y] # ROI bottom-right corner (maximum x, maximum y) in pixels
    • roi_cc : [x, y] # center point of the ROI in pixels

    Notice that the annotations are taken in the image reference frame, with the origin located at the top-left corner of the image, positive x rightward and positive y downward. To make usage of the dataset easier, both the training set and the test set are split into two folders containing the images with the Earth as background and without background.
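    A small parsing sketch for these labels; the label-file name is hypothetical, and it assumes the JSON holds one record per image with exactly the keys listed above:

      import json

      with open("train_roi_labels.json") as f:      # hypothetical filename
          records = json.load(f)                    # assumed: list of per-image dictionaries

      rec = records[0]
      x_min, y_min = rec["rol_tl"]                  # ROI top-left corner (pixels, image frame)
      x_max, y_max = rec["roi_br"]                  # ROI bottom-right corner
      cx, cy = rec["roi_cc"]                        # ROI center point
      print(rec["filename"], x_max - x_min, y_max - y_min)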

    Concerning the semantic segmentation labels, they are provided as RGB masks named "filename_mask.png", where "filename" is the filename of the training-set or test-set image to which the mask refers. The RGB masks are such that the R channel corresponds to the spacecraft, the G channel corresponds to the Earth (if present), and the B channel corresponds to the background (deep space). In each channel, pixels have non-zero values only where the corresponding object (Tango, Earth, deep space) appears.
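    And a matching sketch for turning one RGB mask into per-class boolean masks (the mask filename is hypothetical; the channel meaning follows the description above):

      import numpy as np
      from PIL import Image

      mask = np.array(Image.open("tango_img_1_mask.png"))  # hypothetical mask filename
      spacecraft = mask[..., 0] > 0   # R channel: Tango spacecraft
      earth      = mask[..., 1] > 0   # G channel: Earth (if present)
      deep_space = mask[..., 2] > 0   # B channel: background (deep space)
      print(spacecraft.sum(), earth.sum(), deep_space.sum())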

    VERSION CONTROL

    • v1.0: This version contains the dataset (both train and test) of full-scale images with ROI annotations and RGB masks for semantic segmentation tasks. These images have width = height = 1024 pixels. The position of Tango with respect to the camera is randomly selected from a uniform distribution, but full visibility of the spacecraft is ensured in all images.

    Note: this dataset contains the same images as the "Tango Spacecraft Wireframe Dataset Model for Line Segments Detection" v2.0 full-scale (DOI: https://doi.org/10.5281/zenodo.6372848) and the "Tango Spacecraft Dataset for Monocular Pose Estimation" v1.0 (DOI: https://doi.org/10.5281/zenodo.6499007); the three can be used together by combining the annotations of the relative pose, of the reprojected wireframe model of Tango, and of the ROI. Together, these three datasets give the most comprehensive set of space-borne synthetic images published so far (to our knowledge).

  20. replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 21, 2023
    + more versions
    Cite
    Fabian Plum; René Bulla; Hendrik Beck; Natalie Imirzian; David Labonte (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. http://doi.org/10.5281/zenodo.7849417
    Available download formats: zip
    Dataset updated
    Apr 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Fabian Plum; René Bulla; Hendrik Beck; Natalie Imirzian; David Labonte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all recorded and hand-annotated data, all synthetically generated data, and representative trained networks used for the detection and tracking experiments in the manuscript replicAnt - generating annotated images of animals in complex environments using Unreal Engine. Unless stated otherwise, all 3D animal models used in the synthetically generated data were created with the open-source photogrammetry platform scAnt (peerj.com/articles/11155/). All synthetic data has been generated with the associated replicAnt project, available from https://github.com/evo-biomech/replicAnt.

    Abstract:

    Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

    Benchmark data

    Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

    The field dataset consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm and 40 cm. All video recordings were well exposed, and captured at 23.976 fps.

    Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory and field dataset, respectively: each visible individual was assigned a constant-size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi-animal tracker, which leverages Blender's internal contrast-based motion tracker, but also includes track refinement options and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking are provided on the replicAnt and BlenderMotionExport GitHub repositories.

    Synthetic data generation

    Two synthetic datasets, each with a population size of 100, were generated from 3D models of Atta vollenweideri leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.

    A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

    Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).


    Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio of real to synthetic images across the five datasets varied from 10/1 to 1/100.

    Funding

    This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
