52 datasets found
  1. Synthea synthetic patient generator data in OMOP Common Data Model

    • registry.opendata.aws
    Updated Jan 4, 2023
    Cite
    Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
    Dataset updated
    Jan 4, 2023
    Dataset provided by
    Amazon.com (http://amazon.com/)
    Description

    The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079

  2. Synthea lung cancer synthetic patient data series for ML

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Chen, AJ (2023). Synthea lung cancer synthetic patient data series for ML [Dataset]. http://doi.org/10.7910/DVN/Q5LK5A
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chen, AJ
    Description

    These synthetic patient datasets were created for a machine learning (ML) study of lung cancer risk prediction in a simulation of ML-enabled learning health systems (LHS). Five populations of 30K patients were generated by the Synthea patient generator. They were combined sequentially to form five populations of different sizes, from 30K to 150K patients. Patients with and without lung cancer were selected at a roughly 1:3 ratio, and their electronic health records (EHR) were processed into data table files ready for machine learning. The ML-ready table files also have the continuous numeric values converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The first use of these datasets was in an LHS simulation study, which was published in Nature Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
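
    The sequential combination described above can be pictured with a minimal Python sketch. It is only an illustration, not the author's pipeline: the batch contents and patient IDs are hypothetical stand-ins, and each toy batch holds 3 IDs instead of 30K patients.

```python
from itertools import accumulate

# Hypothetical stand-ins for the five generated populations. Each real
# population holds 30K Synthea patients; here each batch has 3 made-up IDs.
batches = [[f"p{b}_{i}" for i in range(3)] for b in range(5)]

# Combine sequentially: cohort k is batch 1 through batch k concatenated,
# so the cohort sizes grow in equal steps (30K, 60K, ..., 150K in the source).
cohorts = list(accumulate(batches, lambda acc, nxt: acc + nxt))

sizes = [len(c) for c in cohorts]
print(sizes)
```

    Each larger cohort contains every smaller one, so results at different population sizes can be compared on nested data.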

  3. Synthea stroke synthetic patient data series for risk prediction ML

    • dataone.org
    Updated Nov 8, 2023
    Cite
    Chen, AJ (2023). Synthea stroke synthetic patient data series for risk prediction ML [Dataset]. http://doi.org/10.7910/DVN/LBD9GU
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Chen, AJ
    Description

    These synthetic patient datasets were created for a machine learning (ML) study of stroke risk prediction. Five populations of 30K patients were generated by the Synthea patient generator. They were combined sequentially to form five populations of different sizes, from 30K to 150K patients. Patients with and without stroke were selected at a roughly 1:3 ratio, and their electronic health records (EHR) were processed into data table files ready for machine learning. The ML-ready table files also have the continuous numeric values converted to categorical values. Because Synthea patients closely resemble real patients, these ML-ready datasets can be used to develop and test ML algorithms and to train researchers. Unlike real patient data, these Synthea datasets can be shared with collaborators anywhere without privacy concerns. The first use of these datasets was in a learning health system (LHS) simulation study, which was published in Nature Scientific Reports (see https://www.nature.com/articles/s41598-022-23011-4).
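
    The conversion of continuous numeric values to categorical values mentioned above might look like the following sketch. The column, cut points, and labels are hypothetical and chosen only for illustration; they are not the bins used in the dataset and are not clinical thresholds.

```python
from bisect import bisect_right

def to_category(value, cut_points, labels):
    """Map a numeric value to the label of the interval it falls in.

    cut_points must be sorted; labels needs one more entry than cut_points.
    A value equal to a cut point goes to the upper interval.
    """
    return labels[bisect_right(cut_points, value)]

# Hypothetical cut points and labels for a blood pressure column.
SBP_CUTS = [120, 140]
SBP_LABELS = ["low", "medium", "high"]

readings = [112.0, 120.0, 133.5, 158.0]
categories = [to_category(r, SBP_CUTS, SBP_LABELS) for r in readings]
print(categories)
```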

  4. Dataset of article: Synthetic Datasets Generator for Testing Information...

    • ieee-dataport.org
    Updated Mar 13, 2020
    Cite
    Carlos Santos (2020). Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools [Dataset]. https://ieee-dataport.org/open-access/dataset-article-synthetic-datasets-generator-testing-information-visualization-and
    Dataset updated
    Mar 13, 2020
    Authors
    Carlos Santos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

  5. Synthetic EHRs for Benchmarking System Performance

    • kaggle.com
    Updated Apr 29, 2025
    Cite
    Ahmed Abbasi (2025). Synthetic EHRs for Benchmarking System Performance [Dataset]. http://doi.org/10.34740/kaggle/dsv/11614066
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ahmed Abbasi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains FHIR-compatible Electronic Health Records (EHR) generated using the Synthea synthetic patient generator. It is specifically designed to benchmark the performance of a blockchain-based EHR solution, NexaEHR, which utilizes smart contracts and IPFS for data storage and management.

    The dataset includes 1000 EHR records (files), each representing a separate synthetic record for a persona, with varying sizes. The largest record is approximately 80 MB, simulating the average record size a patient might accumulate annually. These records are intended for testing and evaluating the scalability, efficiency, and effectiveness of blockchain technology in managing and securing healthcare data within a decentralized system.
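
    As a rough sketch of how one such record could be inspected, the snippet below counts resource types in a FHIR Bundle. It assumes each file is a JSON Bundle with an `entry` list, which is how FHIR Bundles are structured in general; the tiny inline bundle is made up and does not come from the dataset.

```python
import json
from collections import Counter

# Made-up miniature FHIR Bundle; a real record would be read from a file.
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Encounter", "id": "e1"}},
    {"resource": {"resourceType": "Observation", "id": "o1"}},
    {"resource": {"resourceType": "Observation", "id": "o2"}}
  ]
}
"""

def count_resource_types(bundle):
    """Count how many resources of each type a Bundle dict contains."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )

counts = count_resource_types(json.loads(bundle_json))
print(counts)
```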

  6. Synthetic Data Video Generator Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Cite
    Dataintelo (2025). Synthetic Data Video Generator Market Research Report 2033 [Dataset]. https://dataintelo.com/report/synthetic-data-video-generator-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Synthetic Data Video Generator Market Outlook



    According to our latest research, the global synthetic data video generator market size reached USD 1.32 billion in 2024 and is anticipated to grow at a robust CAGR of 38.7% from 2025 to 2033. By the end of 2033, the market is projected to reach USD 18.59 billion, driven by rapid advancements in artificial intelligence, the growing need for high-quality training data for machine learning models, and increasing adoption across industries such as autonomous vehicles, healthcare, and surveillance. The surge in demand for data privacy, coupled with the necessity to overcome data scarcity and bias in real-world datasets, is significantly fueling the synthetic data video generator market's growth trajectory.
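
    For readers who want to check growth figures like these, compound annual growth can be worked through with the short sketch below. The function names are illustrative. The report does not say how many compounding periods it applies between 2024 and 2033, so the period count is left as a parameter rather than assumed.

```python
def project(start_value, cagr, periods):
    """Value after compounding start_value at rate cagr for periods years."""
    return start_value * (1 + cagr) ** periods

def implied_cagr(start_value, end_value, periods):
    """Annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / periods) - 1

# Figures quoted in the description above (USD billions).
start_2024 = 1.32
end_2033 = 18.59
stated_cagr = 0.387

# Try both plausible period counts, since the source does not state one.
for periods in (8, 9):
    projected = project(start_2024, stated_cagr, periods)
    rate = implied_cagr(start_2024, end_2033, periods)
    print(f"{periods} periods: projected {projected:.2f}, implied CAGR {rate:.1%}")
```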




    One of the primary growth factors for the synthetic data video generator market is the escalating demand for high-fidelity, annotated video datasets required to train and validate AI-driven systems. Traditional data collection methods are often hampered by privacy concerns, high costs, and the sheer complexity of obtaining diverse and representative video samples. Synthetic data video generators address these challenges by enabling the creation of large-scale, customizable, and bias-free datasets that closely mimic real-world scenarios. This capability is particularly vital for sectors such as autonomous vehicles and robotics, where the accuracy and safety of AI models depend heavily on the quality and variety of training data. As organizations strive to accelerate innovation and reduce the risks associated with real-world data collection, the adoption of synthetic data video generation technologies is expected to expand rapidly.




    Another significant driver for the synthetic data video generator market is the increasing regulatory scrutiny surrounding data privacy and compliance. With stricter regulations such as GDPR and CCPA coming into force, organizations face mounting challenges in using real-world video data that may contain personally identifiable information. Synthetic data offers an effective solution by generating video datasets devoid of any real individuals, thereby ensuring compliance while still enabling advanced analytics and machine learning. Moreover, synthetic data video generators empower businesses to simulate rare or hazardous events that are difficult or unethical to capture in real life, further enhancing model robustness and preparedness. This advantage is particularly pronounced in healthcare, surveillance, and automotive industries, where data privacy and safety are paramount.




    Technological advancements and increasing integration with cloud-based platforms are also propelling the synthetic data video generator market forward. The proliferation of cloud computing has made it easier for organizations of all sizes to access scalable synthetic data generation tools without significant upfront investments in hardware or infrastructure. Furthermore, the continuous evolution of generative adversarial networks (GANs) and other deep learning techniques has dramatically improved the realism and utility of synthetic video data. As a result, companies are now able to generate highly realistic, scenario-specific video datasets at scale, reducing both the time and cost required for AI development. This democratization of synthetic data technology is expected to unlock new opportunities across a wide array of applications, from entertainment content production to advanced surveillance systems.




    From a regional perspective, North America currently dominates the synthetic data video generator market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading AI technology providers, robust investment in research and development, and early adoption by automotive and healthcare sectors are key contributors to North America's market leadership. Europe is also witnessing significant growth, driven by stringent data privacy regulations and increased focus on AI-driven innovation. Meanwhile, Asia Pacific is emerging as a high-growth region, fueled by rapid digital transformation, expanding IT infrastructure, and increasing investments in autonomous systems and smart city projects. Latin America and Middle East & Africa, while still nascent, are expected to experience steady uptake as awareness and technological capabilities continue to grow.



    Component Analysis



    The synthetic data video generator market by comp

  7. Synthetic Data Video Generator Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Cite
    Growth Market Reports (2025). Synthetic Data Video Generator Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/synthetic-data-video-generator-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Synthetic Data Video Generator Market Outlook



    According to our latest research, the global Synthetic Data Video Generator market size in 2024 stands at USD 1.46 billion, with robust momentum driven by advances in artificial intelligence and the increasing need for high-quality, privacy-compliant video datasets. The market is witnessing a remarkable compound annual growth rate (CAGR) of 37.2% from 2025 to 2033, propelled by growing adoption across sectors such as autonomous vehicles, healthcare, and surveillance. By 2033, the market is projected to reach USD 18.16 billion, reflecting a seismic shift in how organizations leverage synthetic data to accelerate innovation and mitigate data privacy concerns.



    The primary growth factor for the Synthetic Data Video Generator market is the surging demand for data privacy and compliance in machine learning and computer vision applications. As regulatory frameworks like GDPR and CCPA become more stringent, organizations are increasingly wary of using real-world video data that may contain personally identifiable information. Synthetic data video generators provide a scalable and ethical alternative, enabling enterprises to train and validate AI models without risking privacy breaches. This trend is particularly pronounced in sectors such as healthcare and finance, where data sensitivity is paramount. The ability to generate diverse, customizable, and annotation-rich video datasets not only addresses compliance requirements but also accelerates the development and deployment of AI solutions.



    Another significant driver is the rapid evolution of deep learning algorithms and simulation technologies, which have dramatically improved the realism and utility of synthetic video data. Innovations in generative adversarial networks (GANs), 3D rendering engines, and advanced simulation platforms have made it possible to create synthetic videos that closely mimic real-world environments and scenarios. This capability is invaluable for industries like autonomous vehicles and robotics, where extensive and varied training data is essential for safe and reliable system behavior. The reduction in time, cost, and logistical complexity associated with collecting and labeling real-world video data further enhances the attractiveness of synthetic data video generators, positioning them as a cornerstone technology for next-generation AI development.



    The expanding use cases for synthetic video data across emerging applications also contribute to market growth. Beyond traditional domains such as surveillance and entertainment, synthetic data video generators are finding adoption in areas like augmented reality, smart retail, and advanced robotics. The flexibility to simulate rare, dangerous, or hard-to-capture scenarios offers a strategic advantage for organizations seeking to future-proof their AI initiatives. As synthetic data generation platforms become more accessible and user-friendly, small and medium enterprises are also entering the fray, democratizing access to high-quality training data and fueling a new wave of AI-driven innovation.



    From a regional perspective, North America continues to dominate the Synthetic Data Video Generator market, benefiting from a concentration of technology giants, research institutions, and early adopters across key verticals. Europe follows closely, driven by strong regulatory emphasis on data protection and an active ecosystem of AI startups. Meanwhile, the Asia Pacific region is emerging as a high-growth market, buoyed by rapid digital transformation, government AI initiatives, and increasing investments in autonomous systems and smart cities. Latin America and the Middle East & Africa are also showing steady progress, albeit from a smaller base, as awareness and infrastructure for synthetic data generation mature.





    Component Analysis



    The Synthetic Data Video Generator market, when analyzed by component, is primarily segmented into Software and Services. The software segment currently commands the largest share, driven by the prolif

  8. Quantum-AI Synthetic Data Generator Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Cite
    Dataintelo (2025). Quantum-AI Synthetic Data Generator Market Research Report 2033 [Dataset]. https://dataintelo.com/report/quantum-ai-synthetic-data-generator-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Quantum-AI Synthetic Data Generator Market Outlook



    According to our latest research, the global Quantum-AI Synthetic Data Generator market size reached USD 1.82 billion in 2024, reflecting a robust expansion driven by technological advancements and increasing adoption across multiple industries. The market is projected to grow at a CAGR of 32.7% from 2025 to 2033, reaching a forecasted market size of USD 21.69 billion by 2033. This growth trajectory is primarily fueled by the rising demand for high-quality synthetic data to train artificial intelligence models, address data privacy concerns, and accelerate digital transformation initiatives across sectors such as healthcare, finance, and retail.




    One of the most significant growth factors for the Quantum-AI Synthetic Data Generator market is the escalating need for vast, diverse, and privacy-compliant datasets to train advanced AI and machine learning models. As organizations increasingly recognize the limitations and risks associated with using real-world data, particularly regarding data privacy regulations like GDPR and CCPA, the adoption of synthetic data generation technologies has surged. Quantum computing, when integrated with artificial intelligence, enables the rapid and efficient creation of highly realistic synthetic datasets that closely mimic real-world data distributions while ensuring complete anonymity. This capability is proving invaluable for sectors like healthcare and finance, where data sensitivity is paramount and regulatory compliance is non-negotiable. As a result, organizations are investing heavily in Quantum-AI synthetic data solutions to enhance model accuracy, reduce bias, and streamline data sharing without compromising privacy.




    Another key driver propelling the market is the growing complexity and volume of data generated by emerging technologies such as IoT, autonomous vehicles, and smart devices. Traditional data collection methods are often insufficient to keep pace with the data requirements of modern AI applications, leading to gaps in data availability and quality. Quantum-AI Synthetic Data Generators address these challenges by producing large-scale, high-fidelity synthetic datasets on demand, enabling organizations to simulate rare events, test edge cases, and improve model robustness. Additionally, the capability to generate structured, semi-structured, and unstructured data allows businesses to meet the specific needs of diverse applications, ranging from fraud detection in banking to predictive maintenance in manufacturing. This versatility is further accelerating market adoption, as enterprises seek to future-proof their AI initiatives and gain a competitive edge.




    The integration of Quantum-AI Synthetic Data Generators into cloud-based platforms and enterprise IT ecosystems is also catalyzing market growth. Cloud deployment models offer scalability, flexibility, and cost-effectiveness, making synthetic data generation accessible to organizations of all sizes, including small and medium enterprises. Furthermore, the proliferation of AI-driven analytics in sectors such as retail, e-commerce, and telecommunications is creating new opportunities for synthetic data applications, from enhancing customer experience to optimizing supply chain operations. As vendors continue to innovate and expand their service offerings, the market is expected to witness sustained growth, with new entrants and established players alike vying for market share through strategic partnerships, product launches, and investments in R&D.




    From a regional perspective, North America currently dominates the Quantum-AI Synthetic Data Generator market, accounting for over 38% of the global revenue in 2024, followed by Europe and Asia Pacific. The strong presence of leading technology companies, robust investment in AI research, and favorable regulatory environment contribute to North America's leadership position. Europe is also witnessing significant growth, driven by stringent data privacy regulations and increasing adoption of AI across industries. Meanwhile, the Asia Pacific region is emerging as a high-growth market, fueled by rapid digitalization, expanding IT infrastructure, and government initiatives promoting AI innovation. As regional markets continue to evolve, strategic collaborations and cross-border partnerships are expected to play a pivotal role in shaping the global landscape of the Quantum-AI Synthetic Data Generator market.



    Component Analysis



  9. Quantum-AI Synthetic Data Generator Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 28, 2025
    Cite
    Growth Market Reports (2025). Quantum-AI Synthetic Data Generator Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/quantum-ai-synthetic-data-generator-market
    Available download formats: pdf, csv, pptx
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Quantum-AI Synthetic Data Generator Market Outlook




    According to our latest research, the global Quantum-AI Synthetic Data Generator market size reached USD 1.98 billion in 2024, reflecting robust momentum driven by the convergence of quantum computing and artificial intelligence technologies in data generation. The market is experiencing a significant compound annual growth rate (CAGR) of 32.1% from 2025 to 2033. At this pace, the market is forecasted to reach USD 24.8 billion by 2033. This remarkable growth is propelled by the escalating demand for high-quality synthetic data across industries to enhance AI model training, ensure data privacy, and overcome data scarcity challenges.




    One of the primary growth drivers for the Quantum-AI Synthetic Data Generator market is the increasing reliance on advanced machine learning and deep learning models that require vast amounts of diverse, high-fidelity data. Traditional data sources often fall short in volume, variety, and compliance with privacy regulations. Quantum-AI synthetic data generators address these challenges by producing realistic, representative datasets that mimic real-world scenarios without exposing sensitive information. This capability is particularly crucial in regulated sectors such as healthcare and finance, where data privacy and security are paramount. As organizations seek to accelerate AI adoption while minimizing ethical and legal risks, the demand for sophisticated synthetic data solutions continues to rise.




    Another significant factor fueling market expansion is the rapid evolution of quantum computing and its integration with AI algorithms. Quantum computing’s superior processing power enables the generation of complex, large-scale datasets at unprecedented speeds and accuracy. This synergy allows enterprises to simulate intricate data patterns and rare events that would be difficult or impossible to capture through conventional means. Additionally, the proliferation of AI-driven applications in sectors like autonomous vehicles, predictive maintenance, and personalized medicine is amplifying the need for synthetic data generators that can support advanced analytics and model validation. The ongoing advancements in quantum hardware, coupled with the growing ecosystem of AI tools, are expected to further catalyze innovation and adoption in this market.




    Moreover, the shift toward digital transformation and the growing adoption of cloud-based solutions are reshaping the landscape of the Quantum-AI Synthetic Data Generator market. Enterprises of all sizes are embracing synthetic data generation to streamline data workflows, reduce operational costs, and accelerate time-to-market for AI-powered products and services. Cloud deployment models offer scalability, flexibility, and seamless integration with existing data infrastructure, making synthetic data generation accessible even to resource-constrained organizations. As digital ecosystems evolve and data-driven decision-making becomes a competitive imperative, the strategic importance of synthetic data generation is set to intensify, fostering sustained market growth through 2033.




    From a regional perspective, North America currently leads the market, driven by early technology adoption, substantial investments in quantum and AI research, and a vibrant ecosystem of startups and established technology firms. Europe follows closely, benefiting from strong regulatory frameworks and robust funding for AI innovation. The Asia Pacific region is witnessing the fastest growth, fueled by expanding digital economies, government initiatives supporting AI and quantum technology, and increasing awareness of synthetic data’s strategic value. As global enterprises seek to harness the power of quantum-AI synthetic data generators to gain a competitive edge, regional dynamics will continue to shape market trajectories and opportunities.





    Component Analysis




    The Component segment of the Quantum-AI Synthetic Data Generator

  10. Synthetic river flow videos dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Aug 3, 2022
    Cite
    Guillaume Bodart; Jérôme Le Coz; Magali Jodeau; Alexandre Hauet (2022). Synthetic river flow videos dataset [Dataset]. http://doi.org/10.5281/zenodo.6956063
    Available download formats: zip, bin
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillaume Bodart; Jérôme Le Coz; Magali Jodeau; Alexandre Hauet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ###### ######### ########## ###### ######### ##########
    Synthetic river flow videos for evaluating image-based velocimetry methods
    ###### ######### ########## ###### ######### ##########
    # Year : 2022
    # Authors : G.Bodart (guillaume.bodart@inrae.fr), J.Le Coz (jerome.lecoz@inrae.fr), M.Jodeau (magali.jodeau@edf.fr), A.Hauet (alexandre.hauet@edf.fr)

    ###
    This file describes the data attached to the article

    ### -> 00_article_cases

    - This folder contains the data used in the case studies: synthetic videos + reference files. Please refer to the README file for more information about the data.
    - 00_reference_velocities
    -> Reference velocities interpolated on a regular grid. Data are given in conventional units, i.e. m/s and m.
    - 01_XX
    -> Data of the first case study.
    - 02_XX
    -> Data of the second case study.

    ### -> 01_dev

    - This folder contains the modified version of Mantaflow described in the article. Installation instructions can be found at http://mantaflow.com

    ### -> 02_dataset

    - This folder contains synthetic videos generated with the method described in the article. The fluid simulation parameters, and thus the reference velocities, are the same as those presented in the article.
    - The videos can be used freely. Please consider citing the corresponding paper.

  11. Knowledge Graph Generator

    • darus.uni-stuttgart.de
    Updated Jan 8, 2025
    Cite
    Gabriel Timon Glaser (2025). Knowledge Graph Generator [Dataset]. http://doi.org/10.18419/DARUS-4436
    Dataset updated
    Jan 8, 2025
    Dataset provided by
    DaRUS
    Authors
    Gabriel Timon Glaser
    License

    https://spdx.org/licenses/MIT.html

    Description

    Code and experiment results for a synthetic knowledge graph generator. The generator receives a set of rules, with an expected body support and support, and returns a knowledge graph that approximately matches the rules according to the body support and confidence. This code was developed for the Bachelor thesis of Gabriel Glaser, "Generating Random Knowledge Graphs from Rules", University of Stuttgart, 2024. doi:10.18419/opus-15467.
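
    To make the rule statistics concrete, here is a small sketch that measures body support, support, and confidence for one rule on a toy graph. It assumes the definitions common in rule mining (confidence = support / body support) and handles only rules of the shape body(x, y) => head(x, y). The triples and the function are hypothetical and are not taken from the thesis code.

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("anna", "bornIn", "stuttgart"),
    ("ben", "bornIn", "stuttgart"),
    ("cara", "bornIn", "ulm"),
    ("anna", "livesIn", "stuttgart"),
    ("cara", "livesIn", "ulm"),
}

def rule_stats(kg, body_pred, head_pred):
    """Statistics for the rule body_pred(x, y) => head_pred(x, y).

    body support: number of (x, y) pairs that satisfy the body
    support:      number of those pairs that also satisfy the head
    confidence:   support / body support (0.0 if the body never matches)
    """
    body_matches = [(s, o) for (s, p, o) in kg if p == body_pred]
    supported = [(s, o) for (s, o) in body_matches if (s, head_pred, o) in kg]
    body_support = len(body_matches)
    support = len(supported)
    confidence = support / body_support if body_support else 0.0
    return body_support, support, confidence

body_support, support, confidence = rule_stats(triples, "bornIn", "livesIn")
print(body_support, support, round(confidence, 3))
```

    A generator working in the opposite direction would add or withhold head triples until these measured values approximately match the targets given for each rule.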

  12. Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029:...

    • technavio.com
    Updated May 6, 2025
    Cite
    Technavio (2025). Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
    Dataset updated
    May 6, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description


    Synthetic Data Generation Market Size 2025-2029

    The synthetic data generation market size is forecast to increase by USD 4.39 billion, at a CAGR of 61.1% between 2024 and 2029.

    The market is experiencing significant growth, driven by the escalating demand for data privacy protection. With increasing concerns over data security and the potential risks associated with using real data, synthetic data is gaining traction as a viable alternative. Furthermore, the deployment of large language models is fueling market expansion, as these models can generate vast amounts of realistic and diverse data, reducing the reliance on real-world data sources. However, high costs associated with high-end generative models pose a challenge for market participants. These models require substantial computational resources and expertise to develop and implement effectively. Companies seeking to capitalize on market opportunities must navigate these challenges by investing in research and development to create more cost-effective solutions or partnering with specialists in the field. Overall, the market presents significant potential for innovation and growth, particularly in industries where data privacy is a priority and large language models can be effectively utilized.

    What will be the Size of the Synthetic Data Generation Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
    The market continues to evolve, driven by the increasing demand for data-driven insights across various sectors. Data processing is a crucial aspect of this market, with a focus on ensuring data integrity, privacy, and security. Data privacy-preserving techniques, such as data masking and anonymization, are essential in maintaining confidentiality while enabling data sharing. Real-time data processing and data simulation are key applications of synthetic data, enabling predictive modeling and data consistency. Data management and workflow automation are integral components of synthetic data platforms, with cloud computing and model deployment facilitating scalability and flexibility. Data governance frameworks and compliance regulations play a significant role in ensuring data quality and security. Deep learning models, variational autoencoders (VAEs), and neural networks are essential tools for model training and optimization, while API integration and batch data processing streamline the data pipeline. Machine learning models and data visualization provide valuable insights, while edge computing enables data processing at the source. Data augmentation and data transformation are essential techniques for enhancing the quality and quantity of synthetic data. Data warehousing and data analytics provide a centralized platform for managing and deriving insights from large datasets. Synthetic data generation continues to unfold, with ongoing research and development in areas such as federated learning, homomorphic encryption, statistical modeling, and software development. The market's dynamic nature reflects the evolving needs of businesses and the continuous advancements in data technology.

    How is this Synthetic Data Generation Industry segmented?

    The synthetic data generation industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user

      Healthcare and life sciences
      Retail and e-commerce
      Transportation and logistics
      IT and telecommunication
      BFSI and others

    Type

      Agent-based modelling
      Direct modelling

    Application

      AI and ML Model Training
      Data privacy
      Simulation and testing
      Others

    Product

      Tabular data
      Text data
      Image and video data
      Others

    Geography

      North America
        US
        Canada
        Mexico
      Europe
        France
        Germany
        Italy
        UK
      APAC
        China
        India
        Japan
      Rest of World (ROW)

    By End-user Insights

    The healthcare and life sciences segment is estimated to witness significant growth during the forecast period. In the rapidly evolving data landscape, the market is gaining significant traction, particularly in the healthcare and life sciences sector. With a growing emphasis on data-driven decision-making and stringent data privacy regulations, synthetic data has emerged as a viable alternative to real data for various applications. These include data processing, data preprocessing, data cleaning, data labeling, data augmentation, and predictive modeling, among others. Medical imaging data, such as MRI scans and X-rays, are essential for diagnosis and treatment planning. However, sharing real patient data for research purposes or training machine learning algorithms can pose significant privacy risks. Synthetic data generation addresses this challenge by producing realistic medical imaging data, ensuring data privacy while enabling research

  13. f

    Table1_Building an Artificial Intelligence Laboratory Based on Real World...

    • frontiersin.figshare.com
    docx
    Updated May 31, 2023
    Cite
    A. Damiani; C. Masciocchi; J. Lenkowicz; N. D. Capocchiano; L. Boldrini; L. Tagliaferri; A. Cesario; P. Sergi; A. Marchetti; A. Luraschi; S. Patarnello; V. Valentini (2023). Table1_Building an Artificial Intelligence Laboratory Based on Real World Data: The Experience of Gemelli Generator.DOCX [Dataset]. http://doi.org/10.3389/fcomp.2021.768266.s001
    Explore at:
    docx
    Available download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    A. Damiani; C. Masciocchi; J. Lenkowicz; N. D. Capocchiano; L. Boldrini; L. Tagliaferri; A. Cesario; P. Sergi; A. Marchetti; A. Luraschi; S. Patarnello; V. Valentini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The problem of transforming Real World Data into Real World Evidence is becoming increasingly important in the frameworks of Digital Health and Personalized Medicine, especially with the availability of modern Artificial Intelligence algorithms, high computing power, and large storage facilities. Even where Real World Data are well maintained in a hospital data warehouse and are made available for research purposes, many aspects need to be addressed to build an effective architecture enabling researchers to extract knowledge from data. We describe the first year of activity at Gemelli Generator RWD, the challenges we faced and the solutions we put in place to build a Real World Data laboratory at the service of patients and health researchers. Three classes of services are available today: retrospective analysis of existing patient data for descriptive and clustering purposes; automation of knowledge extraction, ranging from text mining and patient selection for trials to generation of new research hypotheses; and finally the creation of Decision Support Systems, with the integration of data from the hospital data warehouse, apps, and Internet of Things.

  14. SDNist v1.3: Temporal Map Challenge Environment

    • datasets.ai
    • data.nist.gov
    • +1more
    Updated Aug 6, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). SDNist v1.3: Temporal Map Challenge Environment [Dataset]. https://datasets.ai/datasets/sdnist-benchmark-data-and-evaluation-tools-for-data-synthesizers
    Explore at:
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    SDNist (v1.3) is a set of benchmark data and metrics for the evaluation of synthetic data generators on structured tabular data. This version (1.3) reproduces the challenge environment from Sprints 2 and 3 of the Temporal Map Challenge. These benchmarks are distributed as a simple open-source Python package to allow standardized and reproducible comparison of synthetic generator models on real-world data and use cases. These data and metrics were developed for and vetted through the NIST PSCR Differential Privacy Temporal Map Challenge, where the evaluation tools, k-marginal and Higher Order Conjunction, proved effective in distinguishing competing models in the competition environment. SDNist is available via pip (pip install sdnist==1.2.8, for Python >=3.6) or on the USNIST/Github. The sdnist Python module will download data from NIST as necessary, and users are not required to download data manually.
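To give a feel for what a k-marginal style comparison measures, the sketch below compares the joint frequency tables of every k-column subset in a real and a synthetic table. This is a simplified illustration written for this listing, not the sdnist implementation or its API: the function name, the use of all column subsets rather than a random sample, and the plain total-variation distance without any score scaling are all assumptions.

```python
from itertools import combinations

import pandas as pd


def k_marginal_distance(real, synth, k=2):
    """Mean total-variation distance over all k-column marginals.

    Returns 0.0 when every k-way marginal matches exactly and 1.0 when
    the two tables share no cells at all. Illustrative only.
    """
    distances = []
    for cols in combinations(real.columns, k):
        cols = list(cols)
        # Relative frequency of each observed combination of values.
        p = real.groupby(cols).size() / len(real)
        q = synth.groupby(cols).size() / len(synth)
        # Align on the union of cells; unseen cells get zero density.
        p, q = p.align(q, fill_value=0.0)
        distances.append(0.5 * (p - q).abs().sum())
    return sum(distances) / len(distances)


real = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]})
print(k_marginal_distance(real, real.copy(), k=2))
```

A lower value means the synthetic table reproduces the low-order joint distributions of the real table more closely.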

  15. Z

    Synthetic Hand Recognition Dataset

    • data.niaid.nih.gov
    Updated Jan 24, 2023
    Cite
    Weigand, Felix (2023). Synthetic Hand Recognition Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7415617
    Explore at:
    Dataset updated
    Jan 24, 2023
    Dataset provided by
    Dadgar, Amin
    Brunnett, Guido
    Uhlmann, Tom
    Weigand, Felix
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the synthetic hand recognition dataset generated with the Synthetic Human Dataset Generator (available on github). It contains 90,000 images of 10 different virtual humans (5 female, 5 male) in various poses. All images are annotated with the color coded segmented hands.

  16. Data from: A large synthetic dataset for machine learning applications in...

    • zenodo.org
    csv, json, png, zip
    Updated Mar 25, 2025
    Cite
    Marc Gillioz; Guillaume Dubuis; Philippe Jacquod (2025). A large synthetic dataset for machine learning applications in power transmission grids [Dataset]. http://doi.org/10.5281/zenodo.13378476
    Explore at:
    zip, png, csv, json
    Available download formats
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marc Gillioz; Guillaume Dubuis; Philippe Jacquod
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the ongoing energy transition, power grids are evolving fast. They operate more and more often close to their technical limit, under more and more volatile conditions. Fast, essentially real-time computational approaches to evaluate their operational safety, stability and reliability are therefore highly desirable. Machine Learning methods have been advocated to solve this challenge, however they are heavy consumers of training and testing data, while historical operational data for real-world power grids are hard if not impossible to access.

    This dataset contains long time series for production, consumption, and line flows, amounting to 20 years of data with a time resolution of one hour, for several thousand loads and several hundred generators of various types representing the ultra-high-voltage transmission grid of continental Europe. The synthetic time series have been statistically validated against real-world data.

    Data generation algorithm

    The algorithm is described in a Nature Scientific Data paper. It relies on the PanTaGruEl model of the European transmission network -- the admittance of its lines as well as the location, type and capacity of its power generators -- and aggregated data gathered from the ENTSO-E transparency platform, such as power consumption aggregated at the national level.

    Network

    The network information is encoded in the file europe_network.json. It is given in PowerModels format, which is itself derived from MatPower and compatible with PandaPower. The network features 7822 power lines and 553 transformers connecting 4097 buses, to which are attached 815 generators of various types.
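The element counts quoted above can be checked once the network file is loaded. The sketch below assumes the usual PowerModels layout, in which "bus", "gen" and "branch" are top-level dictionaries keyed by element identifier and each branch carries a boolean "transformer" flag. Those key names are an assumption about this particular file, and summarize_network is a hypothetical helper, not part of the dataset's code.

```python
import json


def summarize_network(network):
    """Count buses, generators, lines and transformers in a network dict.

    Assumes PowerModels-style keys ("bus", "gen", "branch") and a
    "transformer" flag on each branch; adjust if the file differs.
    """
    branches = list(network.get("branch", {}).values())
    n_transformers = sum(1 for b in branches if b.get("transformer"))
    return {
        "buses": len(network.get("bus", {})),
        "generators": len(network.get("gen", {})),
        "lines": len(branches) - n_transformers,
        "transformers": n_transformers,
    }


# Tiny in-memory stand-in for the real file, for illustration only.
toy = {
    "bus": {"1": {}, "2": {}},
    "gen": {"1": {}},
    "branch": {"1": {"transformer": False}, "2": {"transformer": True}},
}
print(summarize_network(toy))

# With the dataset itself, one would load the file first:
# with open("europe_network.json") as f:
#     print(summarize_network(json.load(f)))
```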

    Time series

    The time series forming the core of this dataset are given in CSV format. Each CSV file is a table with 8736 rows, one for each hourly time step of a 364-day year. All years are truncated to exactly 52 weeks of 7 days, and start on a Monday (the load profiles are typically different during weekdays and weekends). The number of columns depends on the type of table: there are 4097 columns in load files, 815 for generators, and 8375 for lines (including transformers). Each column is described by a header corresponding to the element identifier in the network file. All values are given in per-unit, both in the model file and in the tables, i.e. they are multiples of a base unit taken to be 100 MW.
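Since all values are expressed in per-unit with a 100 MW base, converting a table to megawatts is a single multiplication. The sketch below shows this on a small synthetic table and also checks the expected row count of 24 * 364 = 8736. The names BASE_MW, HOURS_PER_YEAR and to_megawatts are introduced here for illustration.

```python
import numpy as np
import pandas as pd

BASE_MW = 100.0            # base unit stated in the description
HOURS_PER_YEAR = 24 * 364  # 52 weeks of 7 days, hourly resolution


def to_megawatts(per_unit_table):
    """Convert a per-unit table to MW using the 100 MW base."""
    return per_unit_table * BASE_MW


# Synthetic stand-in with the documented number of rows and two columns.
demo = pd.DataFrame(
    np.full((HOURS_PER_YEAR, 2), 0.5), columns=["elem_a", "elem_b"]
)
demo_mw = to_megawatts(demo)
print(demo_mw.shape, demo_mw.iloc[0].to_list())
```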

    There are 20 tables of each type, labeled with a reference year (2016 to 2020) and an index (1 to 4), zipped into archive files arranged by year. This amounts to a total of 20 years of synthetic data. When using load, generator, and line profiles together, it is important to use the same label: for instance, the files loads_2020_1.csv, gens_2020_1.csv, and lines_2020_1.csv represent the same year of the dataset, whereas gens_2020_2.csv is unrelated (it actually shares some features, such as nuclear profiles, but it is based on a dispatch with distinct loads).
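To avoid accidentally mixing tables from different labels, the three file names for a given year and index can be built together. The helper below is a hypothetical convenience function; only the naming pattern (loads_YYYY_I.csv, gens_YYYY_I.csv, lines_YYYY_I.csv) and the label ranges come from the description.

```python
def matching_files(year, index):
    """Return the load, generator and line file names sharing one label."""
    if year not in range(2016, 2021):
        raise ValueError("reference year must be between 2016 and 2020")
    if index not in range(1, 5):
        raise ValueError("index must be between 1 and 4")
    label = f"{year}_{index}"
    return {kind: f"{kind}_{label}.csv" for kind in ("loads", "gens", "lines")}


# All 20 labels: 5 reference years times 4 indices.
all_labels = [(y, i) for y in range(2016, 2021) for i in range(1, 5)]
print(len(all_labels), matching_files(2020, 1))
```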

    Usage

    The time series can be used without a reference to the network file, simply using all or a selection of columns of the CSV files, depending on the needs. We show below how to select series from a particular country, or how to aggregate hourly time steps into days or weeks. These examples use Python and the data analysis library pandas, but other frameworks can be used as well (Matlab, Julia). Since all the yearly time series are periodic, it is always possible to define a coherent time window modulo the length of the series.
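One way to read the remark about periodicity is that a window may run past the end of a table and wrap around to its beginning. The sketch below takes such a window using positions computed modulo the table length. periodic_window is a hypothetical helper, and the ten-row table is a stand-in for a real 8736-row file.

```python
import numpy as np
import pandas as pd


def periodic_window(table, start, length):
    """Select `length` consecutive rows from `start`, wrapping around."""
    positions = (start + np.arange(length)) % len(table)
    return table.iloc[positions].reset_index(drop=True)


toy = pd.DataFrame({"x": range(10)})
# Starts two rows before the end and continues from the first row.
print(periodic_window(toy, start=8, length=4)["x"].to_list())
```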

    Selecting a particular country

    This example illustrates how to select generation data for Switzerland in Python. This can be done without parsing the network file, but using instead gens_by_country.csv, which contains a list of all generators for any country in the network. We start by importing the pandas library, and read the column of the file corresponding to Switzerland (country code CH):

    import pandas as pd
    CH_gens = pd.read_csv('gens_by_country.csv', usecols=['CH'], dtype=str)

    The object created in this way is a DataFrame with some null values (not all countries have the same number of generators). It can be turned into a list with:

    CH_gens_list = CH_gens.dropna().squeeze().to_list()

    Finally, we can import all the time series of Swiss generators from a given data table with

    pd.read_csv('gens_2016_1.csv', usecols=CH_gens_list)

    The same procedure can be applied to loads using the list contained in the file loads_by_country.csv.
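The three steps above can be wrapped into one function that works for either generators or loads. This is a sketch: read_country_series is a hypothetical name, and it assumes, as in the example above, that each column of the by-country file is headed by a country code and lists element identifiers that match the column headers of the data table. The usage lines run on small in-memory stand-ins rather than the real files.

```python
import io

import pandas as pd


def read_country_series(table_csv, by_country_csv, country_code):
    """Load only the columns of a data table that belong to one country."""
    ids = pd.read_csv(by_country_csv, usecols=[country_code], dtype=str)
    # Countries have different numbers of elements, so drop the padding nulls.
    columns = ids[country_code].dropna().to_list()
    return pd.read_csv(table_csv, usecols=columns)


# In-memory stand-ins for loads_by_country.csv and a loads table.
by_country = "CH,FR\n1,3\n2,\n"
table = "1,2,3\n0.1,0.2,0.3\n0.4,0.5,0.6\n"
swiss = read_country_series(io.StringIO(table), io.StringIO(by_country), "CH")
print(swiss.columns.to_list())
```

With the dataset itself, the two arguments would be file paths such as 'loads_2016_1.csv' and 'loads_by_country.csv'.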

    Averaging over time

    This second example shows how to change the time resolution of the series. Suppose that we are interested in all the loads from a given table, which are given by default with a one-hour resolution:

    hourly_loads = pd.read_csv('loads_2018_3.csv')

    To get a daily average of the loads, we can use:

    daily_loads = hourly_loads.groupby([t // 24 for t in range(24 * 364)]).mean()

    This results in series of length 364. To average further over entire weeks and get series of length 52, we use:

    weekly_loads = hourly_loads.groupby([t // (24 * 7) for t in range(24 * 364)]).mean()
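Both averages follow the same pattern, so they can be expressed with one helper that derives the block labels from the table length instead of hard-coding 24 * 364. This is an equivalent sketch, not the authors' code; average_blocks is a hypothetical name, and the random table only stands in for a real loads file.

```python
import numpy as np
import pandas as pd


def average_blocks(hourly, hours_per_block):
    """Average consecutive blocks of rows, e.g. 24 for days, 168 for weeks."""
    block = np.arange(len(hourly)) // hours_per_block
    return hourly.groupby(block).mean()


rng = np.random.default_rng(0)
hourly_demo = pd.DataFrame(rng.random((24 * 364, 3)), columns=["a", "b", "c"])
daily_demo = average_blocks(hourly_demo, 24)
weekly_demo = average_blocks(hourly_demo, 24 * 7)
print(len(daily_demo), len(weekly_demo))
```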

    Source code

    The code used to generate the dataset is freely available at https://github.com/GeeeHesso/PowerData. It consists of two packages and several documentation notebooks. The first package, written in Python, provides functions to handle the data and to generate synthetic series based on historical data. The second package, written in Julia, is used to perform the optimal power flow. The documentation in the form of Jupyter notebooks contains numerous examples of how to use both packages. The entire workflow used to create this dataset is also provided, starting from raw ENTSO-E data files and ending with the synthetic dataset given in the repository.

    Funding

    This work was supported by the Cyber-Defence Campus of armasuisse and by an internal research grant of the Engineering and Architecture domain of HES-SO.

  17. u

    Data from: Synthetic realistic noise-corrupted PPG database and noise...

    • produccioncientifica.ucm.es
    Updated 2021
    Cite
    Masinelli, Giulio; Dell'Agnola, Fabio; Valdés, Adriana; Atienza, David (2021). Synthetic realistic noise-corrupted PPG database and noise generator for the evaluation of PPG denoising and delineation algorithms [Dataset]. https://produccioncientifica.ucm.es/documentos/668fc441b9e7c03b01bd7ece
    Explore at:
    Dataset updated
    2021
    Authors
    Masinelli, Giulio; Dell'Agnola, Fabio; Valdés, Adriana; Atienza, David
    Description

    Overview

    This database is meant to evaluate the performance of denoising and delineation algorithms for PPG signals affected by noise. The noise generator allows applying the algorithms under test to an artificially corrupted reference PPG signal and comparing their output to the output obtained with the original signal. Moreover, the noise generator can produce artifacts of variable intensities, permitting the evaluation of the algorithms' performance against different noise levels. The reference signal is a PPG sample of a healthy subject at rest during a relaxing session.

    Database

    The database includes 1 recording of 72 seconds of synchronous PPG and ECG signals sampled at 250 Hz using a Medicom device, ABP-10 module (Medicom MTD Ltd., Russia). It was collected from a healthy subject during an induced relaxation by guided autogenic relaxation. For more information about the data collection, please refer to the following publication: https://pubmed.ncbi.nlm.nih.gov/30094756/

    In addition, PPG signals corrupted by the noise generator at different levels are also included in the database.

    Realistic noise generator

    Motion artifacts in PPG signals generally appear in the form of sudden spikes (corresponding to the subject's movement) and slowly varying offsets (baseline wander) due to the changes in distance between the skin and the sensor after every sudden movement. For this reason, conventional noise generators, which use random noise drawn from distributions such as Gaussian or Poissonian, do not allow a proper evaluation of an algorithm's performance, as the noise they provide is unrealistic compared to that commonly found in PPG signals. To overcome this issue, we designed a more realistic synthetic noise generator that can simulate those two behaviors, enabling us to corrupt a reference signal with different noise levels. The details about noise generation are available in the reference paper.
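The two behaviors described above, sudden spikes and slowly varying offsets, can be illustrated with a toy generator. The sketch below is not the authors' MATLAB noiseGenerator and does not follow the reference paper. The function name, the number of movement events, the single-sample spikes and the one-second smoothing window are all assumptions chosen only to show the idea.

```python
import numpy as np


def motion_artifact_noise(n_samples, fs=250.0, std=1.0, n_events=5, seed=0):
    """Toy motion-artifact noise: spikes plus a smoothed step-like baseline."""
    rng = np.random.default_rng(seed)
    spikes = np.zeros(n_samples)
    offsets = np.zeros(n_samples)
    # Random sample positions standing in for sudden movements.
    events = np.sort(rng.choice(n_samples, size=n_events, replace=False))
    for start in events:
        # Sudden spike at the instant of the movement.
        spikes[start] += rng.normal(0.0, std)
        # New skin-to-sensor distance: a new offset level until the next event.
        offsets[start:] = rng.normal(0.0, std)
    # Smooth the steps over about one second to mimic baseline wander.
    win = max(1, int(fs))
    baseline = np.convolve(offsets, np.ones(win) / win, mode="same")
    return spikes + baseline


# 72 s at 250 Hz, matching the length of the reference recording.
noise = motion_artifact_noise(72 * 250, std=0.5, seed=1)
print(noise.shape)
```

Adding the returned array to a reference signal gives a corrupted copy; raising std increases the intensity of both the spikes and the offsets.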
    Data Files

    The reference PPG signal can be found in Datasets\GoodSignals\PPG and the simultaneously acquired ECG in Datasets\GoodSignals\ECG. The folder Datasets\NoisySignals contains 340 noisy PPG signals affected by different levels of noise. The names describe the intensity of the noise (evaluated in terms of the standard deviation of the random noise used as input for the noise generator, see reference paper). Five noisy signals are produced for every noise level by running the noise generator with five random seeds each (for noise generation). Name convention: ppg_stdx_y denotes the y-th noisy PPG signal produced using a noise with a standard deviation of x. Datasets\BPMs contains the ground truth for the heart-rate estimation computed in windows of 8s with an overlap of 2s.

    Code

    The folder Code contains the MATLAB scripts to generate the noisy files by generating the realistic noise with the function noiseGenerator.

    When referencing this material, please cite: Masinelli, G.; Dell'Agnola, F.; Valdés, A.A.; Atienza, D. SPARE: A Spectral Peak Recovery Algorithm for PPG Signals Pulsewave Reconstruction in Multimodal Wearable Devices. Sensors 2021, 21, 2725. https://doi.org/10.3390/s21082725
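The heart-rate ground truth is described as computed in 8 s windows with a 2 s overlap. A minimal sketch of how such windows map to sample indices at 250 Hz is given below. It assumes windows start at the first sample and that a trailing partial window is dropped; neither detail is stated in the description, and window_bounds is a hypothetical helper.

```python
def window_bounds(n_samples, fs=250, window_s=8, overlap_s=2):
    """Return (start, stop) sample indices of overlapping analysis windows."""
    win = int(window_s * fs)
    # Consecutive windows advance by the window length minus the overlap.
    step = int((window_s - overlap_s) * fs)
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]


# 72 s recording sampled at 250 Hz.
bounds = window_bounds(72 * 250)
print(len(bounds), bounds[0], bounds[-1])
```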

  18. Artificial Intelligence (AI) Text Generator Market Analysis North America,...

    • technavio.com
    Updated Jul 15, 2024
    Cite
    Technavio (2024). Artificial Intelligence (AI) Text Generator Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, UK, China, India, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ai-text-generator-market-analysis
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United States, Global
    Description

    Artificial Intelligence Text Generator Market Size 2024-2028

    The artificial intelligence (AI) text generator market size is forecast to increase by USD 908.2 million at a CAGR of 21.22% between 2023 and 2028.

    The market is experiencing significant growth due to several key trends. One of these trends is the increasing popularity of AI generators in various sectors, including education for e-learning applications. Another trend is the growing importance of speech-to-text technology, which is becoming increasingly essential for improving productivity and accessibility. However, data privacy and security concerns remain a challenge for the market, as generators process and store vast amounts of sensitive information. It is crucial for market participants to address these concerns through strong data security measures and transparent data handling practices to ensure customer trust and compliance with regulations. Overall, the AI generator market is poised for continued growth as it offers significant benefits in terms of efficiency, accuracy, and accessibility.
    

    What will be the Size of the Artificial Intelligence (AI) Text Generator Market During the Forecast Period?

    The market is experiencing significant growth as businesses and organizations seek to automate content creation across various industries. Driven by technological advancements in machine learning (ML) and natural language processing, AI generators are increasingly being adopted for downstream applications in sectors such as education, manufacturing, and e-commerce. 
    Moreover, these systems enable the creation of personalized content for global audiences in multiple languages, providing a competitive edge for businesses in an interconnected Internet economy. However, responsible AI practices are crucial to mitigate risks associated with biased content, misinformation, misuse, and potential misrepresentation.
    

    How is this Artificial Intelligence (AI) Text Generator Industry segmented and which is the largest segment?

    The artificial intelligence (AI) text generator industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Component

      Solution
      Service

    Application

      Text to text
      Speech to text
      Image/video to text

    Geography

      North America
        US
      Europe
        Germany
        UK
      APAC
        China
        India
      South America
      Middle East and Africa


    By Component Insights

    The solution segment is estimated to witness significant growth during the forecast period.
    

    Artificial Intelligence (AI) text generators have gained significant traction in various industries due to their efficiency and cost-effectiveness in content creation. These solutions utilize machine learning algorithms, such as Deep Neural Networks, to analyze and learn from vast datasets of human-written text. By predicting the most probable word or sequence of words based on patterns and relationships identified in the training data, AI generators produce personalized content for multiple languages and global audiences. Applications span industries including education, manufacturing, e-commerce, and entertainment & media. In the education industry, AI generators assist in creating personalized learning materials.

    The solution segment was valued at USD 184.50 million in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 33% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    The North American market holds the largest share in the market, driven by the region's technological advancements and increasing adoption of AI in various industries. AI text generators are increasingly utilized for content creation, customer service, virtual assistants, and chatbots, catering to the growing demand for high-quality, personalized content in sectors such as e-commerce and digital marketing. Moreover, the presence of tech giants like Google, Microsoft, and Amazon in North America, who are investing significantly in AI and machine learning, further fuels market growth. AI generators employ Machine Learning algorithms, Deep Neural Networks, and Natural Language Processing to generate content in multiple languages for global audiences.

    Market Dynamics

    Our researchers analyzed the data with 2023 as the base year, along with the key drivers, trends, and c

  19. o

    Turkish Name & Surname Generator Data

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Turkish Name & Surname Generator Data [Dataset]. https://www.opendatabay.com/data/ai-ml/6fdd4bb8-39ba-4f66-816c-ffd11495bea7
    Explore at:
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset offers a collection of Turkish names and surnames that are randomly generated. Its primary purpose is to provide data for scenarios where actual personal names are not required or desired, thus ensuring privacy and enabling versatile data applications. The names are derived from Wikipedia, while surnames originate from a GitHub repository, with the generation process being entirely random.

    Columns

    • Name: Represents a randomly generated Turkish first name.
    • Surname: Represents a randomly generated Turkish surname.
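A table with these two columns can be produced by pairing first names and surnames at random. The sketch below shows the idea. It is not the dataset author's generation script, and the short name lists are placeholders chosen for illustration rather than the Wikipedia and GitHub sources mentioned in the description.

```python
import csv
import io
import random


def generate_names(first_names, surnames, n, seed=0):
    """Return n (name, surname) pairs drawn independently at random."""
    rng = random.Random(seed)
    return [(rng.choice(first_names), rng.choice(surnames)) for _ in range(n)]


# Placeholder lists for illustration only.
FIRST = ["Ayşe", "Mehmet", "Elif", "Mustafa"]
LAST = ["Yılmaz", "Kaya", "Demir", "Çelik"]

rows = generate_names(FIRST, LAST, n=5, seed=42)

# Write the pairs as CSV with the two documented column names.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Surname"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because names and surnames are drawn independently, the output does not mirror real-world frequencies, which matches the description's note on distribution.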

    Distribution

    The dataset is typically provided in a CSV file format. While the exact count of rows or records is not specified, its intended distribution covers a global region. The data is created through random generation, meaning its structure is focused on yielding distinct names and surnames rather than mirroring real-world population distributions.

    Usage

    This dataset is ideal for developing and testing applications that necessitate placeholder or synthetic human names. Key use cases include:

    • Synthetic data creation for building privacy-focused development and testing environments.
    • Training of Artificial Intelligence (AI) and Large Language Models (LLMs), where diverse but non-real personal data is beneficial.
    • Populating databases or forms with example Turkish names.
    • Supporting data science and analytics projects that benefit from anonymised or randomised datasets.

    Coverage

    The geographical scope of this dataset is global. There is no specific time range or demographic coverage noted, as the names and surnames are randomly generated for general utility rather than to reflect particular real-world populations or historical periods.

    License

    CC0

    Who Can Use It

    This dataset is particularly valuable for:

    • Data scientists and analysts who require synthetic data for model training or privacy-centric research.
    • Software developers in need of placeholder data for application testing and development.
    • Researchers in AI and LLM fields seeking diverse, randomly generated linguistic inputs.
    • Anyone who requires free, non-sensitive name data for various projects.

    Dataset Name Suggestions

    • Turkish Random Names
    • Synthetic Turkish Names
    • Generated Turkish Names
    • Turkish Name & Surname Generator Data
    • Random Turkish Personas

    Attributes

    Original Data Source: Türkçe isimler

  20. f

    Best synthesizers for each fairness metric evaluated in the experiments:...

    • plos.figshare.com
    xls
    Updated Feb 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mayana Pereira; Meghana Kshirsagar; Sumit Mukherjee; Rahul Dodhia; Juan Lavista Ferres; Rafael de Sousa (2024). Best synthesizers for each fairness metric evaluated in the experiments: Subgroup accuracy, difference in statistical parity and difference in equality of odds. [Dataset]. http://doi.org/10.1371/journal.pone.0297271.t007
    Explore at:
    xls
    Available download formats
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mayana Pereira; Meghana Kshirsagar; Sumit Mukherjee; Rahul Dodhia; Juan Lavista Ferres; Rafael de Sousa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We also present the synthesizers that best preserve PPV and TPR across subgroups. We present the two best synthetic data generators for each task. We selected the best synthesizer and runner-up based on experiments with privacy-loss budget ϵ = 5.0.
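For readers unfamiliar with the metrics named here, the sketch below computes per-subgroup TPR and PPV and the difference in statistical parity for a binary classifier. It uses the standard textbook definitions and is not the paper's evaluation code. The function names and the toy labels are illustrative assumptions.

```python
import numpy as np


def subgroup_rates(y_true, y_pred, group):
    """Per-subgroup TPR, PPV and positive prediction rate (binary labels)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        m = group == g
        tp = int(np.sum((y_pred[m] == 1) & (y_true[m] == 1)))
        fp = int(np.sum((y_pred[m] == 1) & (y_true[m] == 0)))
        fn = int(np.sum((y_pred[m] == 0) & (y_true[m] == 1)))
        rates[g.item()] = {
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),
            "ppv": tp / (tp + fp) if tp + fp else float("nan"),
            "positive_rate": float(np.mean(y_pred[m] == 1)),
        }
    return rates


def statistical_parity_difference(rates):
    """Largest gap in positive prediction rate between subgroups."""
    values = [r["positive_rate"] for r in rates.values()]
    return max(values) - min(values)


y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
rates = subgroup_rates(y_true, y_pred, group)
print(rates, statistical_parity_difference(rates))
```

A synthesizer preserves these quantities well when a model trained on its output yields subgroup rates close to those of a model trained on the real data.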
