100+ datasets found

Synthetic Healthcare Database for Research (SyH-DR)
catalog.data.gov
healthdata.gov
Updated Sep 16, 2023
Cite
Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
Dataset updated
Sep 16, 2023
Dataset provided by
Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
Description
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
Australian synthetic healthcare data with Synthea
data.csiro.au
Updated Jul 4, 2024
Cite
Australian synthetic healthcare data with Synthea [Dataset]. https://data.csiro.au/collection/csiro:61499
Unique identifier
https://doi.org/10.25919/efcw-bm49
Dataset updated
Jul 4, 2024
Dataset provided by
CSIRO (http://www.csiro.au/)
Authors
Ibrahima Diouf; Mitchell O'Brien; Hamed Hassanzadeh; Donna Truran; Hoa Ngo; Parnesh Raniga; Denis Bauer; David Hansen; Sankalp Khanna; Roc Reguant Comellas; Michael Lawley; John Grimes
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Australia
Dataset funded by
CSIRO (http://www.csiro.au/)
Description
We developed an Australianised version of Synthea. Synthea is synthetic data generation software that uses publicly available population aggregate statistics such as demographics, disease prevalence and incidence rates, and health reports. Synthea generates data based on manually curated models of clinical workflows and disease progression that cover a patient’s entire life. It does not use real patient data, which guarantees a completely synthetic dataset. We generated 117,258 synthetic patients from Queensland.
Syntegra Synthetic EHR Data | Structured Healthcare Electronic Health Record...
datarade.ai
Updated Feb 23, 2022
Cite
Syntegra (2022). Syntegra Synthetic EHR Data | Structured Healthcare Electronic Health Record Data [Dataset]. https://datarade.ai/data-products/syntegra-synthetic-ehr-data-structured-healthcare-electroni-syntegra
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Syntegra
Area covered
United States of America
Description
Organizations can license synthetic, structured data generated by Syntegra from electronic health record systems of community hospitals across the United States, reaching beyond just claims and Rx data.

The synthetic data provides a detailed picture of the patient's journey throughout their hospital stay, including patient demographic information and payer type, as well as rich data not found in any other sources. Examples of this data include: drugs given (timing and dosing), patient location (e.g., ICU, floor, ER), lab results (timing by day and hour), physician roles (e.g., surgeon, attending), medications given, and vital signs. The participating community hospitals with bed sizes ranging from 25 to 532 provide unique visibility and assessment of variation in care outside of large academic medical centers and healthcare networks.

Our synthetic data engine is trained on a broadly representative dataset made up of deep clinical information of approximately 6 million unique patient records and 18 million encounters over 5 years of history. Notably, synthetic data generation allows for the creation of any number of records needed to power your project.

EHR data is available in the following formats:
- Cleaned, analytics-ready (a layer of clean and normalized concepts in Tuva Health’s standard relational data model format)
- FHIR USCDI (labs, medications, vitals, encounters, patients, etc.)

The synthetic data maintains full statistical accuracy, yet does not contain any actual patients, thus removing any patient privacy liability risk. Privacy is preserved in a way that goes beyond HIPAA or GDPR compliance. Our industry-leading metrics prove that both privacy and fidelity are fully maintained.

- Generate the data needed for product development, testing, demo, or other needs
- Access data at a scalable price point
- Build your desired population, both in size and demographics
- Scale up and down to fit specific needs, increasing efficiency and affordability

Syntegra's synthetic data engine also has the ability to augment the original data:
- Expand population sizes, rare cohorts, or outcomes of interest
- Address algorithmic fairness by correcting bias or introducing intentional bias
- Conditionally generate data to inform scenario planning
- Impute missing values to minimize gaps in the data
Synthea synthetic patient generator data in OMOP Common Data Model
registry.opendata.aws
Updated Jan 4, 2023
Cite
Amazon Web Services (2023). Synthea synthetic patient generator data in OMOP Common Data Model [Dataset]. https://registry.opendata.aws/synthea-omop/
Dataset updated
Jan 4, 2023
Dataset provided by
Amazon.com (http://amazon.com/)
Description
The Synthea-generated data is provided here as 1,000-person (1k), 100,000-person (100k), and 2,800,000-person (2.8m) data sets in the OMOP Common Data Model format. Synthea™ is a synthetic patient generator that models the medical history of synthetic patients. Our mission is to output high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government (although a citation would be appreciated). You can read our first academic paper here: https://doi.org/10.1093/jamia/ocx079
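Because the data is in the OMOP Common Data Model, a first look often starts with the `person` table. The sketch below is illustrative only: it does not load the actual files (their download layout is not described here) but builds a tiny in-memory SQLite table with three OMOP `person` columns and made-up rows. The helper names `build_demo_db` and `count_by_gender` are hypothetical; 8507 and 8532 are the standard OMOP gender concept IDs for male and female.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the OMOP `person` table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, "
        "year_of_birth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1950), (2, 8532, 1984), (3, 8532, 2001)],
    )
    return conn


def count_by_gender(conn):
    """Return {gender_concept_id: number of persons}."""
    rows = conn.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person "
        "GROUP BY gender_concept_id ORDER BY gender_concept_id"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(count_by_gender(build_demo_db()))
```

The same query should apply unchanged to a real OMOP `person` table once the data has been loaded into a SQL database.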
Synthetic Data Generation Market Analysis North America, Europe, APAC,...
technavio.com
Cite
Synthetic Data Generation Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, Germany, UK, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description
Synthetic Data Generation Market Size 2024-2028

The synthetic data generation market size is forecast to increase by USD 2.88 billion at a CAGR of 60.02% between 2023 and 2028.

The global synthetic data generation market is expanding steadily, driven by the growing need for privacy-compliant data solutions and advancements in AI technology. Key factors include the increasing demand for data to train machine learning models, particularly in industries like healthcare services and finance where privacy regulations are strict and the use of predictive analytics is critical, and the use of generative AI and machine learning algorithms, which create high-quality synthetic datasets that mimic real-world data without compromising security. This report provides a detailed analysis of the global synthetic data generation market, covering market size, growth forecasts, and key segments such as agent-based modeling and data synthesis. It offers practical insights for business strategy, technology adoption, and compliance planning. A significant trend highlighted is the rise of synthetic data in AI training, enabling faster and more ethical development of models. One major challenge addressed is the difficulty in ensuring data quality, as poorly generated synthetic data can lead to inaccurate outcomes. For businesses aiming to stay competitive in a data-driven global landscape, this report delivers essential data and strategies to leverage synthetic data trends and address quality challenges, ensuring they remain leaders in innovation while meeting regulatory demands.

What will be the Size of the Market During the Forecast Period?

Synthetic data generation offers a more time-efficient solution compared to traditional methods of data collection and labeling, making it an attractive option for businesses looking to accelerate their AI and machine learning projects. The market represents a promising opportunity for organizations seeking to overcome the challenges of data scarcity and privacy concerns while maintaining data diversity and improving the efficiency of their artificial intelligence and machine learning initiatives. By leveraging this technology, technology decision-makers can drive innovation and gain a competitive edge in their respective industries.

Market Segmentation

The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

End-user: Healthcare and life sciences; Retail and e-commerce; Transportation and logistics; IT and telecommunication; BFSI and others
Type: Agent-based modelling; Direct modelling
Data: Tabular Data; Text Data; Image & Video Data; Others
Offering Band: Fully Synthetic Data; Partially Synthetic Data; Hybrid Synthetic Data
Application: Data Protection; Data Sharing; Predictive Analytics; Natural Language Processing; Computer Vision Algorithms; Others
Geography: North America (US, Canada, Mexico); Europe (Germany, UK, France, Italy); APAC (China, Japan, India); Middle East and Africa; South America

By End-user Insights

The healthcare and life sciences segment is estimated to witness significant growth during the forecast period. In the thriving healthcare and life sciences sector, synthetic data generation is gaining significant traction as a cost-effective and time-efficient alternative to utilizing real-world data. This market segment's rapid expansion is driven by the increasing demand for data-driven insights and the importance of safeguarding sensitive information. One noteworthy application of synthetic data generation is in the realm of computer vision, specifically with geospatial imagery and medical imaging.

For instance, in healthcare, synthetic data can be generated to replicate medical imaging, such as MRI scans and X-rays, for research and machine learning model development without compromising patient privacy. Similarly, in the field of physical security, synthetic data can be employed to enhance autonomous vehicle simulation, ensuring optimal performance and safety without the need for real-world data. By generating artificial datasets, organizations can diversify their data sources and improve the overall quality and accuracy of their machine learning models.

The healthcare and life sciences segment was valued at USD 12.60 million in 2018 and showed a gradual increase during the forecast period.

Regional Insights

North America is estimated to contribute 36% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the m
Bitext Synthetic Data - Healthcare (Spanish language)
catalog.elra.info
Updated Jul 18, 2023
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2023). Bitext Synthetic Data - Healthcare (Spanish language) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0187/
Dataset updated
Jul 18, 2023
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for the Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for each utterance. The Healthcare domain comprises 40 intents for Spanish. Data is distributed as models or open text files.
Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034
scoop.market.us
Updated Mar 18, 2025
Cite
Market.us Scoop (2025). Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034 [Dataset]. https://scoop.market.us/synthetic-data-generation-market-news/
Dataset updated
Mar 18, 2025
Dataset authored and provided by
Market.us Scoop
License
https://scoop.market.us/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Synthetic Data Generation Market Size

As per the latest insights from Market.us, the Global Synthetic Data Generation Market is set to reach USD 6,637.98 million by 2034, expanding at a CAGR of 35.7% from 2025 to 2034. The market, valued at USD 313.50 million in 2024, is witnessing rapid growth due to rising demand for high-quality, privacy-compliant, and AI-driven data solutions.
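The quoted figures can be cross-checked with the standard compound annual growth rate formula. The sketch below assumes the USD 313.50 million 2024 value is the base and that growth compounds over ten annual periods to 2034; that reading is an assumption, since the listing states the range as "2025 to 2034". The function names are illustrative.

```python
def project_value(start_value, cagr, years):
    """Compound start_value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the listing, in USD million.
BASE_2024 = 313.50
TARGET_2034 = 6637.98
STATED_CAGR = 0.357
YEARS = 10  # assumption: ten compounding periods, 2024 -> 2034

projected_2034 = project_value(BASE_2024, STATED_CAGR, YEARS)
rate = implied_cagr(BASE_2024, TARGET_2034, YEARS)

if __name__ == "__main__":
    print(f"Projected 2034 value: USD {projected_2034:,.2f} million")
    print(f"Implied CAGR: {rate:.2%}")
```

Under that assumption, the projected value and the implied rate both land close to the stated USD 6,637.98 million and 35.7%.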

North America dominated in 2024, securing over 35% of the market, with revenues surpassing USD 109.7 million. The region’s leadership is fueled by strong investments in artificial intelligence, machine learning, and data security across industries such as healthcare, finance, and autonomous systems. With increasing reliance on synthetic data to enhance AI model training and reduce data privacy risks, the market is poised for significant expansion in the coming years.
Medical records of 30K Synthea synthetic patients
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Chen, AJ (2023). Medical records of 30K Synthea synthetic patients [Dataset]. http://doi.org/10.7910/DVN/BWDKXS
Unique identifier
https://doi.org/10.7910/DVN/BWDKXS
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Chen, AJ
Description
The dataset has 2 populations of synthetic patients generated by the Synthea tool. Each population has 15K patients with original medical records in CSV files. Because the total file size is >3GB in each population, the files are compressed into a zip file. Synthea records are in domains similar to those in real EMR, including patients, encounters, conditions (diagnosis), observations, medications, and procedures. The data was first used in building ML models for lung cancer risk prediction. For more information, see the published paper in Nature Scientific Reports (https://www.nature.com/articles/s41598-022-23011-4).
Synthetic Dataset of Emergency Healthcare Services
datarepositorium.uminho.pt
zenodo.org
csv, txt
Updated Jan 17, 2025
Cite
Repositório de Dados da Universidade do Minho (2025). Synthetic Dataset of Emergency Healthcare Services [Dataset]. http://doi.org/10.34622/datarepositorium/AKSZQG
Available download formats: csv (1259), txt (4064)
Unique identifier
https://doi.org/10.34622/datarepositorium/AKSZQG
Dataset updated
Jan 17, 2025
Dataset provided by
Repositório de Dados da Universidade do Minho
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Synthetic dataset of emergency services comprising several CSV files that we generated using simulation software. This dataset is open for public use; please cite our work if used in research or applications.

File Overview
- CheckBloodPressure.csv (9 KB): Contains blood pressure Server records of patients.
- CheckPatientType.csv (19 KB): Identifies the type of each patient (e.g., 1 or 3).
- Fill_Information.csv (2 KB): Fill information records for new patients.
- MedicalRecord1.csv (10 KB): Medical record dataset for patient type 1.
- MedicalRecord2.csv (4 KB): Medical record dataset for patient type 2.
- MedicalRecord3.csv (2 KB): Medical record dataset for patient type 3.
- MedicalRecord4.csv (13 KB): Medical record dataset for patient type 4.
- OutPatientDepartment.csv (18 KB): Data related to the satisfaction and length of stay of a given patient.
- Triage.csv (13 KB): Data related to the triage process.
- README.txt (4 KB): Documentation of the dataset, including structure, metadata, and usage.

Common Fields Across Files
- Patient ID (Integer): Unique identifier for each patient.
- Patient Type (Integer): Classification of patient (e.g., 1, 4).

Medical Records
- Arrival Time (DateTime): Timestamp of the patient's first arrival in the medical record department.
- Exiting Time (DateTime): Timestamp when the patient exits a Server.
- Waiting Time (min) (Real): Total waiting time before being attended to.
- Resource Used (String): Resource (e.g., Operator) allocated to the patient.
- Utilization % (Real): Utilization rate of the resource as a percentage.
- Queue Count Before Processing (Integer): Number of patients in the queue before processing begins.
- Queue Count After Processing (Integer): Number of patients in the queue after processing ends.
- Queue Difference (Integer): Difference between the before and after queue counts.
- Length of Stay (min) (Real): Total time spent in the simulation by the patient.
- LOS without Queues (min) (Real): Length of stay excluding any queuing time.
- Satisfaction % (Real): Patient satisfaction rating based on their experience.
- New Patient? (String): Indicates if this is a new patient or a returning one.
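As a hedged illustration of how the documented fields might be used, the sketch below averages "Waiting Time (min)" per "Patient Type" with Python's standard `csv` module. The inline sample rows are invented for the example, and it is an assumption that the real CSV files use exactly these header strings; check README.txt before relying on them.

```python
import csv
import io
from collections import defaultdict

# Made-up rows; header names are assumed to match the field list above.
SAMPLE_CSV = """Patient ID,Patient Type,Waiting Time (min)
1,1,4.5
2,3,10.0
3,1,5.5
"""


def mean_wait_by_type(csv_file):
    """Return {patient_type: mean waiting time in minutes}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        patient_type = int(row["Patient Type"])
        totals[patient_type] += float(row["Waiting Time (min)"])
        counts[patient_type] += 1
    return {ptype: totals[ptype] / counts[ptype] for ptype in totals}


result = mean_wait_by_type(io.StringIO(SAMPLE_CSV))

if __name__ == "__main__":
    print(result)
```

To run it against a downloaded file, pass an open file handle (for example `open("MedicalRecord1.csv", newline="")`) instead of the `io.StringIO` sample.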
Synthetic Data Solution Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 12, 2025
Cite
AMA Research & Media LLP (2025). Synthetic Data Solution Report [Dataset]. https://www.archivemarketresearch.com/reports/synthetic-data-solution-21817
Available download formats: ppt, pdf, doc
Dataset updated
Feb 12, 2025
Dataset provided by
AMA Research & Media
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Market Analysis for Synthetic Data Solution

The global synthetic data solution market is projected to reach USD XXX million by 2033, growing at a CAGR of XX% from 2025 to 2033. The increasing demand for synthetic data in various industries, such as financial services, retail, and healthcare, drives this growth. Synthetic data offers a privacy-preserving alternative to real-world data, enabling organizations to train and evaluate models without compromising sensitive information. The growing adoption of cloud-based solutions and the increasing need for data privacy and security further contribute to market growth. Market segments include deployment types (cloud-based and on-premises) and applications (financial services industry, retail industry, medical industry, and others). Key regional markets include North America, South America, Europe, Middle East & Africa, and Asia Pacific. Major companies operating in the market include LightWheel AI, Hanyi Innovation Technology, Haohan Data Technology, Haitian Ruisheng Science Technology, and Baidu. Trends such as the adoption of artificial intelligence (AI) and machine learning (ML) and the rising concern over data privacy and governance are expected to shape the market's future.
Synthetic Suicide Prevention Dataset with SDoH
datahub.va.gov
data.va.gov
application/rdfxml +5
Updated Feb 18, 2021
Cite
VHA (2021). Synthetic Suicide Prevention Dataset with SDoH [Dataset]. https://www.datahub.va.gov/dataset/Synthetic-Suicide-Prevention-Dataset-with-SDoH/h5zp-pekf
Available download formats: csv, application/rssxml, application/rdfxml, xml, json, tsv
Dataset updated
Feb 18, 2021
Dataset authored and provided by
VHA
Description
The included dataset contains 10,000 synthetic Veteran patient records generated by Synthea. The scope of the data includes over 500 clinical concepts across 90 disease modules, as well as additional social determinants of health (SDoH) data elements that are not traditionally tracked in electronic health records. Each synthetic patient conceptually represents one Veteran in the existing US population; each Veteran has a name, sociodemographic profile, a series of documented clinical encounters and diagnoses, as well as associated cost and payer data. To learn more about Synthea, please visit the Synthea wiki at https://github.com/synthetichealth/synthea/wiki. To find a description of how this dataset is organized by data type, please visit the Synthea CSV File Data Dictionary at https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary.
Synthetic version of anonymized Norway Registry data containing...
search.dataone.org
dataverse.azure.uit.no
Updated Sep 25, 2024
Cite
Chauhan, Pavitra (2024). Synthetic version of anonymized Norway Registry data containing prescriptions and hospitalization of the patients [Dataset]. http://doi.org/10.18710/YABAGM
Unique identifier
https://doi.org/10.18710/YABAGM
Dataset updated
Sep 25, 2024
Dataset provided by
DataverseNO
Authors
Chauhan, Pavitra
Time period covered
Jan 1, 2011 - Jan 1, 2013
Description
This dataset represents synthetic data derived from anonymized Norwegian Registry Data of patients aged 65 and above from 2011 to 2013. It includes the Norwegian Patient Registry (NPR), which contains hospitalization details, and the Norwegian Prescription Database (NorPD), which contains prescription details. The NPR and NorPD datasets are combined into a single CSV file. This real dataset was part of a project to study medication use in the elderly and its association with hospitalization. The project has ethical approval from the Regional Committees for Medical and Health Research Ethics in Norway (REK-Nord number: 2014/2182). The dataset was anonymized to ensure that the synthetic version could not reasonably be identical to any real-life individuals. The anonymization process was done as follows: first, only relevant information was kept from the original data set. Second, individuals' birth year and gender were replaced with randomly generated values within a plausible range of values. And last, all dates were replaced with randomly generated dates. This dataset was sufficiently scrambled to generate a synthetic dataset and was only used for the current study. The dataset has details related to Patient, Prescriber, Hospitalization, Diagnosis, Location, Medications, Prescriptions, and Prescriptions dispatched. A publication using this data to create a machine learning model for predicting hospitalization risk is under review.
10,000 Synthetic Medicare Patient Records
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Hall, Dylan (2023). 10,000 Synthetic Medicare Patient Records [Dataset]. http://doi.org/10.7910/DVN/QDXLWR
Unique identifier
https://doi.org/10.7910/DVN/QDXLWR
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Hall, Dylan
Description
This dataset contains 10,000 synthetic patient records representing a scaled-down US Medicare population. The records were generated by Synthea ( https://github.com/synthetichealth/synthea ) and are completely synthetic and contain no real patient data. This data is presented free of cost and free of restrictions. Each record is stored as one file in HL7 FHIR R4 ( https://www.hl7.org/fhir/ ) containing one Bundle, in JSON. For more information on how this specific population was created, or to generate your own at any scale, see: https://github.com/synthetichealth/populations/tree/master/medicare
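Since each record is one FHIR R4 Bundle in JSON, a quick way to inspect a patient file is to tally the resource types it contains. The sketch below is a minimal illustration using only the standard library; `SAMPLE_BUNDLE` is a tiny hand-written stand-in, not taken from the dataset, and `count_resource_types` is a hypothetical helper name. It relies only on the Bundle structure defined by FHIR (`resourceType`, `entry[].resource`).

```python
import json
from collections import Counter

# Hand-written miniature Bundle for illustration; real files are far larger.
SAMPLE_BUNDLE = json.dumps({
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Encounter", "id": "e1"}},
        {"resource": {"resourceType": "Encounter", "id": "e2"}},
    ],
})


def count_resource_types(bundle_text):
    """Count resources in a FHIR Bundle by their resourceType."""
    bundle = json.loads(bundle_text)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


if __name__ == "__main__":
    print(count_resource_types(SAMPLE_BUNDLE))
```

For a downloaded record, read the file contents and pass the text to `count_resource_types`; the file naming scheme is not described in the listing, so no path is assumed here.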
EMRBots: a 100-patient database
figshare.com
data.mendeley.com
zip
Updated Sep 3, 2018
Cite
Uri Kartoun (2018). EMRBots: a 100-patient database [Dataset]. http://doi.org/10.6084/m9.figshare.7040039.v3
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.7040039.v3
Dataset updated
Sep 3, 2018
Dataset provided by
figshare (http://figshare.com/)
Authors
Uri Kartoun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A 100-patient database that contains in total 100 virtual patients, 372 admissions, and 111,483 lab observations.
Synthea synthetic patient data for lung cancer risk prediction machine...
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Chen, AJ (2023). Synthea synthetic patient data for lung cancer risk prediction machine learning [Dataset]. http://doi.org/10.7910/DVN/GD5XWE
Unique identifier
https://doi.org/10.7910/DVN/GD5XWE
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Chen, AJ
Description
This dataset contains Synthea synthetic patient data used in building ML models for lung cancer risk prediction. The ML models are used to simulate ML-enabled LHS. This open dataset is part of the synthetic data repository of the Open LHS project on GitHub: https://github.com/lhs-open/synthetic-data. For data source and methods, see the first ML-LHS simulation paper published in Nature Scientific Reports: https://www.nature.com/articles/s41598-022-23011-4.
Veterans Affairs Suicide Prevention Synthetic Dataset
catalog.data.gov
datahub.va.gov
Updated Jan 28, 2024
Cite
Department of Veterans Affairs (2024). Veterans Affairs Suicide Prevention Synthetic Dataset [Dataset]. https://catalog.data.gov/dataset/veterans-affairs-suicide-prevention-synthetic-dataset
Dataset updated
Jan 28, 2024
Dataset provided by
United States Department of Veterans Affairs (http://va.gov/)
Description
NOTE: This dataset is no longer supported and is provided as-is. Any historical knowledge regarding metadata or its creation is no longer available. All known information is provided as part of this data set. The Veterans Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The VASPSD was developed using a real, record-level dataset provided through the VA Office of Suicide Prevention. The VASPSD contains no real Veteran information; however, it reflects similar characteristics of the real dataset. NOTICE: This data is intended to appear similar to actual VASPSD data but it does not have any real predictive modeling value. It should not be used in any real world application.
Synthetic Data Generation Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Dec 8, 2024
Cite
Market Research Forecast (2024). Synthetic Data Generation Market Report [Dataset]. https://www.marketresearchforecast.com/reports/synthetic-data-generation-market-1834
Available download formats: pdf, doc, ppt
Dataset updated
Dec 8, 2024
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The synthetic data generation market size was valued at USD 288.5 million in 2023 and is projected to reach USD 1,920.28 million by 2032, exhibiting a CAGR of 31.1% during the forecast period. Synthetic data generation (SDG) is the creation of artificial datasets that resemble real datasets in their data distribution and patterns. It refers to the process of creating synthetic data points using algorithms or models instead of conducting observations or surveys. One of its core advantages is that it can maintain the statistical characteristics of the original data while removing the privacy risk of using real data. Further, there is no limit to how much synthetic data can be created, so it can be used for extensive testing and training of machine learning models, unlike conventional data, which may be highly regulated or limited in availability. It also helps generate comprehensive datasets that include many examples of specific situations or contexts that may occur in practice, improving the AI system’s performance. The use of SDG significantly shortens the development cycle, requiring less time and effort for data collection and annotation. It allows researchers and developers to be highly efficient in discovery and development in specific domains such as healthcare and finance. Key drivers for this market are: Growing Demand for Data Privacy and Security to Fuel Market Growth. Potential restraints include: Lack of Data Accuracy and Realism Hinders Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.
The Health Gym v2.0 Synthetic Antiretroviral Therapy (ART) for HIV Dataset
figshare.com
txt
Updated May 31, 2023
Cite
Nicholas Kuo (2023). The Health Gym v2.0 Synthetic Antiretroviral Therapy (ART) for HIV Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22827878.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22827878.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Nicholas Kuo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
===

This synthetic dataset, centred on ART for HIV, was synthesised employing the model outlined in reference [1], incorporating the techniques of WGAN-GP+G_EOT+VAE+Buffer.

This dataset serves as a principal resource for the Centre for Big Data Research in Health (CBDRH) Datathon (see: CBDRH Health Data Science Datathon 2023 (cbdrh-hds-datathon-2023.github.io)). Its primary purpose is to advance the Health Data Analytics (HDAT) courses at the University of New South Wales (UNSW), providing students with exposure to synthetic yet realistic datasets that simulate real-world data.

The dataset is composed of 534,960 records, distributed over 15 distinct columns, and is preserved in a CSV format with a size of 39.1 MB. It contains information about 8,916 synthetic patients over a period of 60 months, with data summarised on a monthly basis. The total number of records corresponds to the product of the synthetic patient count and the record duration in months, thus equating to 8,916 multiplied by 60.

The dataset's structure encompasses 15 columns, which include 13 variables pertinent to ART for HIV as delineated in reference [1], a unique patient identifier, and a further variable signifying the specific time point.
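
The record and column counts described above can be checked with a few lines of Python. This is a minimal sketch rather than the authors' tooling: the counts come from the description, while the function names, the column names (`PatientID`, `Timestep`, `VL`), and the tiny in-memory sample are hypothetical stand-ins for the real CSV file.

```python
import csv
import io
from collections import Counter

# Figures stated in the dataset description.
N_PATIENTS = 8_916
N_MONTHS = 60
N_ART_VARIABLES = 13
N_INDEX_COLUMNS = 2  # patient identifier + time point

EXPECTED_ROWS = N_PATIENTS * N_MONTHS                  # 534,960
EXPECTED_COLUMNS = N_ART_VARIABLES + N_INDEX_COLUMNS   # 15


def panel_shape(csv_file, id_column):
    """Return (row count, column count, rows-per-patient counter) for a long-format CSV."""
    reader = csv.DictReader(csv_file)
    rows_per_patient = Counter(row[id_column] for row in reader)
    n_rows = sum(rows_per_patient.values())
    n_columns = len(reader.fieldnames or [])
    return n_rows, n_columns, rows_per_patient


def is_balanced(rows_per_patient, n_months):
    """True when every patient has exactly n_months records."""
    return all(count == n_months for count in rows_per_patient.values())


# Tiny in-memory stand-in: 2 patients x 3 months, hypothetical column names.
demo = io.StringIO(
    "PatientID,Timestep,VL\n"
    "0,0,1.5\n0,1,1.2\n0,2,1.1\n"
    "1,0,2.0\n1,1,1.9\n1,2,1.7\n"
)
rows, cols, per_patient = panel_shape(demo, "PatientID")
print(rows, cols, is_balanced(per_patient, 3))
```

Against the real file, one would open the downloaded CSV in place of `demo`, pass the actual identifier column name, and compare the result with `EXPECTED_ROWS`, `EXPECTED_COLUMNS`, and `N_MONTHS`.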

===

This dataset forms part of a continuous series of work, building upon reference [2]. For further details, kindly refer to our papers:
[1] Kuo, Nicholas I., Louisa Jorm, and Sebastiano Barbieri. "Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV." arXiv preprint arXiv:2208.08655 (2022).
[2] Kuo, Nicholas I-Hsien, et al. "The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms." Scientific Data 9.1 (2022): 693.

===

Latest edit: 16th May 2023.
Synthetic Hospital Data using AI
kaggle.com
zip
Updated Apr 13, 2024
Amy (2024). Synthetic Hospital Data using AI [Dataset]. https://www.kaggle.com/amulyas/synthetic-hospital-data-using-ai
Available download formats: zip (0 bytes)
Dataset updated
Apr 13, 2024
Authors
Amy
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
An attempt to use AI to generate synthetic datasets that students can use to practice their skills. The dataset is clean.
Examples of synthetic health datasets and their characteristics.
plos.figshare.com
figshare.com
xls
Updated Jun 21, 2023
Aldren Gonzales; Guruprabha Guruswamy; Scott R. Smith (2023). Examples of synthetic health datasets and their characteristics. [Dataset]. http://doi.org/10.1371/journal.pdig.0000082.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pdig.0000082.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS Digital Health
Authors
Aldren Gonzales; Guruprabha Guruswamy; Scott R. Smith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Examples of synthetic health datasets and their characteristics.

Synthetic Healthcare Database for Research (SyH-DR): 9 scholarly articles cite this dataset.
