https://www.rootsanalysis.com/privacy.html
The global synthetic data market size is projected to grow from USD 0.4 billion in the current year to USD 19.22 billion by 2035, representing a CAGR of 42.14% over the forecast period to 2035.
https://www.marketresearchforecast.com/privacy-policy
The synthetic data generation market was valued at USD 288.5 million in 2023 and is projected to reach USD 1,920.28 million by 2032, exhibiting a CAGR of 31.1% during the forecast period. Synthetic data generation (SDG) refers to the creation of artificial datasets that resemble real datasets in their data distributions and patterns: data points are produced by algorithms or models rather than collected through observations or surveys. One of its core advantages is that it can preserve the statistical characteristics of the original data while removing the privacy risk of using real data. Further, there is no limit to how much synthetic data can be created, so it can support extensive testing and training of machine learning models, unlike conventional data, which may be highly regulated or limited in availability. It also enables the generation of comprehensive datasets that include many examples of specific situations or contexts that may occur in practice, improving an AI system's performance. SDG significantly shortens the development cycle by reducing the time and effort needed for data collection and annotation, allowing researchers and developers to work far more efficiently on discovery and development in domains such as healthcare and finance. Key drivers for this market are: Growing Demand for Data Privacy and Security to Fuel Market Growth. Potential restraints include: Lack of Data Accuracy and Realism Hinders Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.
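For intuition, the simplest statistics-preserving generators fit a distribution to the real data and then sample new records from it. The sketch below is only an illustration of that idea (Python, purely numeric columns, a multivariate-Gaussian assumption); it is not tied to any vendor's product, and production generators use far richer models (copulas, GANs, LLMs).

```python
# Minimal sketch of statistics-preserving synthetic data generation.
# Assumes purely numeric columns and an approximately Gaussian joint
# distribution; real-world generators are far more expressive.
import numpy as np
import pandas as pd

def generate_synthetic(real: pd.DataFrame, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Sample synthetic rows from a multivariate normal fitted to the real data."""
    rng = np.random.default_rng(seed)
    mean = real.mean().to_numpy()   # per-column means
    cov = real.cov().to_numpy()     # pairwise covariances
    samples = rng.multivariate_normal(mean, cov, size=n_rows)
    return pd.DataFrame(samples, columns=real.columns)

# Usage (hypothetical): synthetic_df = generate_synthetic(real_df, n_rows=10_000)
```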
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of identified synthetic data use cases in health care and examples.
https://www.usa.gov/government-works
This dataset is a list of Department of Transportation (DOT) Artificial Intelligence (AI) use cases.
Artificial intelligence (AI) promises to drive the growth of the United States economy and improve the quality of life of all Americans. Pursuant to Section 5 of Executive Order (EO) 13960, "Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government," Federal agencies are required to inventory their AI use cases and share their inventories with other government agencies and the public.
In accordance with the requirements of EO 13960, this spreadsheet provides the mechanism for federal agencies to create their inaugural AI use case inventories.
Generative AI experienced a massive expansion of use cases in financial services during 2024, with customer experience and engagement emerging as the dominant application. A 2024 survey revealed that ** percent of respondents prioritized this area, a dramatic increase from ** percent in the previous year. Report generation, investment research, and document processing also gained significant traction, with over ** percent of firms implementing these applications. Additional use cases included synthetic data generation, code assistance, software development, marketing and sales asset creation, and enterprise research.
This data asset contains an inventory of USAID AI use cases.
The statistic shows the cumulative revenues from the ten leading artificial intelligence (AI) use cases worldwide, between 2016 and 2025. Over the ten years between 2016 and 2025, AI software for vehicular object detection, identification, and avoidance is expected to generate 9 billion U.S. dollars.
This dataset is an inventory of the uses of artificial intelligence (AI) at USDA. The inventory was developed and published as required by OMB M-24-10, "Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence". The inventory attributes were collected in accordance with a data standard established by OMB.
Artificial intelligence (AI) offers a range of benefits for mobile network operators looking to enhance their 5G operations, with a host of potential use cases. Automation and optimization were cited as the leading use cases by operators responding to a 2024 survey, with data analytics and traffic prediction rounding out the top three.
Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case proposes a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms and introduces a novel deep learning model to predict traffic speed and traffic collision likelihood during planned work zone events. This dataset contains raw Maryland roadway incident data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.
Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.
Methods: In Phase 1, GPT-4o was prompted to generate a dataset from qualitative descriptions of 13 clinical parameters. The resulting data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.
Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on the respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.
Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets that replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and to investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
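As a rough illustration of the fidelity checks described above (not the authors' code), the sketch below compares one continuous parameter between a real and an LLM-generated table using Welch's two-sample t-test and a 95% CI overlap check; the column name "age" is a hypothetical placeholder.

```python
# Illustrative fidelity check in the spirit of the Phase 2 evaluation:
# two-sample t-test plus 95% CI overlap for a single continuous column.
import numpy as np
from scipy import stats

def ci95(x: np.ndarray) -> tuple[float, float]:
    """95% confidence interval for the mean of x."""
    m, se = np.mean(x), stats.sem(x)
    h = se * stats.t.ppf(0.975, len(x) - 1)
    return m - h, m + h

def compare_continuous(real: np.ndarray, synthetic: np.ndarray) -> dict:
    t_stat, p_value = stats.ttest_ind(real, synthetic, equal_var=False)  # Welch's t-test
    lo_r, hi_r = ci95(real)
    lo_s, hi_s = ci95(synthetic)
    return {
        "p_value": p_value,                           # > 0.05 suggests no detectable difference
        "ci_overlap": lo_r <= hi_s and lo_s <= hi_r,  # do the two 95% CIs overlap?
    }

# Usage (hypothetical column): compare_continuous(real_df["age"].to_numpy(),
#                                                 llm_df["age"].to_numpy())
```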
SDNist (v1.3) is a set of benchmark data and metrics for the evaluation of synthetic data generators on structured tabular data. This version (1.3) reproduces the challenge environment from Sprints 2 and 3 of the Temporal Map Challenge. These benchmarks are distributed as a simple open-source Python package to allow standardized and reproducible comparison of synthetic generator models on real-world data and use cases. These data and metrics were developed for and vetted through the NIST PSCR Differential Privacy Temporal Map Challenge, where the evaluation tools, k-marginal and Higher Order Conjunction, proved effective in distinguishing competing models in the competition environment. SDNist is available via pip (pip install sdnist==1.2.8, for Python >= 3.6) or from the USNIST GitHub. The sdnist Python module will download data from NIST as necessary; users are not required to download data manually.
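For intuition about the k-marginal metric mentioned above, the following simplified sketch (not the official sdnist implementation) scores how closely synthetic data reproduces the real data's k-way marginal distributions, averaging total variation distance over random column subsets; it assumes categorical or pre-binned columns.

```python
# Simplified k-marginal-style fidelity score (illustration only; not the
# official sdnist implementation). A score of 1.0 means the sampled k-way
# marginals of real and synthetic data are identical.
import random
import pandas as pd

def k_marginal_score(real: pd.DataFrame, synth: pd.DataFrame,
                     k: int = 2, n_subsets: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)
    cols = list(real.columns)
    scores = []
    for _ in range(n_subsets):
        subset = rng.sample(cols, k)
        p = real.groupby(subset).size() / len(real)    # real k-way marginal
        q = synth.groupby(subset).size() / len(synth)  # synthetic k-way marginal
        # Total variation distance over this marginal (0 = identical, 1 = disjoint).
        tvd = p.subtract(q, fill_value=0).abs().sum() / 2
        scores.append(1 - tvd)
    return sum(scores) / len(scores)
```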
https://www.pioneerdatahub.co.uk/data/data-request-process/
Background Acute compartment syndrome (ACS) is an emergency orthopaedic condition wherein a rapid rise in compartmental pressure compromises blood perfusion to the tissues leading to ischaemia and muscle necrosis. This serious condition is often misdiagnosed or associated with significant diagnostic delay, and can lead to limb amputations and death.
The most common causes of ACS are high-impact trauma, especially fractures of the lower limbs, which account for 40% of ACS cases. ACS is a challenge to diagnose and treat effectively, with differing clinical thresholds being utilised, which can result in unnecessary fasciotomy. The highly granular synthetic data for over 900 patients with ACS provide the following key parameters to support critical research into this condition:
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Scope: Enabling data-driven research and machine learning models towards improving the diagnosis of Acute compartment syndrome. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes highly granular patient demographics, physiological parameters, muscle biomarkers, blood biomarkers and co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings and admissions), presenting complaint, lab analysis results (eGFR, troponin, CRP, INR, ABG glucose), systolic and diastolic blood pressures, procedures and surgery details.
Available supplementary data: ACS cohort, Matched controls; ambulance, OMOP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Skin Disease Dataset is a synthetic dataset generated to support machine learning and data analysis tasks related to dermatological conditions. It contains 34,000 rows and 10 columns, covering various aspects of skin diseases, patient demographics, treatment history, and disease severity.
Skin diseases are a prevalent health issue affecting millions of people globally. Accurate diagnosis and effective treatment planning are crucial for improving patient outcomes. This dataset provides a comprehensive representation of various skin disease conditions, making it ideal for:
- Classification tasks: Predicting disease type or severity.
- Predictive modeling: Estimating treatment effectiveness.
- Data visualization: Analyzing demographic patterns.
- Exploratory Data Analysis (EDA): Understanding distribution and correlations.
- Healthcare analytics: Gaining insights into treatment efficacy and disease prevalence.
The dataset contains the following 10 columns:
This dataset is licensed under the CC BY 4.0 License. You are free to use, share, and modify the dataset with proper attribution.
This dataset is synthetically generated and does not represent real patient data. It is designed purely for educational and research purposes in machine learning and data analysis.
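As a sketch of the classification use case listed above, the snippet below trains a simple baseline model on the dataset. The file name and the disease_type target column are hypothetical placeholders, since the actual column list is not reproduced here.

```python
# Baseline classification sketch for the synthetic skin disease dataset.
# File path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("skin_disease_dataset.csv")            # hypothetical file name
X = pd.get_dummies(df.drop(columns=["disease_type"]))   # one-hot encode features
y = df["disease_type"]                                   # hypothetical target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```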
https://www.datainsightsmarket.com/privacy-policy
The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy. This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models. Recent developments include: September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. 
The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation. October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can swiftly generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks. Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
Interpreting time series models is uniquely challenging because it requires identifying both the location of time series signals that drive model predictions and their matching to an interpretable temporal pattern. While explainers from other modalities can be applied to time series, their inductive biases do not transfer well to the inherently uninterpretable nature of time series. We present TIMEX, a time series consistency model for training explainers. TIMEX trains an interpretable surrogate to mimic the behavior of a pretrained time series model. It addresses the issue of model faithfulness by introducing model behavior consistency, a novel formulation that preserves relations in the latent space induced by the pretrained model with relations in the latent space induced by TIMEX. TIMEX provides discrete attribution maps and, unlike existing interpretability methods, it learns a latent space of explanations that can be used in various ways, such as to provide landmarks to visually aggregate similar explanations and easily recognize temporal patterns. We evaluate TIMEX on 8 synthetic and real-world datasets and compare its performance against state-of-the-art interpretability methods. We also conduct case studies using physiological time series. Quantitative evaluations demonstrate that TIMEX achieves the highest or second-highest performance in every metric compared to baselines across all datasets. Through case studies, we show that the novel components of TIMEX show potential for training faithful, interpretable models that capture the behavior of pretrained time series models.
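As a rough, hypothetical illustration of the model-behavior-consistency idea described in the abstract (not the authors' implementation), the sketch below penalizes disagreement between pairwise latent-space relations induced by the pretrained model and by the surrogate explainer.

```python
# Rough illustration of a "model behavior consistency" style loss: the
# surrogate's latent embeddings should preserve the pairwise relations
# induced by the pretrained model's embeddings. Not the TIMEX code.
import torch
import torch.nn.functional as F

def behavior_consistency_loss(z_pretrained: torch.Tensor,
                              z_surrogate: torch.Tensor) -> torch.Tensor:
    """Both inputs are (batch, dim) latents for the same batch of time series."""
    # Cosine-similarity matrices capture pairwise relations within each latent space.
    sim_p = F.cosine_similarity(z_pretrained.unsqueeze(1), z_pretrained.unsqueeze(0), dim=-1)
    sim_s = F.cosine_similarity(z_surrogate.unsqueeze(1), z_surrogate.unsqueeze(0), dim=-1)
    # Penalize disagreement between the two relational structures.
    return F.mse_loss(sim_s, sim_p)
```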
Data analytics maintained its position as the leading AI application among financial services firms in 2024. A 2024 industry survey indicated that ** percent of companies leveraged AI for data analytics, showing modest growth from the previous year. Generative AI experienced the strongest year-over-year adoption increase, becoming the second most widely used AI technology, with more than half of firms either implementing or evaluating the technology. Reflecting this growing embrace of AI solutions, the financial sector's investment in AI technologies continues to surge, with spending projected to reach over ** billion U.S. dollars in 2025 and more than double to *** billion U.S. dollars by 2028.
The main benefits of AI in the financial services sector: Financial services firms reported that AI delivered the greatest value through operational efficiencies, according to a 2024 industry survey. The technology also provided significant competitive advantages, cited by ** percent of respondents as a key benefit. Enhanced customer experience emerged as the third most important advantage of AI adoption in the sector.
Adoption across business segments: The integration of AI varies across different areas of financial services. In 2023, operations led the way with a ** percent adoption rate, closely followed by risk and compliance at ** percent. In customer experience and marketing, voice assistants, chatbots, and conversational AI are the most common AI applications. Meanwhile, financial reporting and accounting dominate AI use in operations and finance.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
💼 📊 Synthetic Financial Domain Documents with PII Labels
gretelai/synthetic_pii_finance_multilingual is a dataset of full length synthetic financial documents containing Personally Identifiable Information (PII), generated using Gretel Navigator and released under Apache 2.0. This dataset is designed to assist with the following use cases:
🏷️ Training NER (Named Entity Recognition) models to detect and label PII in… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual.
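A minimal sketch of loading the dataset with the Hugging Face datasets library; the presence of a "train" split is assumed here.

```python
# Load the Gretel synthetic PII finance dataset from the Hugging Face Hub
# (requires: pip install datasets).
from datasets import load_dataset

ds = load_dataset("gretelai/synthetic_pii_finance_multilingual")
print(ds)              # available splits and row counts
print(ds["train"][0])  # one synthetic document with its PII labels (assumes a "train" split)
```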
Network security is the most common artificial intelligence (AI) use case for cybersecurity, as ** percent of surveyed IT executives reported the use of AI for this purpose as of 2019. Data security and endpoint security come next, with ** percent and ** percent reported use respectively.
Phishing: a deceptive form of cyberattack. Phishing, a form of cyberattack that uses disguised email as a weapon, is ranked as one of the most concerning cyberthreats worldwide. The goal of phishing is to deceive the email recipient into believing that the message is legitimate and convince them to give away a form of their identity, be it their credit card details or business login data. Over 165 thousand unique phishing sites were discovered worldwide in the first quarter of 2020 alone, and hundreds of notable brands and legitimate entities were attacked in just the first month of 2020.
A slight stall in global cybersecurity spending. Businesses and individuals have been spending on security solutions to counter cybercrimes such as phishing attacks. Worldwide spending on cybersecurity has been growing in recent years and is expected to continue to grow in 2020, albeit at a compromised speed due to the impact of the coronavirus (COVID-19) pandemic. Total spending for 2020 is forecast to reach almost ** billion U.S. dollars, as opposed to a previously predicted ** billion.
Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case proposes a data integration pipeline that enhances the utilization of work zone and traffic data from diversified platforms and introduces a novel deep learning model to predict traffic speed and traffic collision likelihood during planned work zone events. This dataset contains raw Maryland 2019 Average Annual Daily Traffic data.