100+ datasets found
  1. Data Sheet 2_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Available download formats: xlsx
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

    Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

    Methods: In Phase 1, GPT-4o was prompted to generate a dataset from qualitative descriptions of 13 clinical parameters. The resulting data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

    Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range, and body mass index (BMI) was correctly calculated for all case files from the respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters: no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. The 95% CIs overlapped in 6/7 (85.71%) continuous parameters.

    Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets that replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and to investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
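    The abstract names three fidelity checks (two-sample t-tests, two-sample proportion tests, 95% CI overlap) plus a BMI cross-verification, but does not give the exact procedures. The sketch below illustrates each check in plain Python under stated assumptions: the column names "height", "weight" and "bmi" are hypothetical, the t statistic is Welch's unequal-variance variant, and p-values and CIs use the normal approximation (z = 1.96), which is only reasonable for large samples such as the 6,166 case files reported. It is not the authors' analysis code.

```python
import math
import statistics


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_consistent(record: dict, tol: float = 0.1) -> bool:
    """Cross-verify a generated record's BMI against its own height and weight.

    The keys "height" (cm), "weight" (kg) and "bmi" are hypothetical column names.
    """
    return abs(bmi(record["weight"], record["height"]) - record["bmi"]) <= tol


def mean_ci95(values: list[float]) -> tuple[float, float]:
    """95% CI for the mean using the normal approximation (adequate for large n)."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (m - 1.96 * se, m + 1.96 * se)


def cis_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch two-sample t statistic (unequal variances)."""
    se_sq = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se_sq)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-sample z statistic comparing proportions x1/n1 and x2/n2."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    return (x1 / n1 - x2 / n2) / se


def two_sided_p_normal(stat: float) -> float:
    """Two-sided p-value from the standard normal; for t, valid only for large samples."""
    return math.erfc(abs(stat) / math.sqrt(2.0))
```

    For example, a record with height 170 cm, weight 65 kg and a stated BMI of 22.5 passes `bmi_consistent`, since 65 / 1.7² is about 22.49. For small samples, an exact t-distribution p-value (for example from SciPy) would replace the normal approximation.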

  2. Synthetic Document Dataset for AI - Jpeg, PNG & PDF formats

    • datarade.ai
    Updated Sep 17, 2022
    Cite
    Ainnotate (2022). Synthetic Document Dataset for AI - Jpeg, PNG & PDF formats [Dataset]. https://datarade.ai/data-products/synthetic-document-dataset-for-ai-jpeg-png-pdf-formats-ainnotate
    Dataset updated
    Sep 17, 2022
    Dataset authored and provided by
    Ainnotate
    Area covered
    Korea (Democratic People's Republic of), Tonga, Germany, Tokelau, Denmark, Cabo Verde, Ireland, Brazil, Syrian Arab Republic, Canada
    Description

    Ainnotate's proprietary dataset generation methodology, based on large-scale generative modelling and domain randomization, provides well-balanced data with consistent sampling that accommodates rare events, enabling superior simulation and training of your models.

    Ainnotate currently provides synthetic datasets in the following domains and use cases.

    • Internal Services: Visa applications, passport validation, license validation, birth certificates
    • Financial Services: Bank checks, bank statements, pay slips, invoices, tax forms, insurance claims, and mortgage/loan forms
    • Healthcare: Medical ID cards
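    The description mentions domain randomization and balanced sampling that accommodates rare events, without describing Ainnotate's pipeline. As a generic, hypothetical sketch of the idea: each synthetic document is rendered with randomly drawn nuisance parameters (font, rotation, noise, background), every document type gets the same number of samples, and a small fraction of samples is drawn from wider "rare event" ranges. All option pools, ranges and the `rare_rate` parameter are assumptions for illustration; the actual rendering step is omitted.

```python
import random
from dataclasses import dataclass

# Hypothetical option pools; a real generator would tune these per document type.
FONTS = ["DejaVu Sans", "Liberation Serif", "Courier"]
BACKGROUNDS = ["white", "off-white", "scanned-paper", "desk-photo"]


@dataclass
class RenderParams:
    """Hypothetical rendering settings for one synthetic document image."""
    font: str
    font_size: int
    rotation_deg: float
    noise_sigma: float
    background: str
    rare: bool


def sample_params(rng: random.Random, rare_rate: float) -> RenderParams:
    """Draw one random variation of the rendering domain.

    With probability rare_rate, widen the ranges to mimic rare, heavily degraded scans.
    """
    rare = rng.random() < rare_rate
    max_rot = 15.0 if rare else 5.0
    max_noise = 0.30 if rare else 0.08
    return RenderParams(
        font=rng.choice(FONTS),
        font_size=rng.randint(9, 14),
        rotation_deg=rng.uniform(-max_rot, max_rot),
        noise_sigma=rng.uniform(0.0, max_noise),
        background=rng.choice(BACKGROUNDS),
        rare=rare,
    )


def balanced_batch(doc_types: list[str], per_type: int, seed: int = 0,
                   rare_rate: float = 0.05) -> list[tuple[str, RenderParams]]:
    """Generate the same number of randomized samples for every document type."""
    rng = random.Random(seed)  # seeded so a batch can be regenerated exactly
    return [(doc_type, sample_params(rng, rare_rate))
            for doc_type in doc_types
            for _ in range(per_type)]
```

    Balancing by construction (a fixed count per type) is the simplest design; a weighted sampler would be needed if some document types should be deliberately over-represented.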

  3. Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035...

    • rootsanalysis.com
    Updated Oct 1, 2024
    Cite
    Roots Analysis (2024). Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035 [Dataset]. https://www.rootsanalysis.com/synthetic-data-generation-market
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Authors
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.html

    Time period covered
    2021 - 2031
    Area covered
    Global
    Description

    The global synthetic data market size is projected to grow from USD 0.4 billion in the current year to USD 19.22 billion by 2035, representing a CAGR of 42.14% over the forecast period.
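    The compound annual growth rate (CAGR) figures quoted throughout these market reports follow the standard formula CAGR = (end / start)^(1 / years) - 1. The sketch below applies it to this entry, assuming the report's unnamed "current year" is 2024 (11 compounding years to 2035); that year is an assumption, not stated in the source.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: the report's "current year" is 2024, i.e. 11 compounding years to 2035.
rate = cagr(0.4, 19.22, 2035 - 2024)
print(f"Implied CAGR: {rate:.2%}")
```

    Under that assumption the implied rate is roughly 42.2%; the small gap from the reported 42.14% is consistent with the starting value having been rounded to USD 0.4 billion.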

  4. Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034

    • scoop.market.us
    Updated Mar 18, 2025
    Cite
    Market.us Scoop (2025). Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034 [Dataset]. https://scoop.market.us/synthetic-data-generation-market-news/
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market.us Scoop
    License

    https://scoop.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Synthetic Data Generation Market Size

    As per the latest insights from Market.us, the Global Synthetic Data Generation Market is set to reach USD 6,637.98 million by 2034, expanding at a CAGR of 35.7% from 2025 to 2034. The market, valued at USD 313.50 million in 2024, is witnessing rapid growth due to rising demand for high-quality, privacy-compliant, and AI-driven data solutions.

    North America dominated in 2024, securing over 35% of the market, with revenues surpassing USD 109.7 million. The region’s leadership is fueled by strong investments in artificial intelligence, machine learning, and data security across industries such as healthcare, finance, and autonomous systems. With increasing reliance on synthetic data to enhance AI model training and reduce data privacy risks, the market is poised for significant expansion in the coming years.
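    The headline figures in this entry can be cross-checked by compounding forward: starting from the reported USD 313.50 million in 2024 and growing at 35.7% per year for the ten years 2025 to 2034 should land near the reported USD 6,637.98 million, and a 35% share of the 2024 market should be about USD 109.7 million. The sketch below simply performs that arithmetic; it adds no data beyond what the entry states.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound a starting value forward: value * (1 + annual_rate) ** years."""
    return value * (1.0 + annual_rate) ** years


base_2024 = 313.50  # USD million, as reported for 2024
projected_2034 = project(base_2024, 0.357, 2034 - 2024)  # reported CAGR of 35.7%
north_america_2024 = 0.35 * base_2024  # 35% share, the lower bound in "over 35%"

print(f"Projected 2034 value: USD {projected_2034:,.1f} million")
print(f"35% of the 2024 market: USD {north_america_2024:.1f} million")
```

    Both results agree with the reported values to within rounding, so the size, growth-rate and regional-share figures in this entry are mutually consistent.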

    [Figure: Synthetic Data Generation Market Size]
  5. Table1_Enhancing biomechanical machine learning with limited data:...

    • frontiersin.figshare.com
    pdf
    Updated Feb 14, 2024
    Cite
    Carlo Dindorf; Jonas Dully; Jürgen Konradi; Claudia Wolf; Stephan Becker; Steven Simon; Janine Huthwelker; Frederike Werthmann; Johanna Kniepert; Philipp Drees; Ulrich Betz; Michael Fröhlich (2024). Table1_Enhancing biomechanical machine learning with limited data: generating realistic synthetic posture data using generative artificial intelligence.pdf [Dataset]. http://doi.org/10.3389/fbioe.2024.1350135.s001
    Available download formats: pdf
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    Frontiers
    Authors
    Carlo Dindorf; Jonas Dully; Jürgen Konradi; Claudia Wolf; Stephan Becker; Steven Simon; Janine Huthwelker; Frederike Werthmann; Johanna Kniepert; Philipp Drees; Ulrich Betz; Michael Fröhlich
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: Biomechanical machine learning (ML) models, particularly deep-learning models, perform best when trained on extensive datasets. However, biomechanical data are frequently limited due to diverse challenges, and effective methods for augmenting data when developing ML models, specifically in the human posture domain, are scarce. Therefore, this study explored the feasibility of leveraging generative artificial intelligence (AI) to produce realistic synthetic posture data from three-dimensional posture data.

    Methods: Data were collected from 338 subjects through surface topography. A Variational Autoencoder (VAE) architecture was employed to generate synthetic posture data, whose distinguishability from real data was evaluated by domain experts, ML classifiers, and Statistical Parametric Mapping (SPM). The benefits of incorporating augmented posture data into the learning process were exemplified by a deep autoencoder (AE) for automated feature representation.

    Results: Our findings show that both experts and ML classifiers struggled to differentiate synthetic data from real data, underscoring the quality of the synthetic data. This observation was also confirmed by SPM. Integrating synthetic data into AE training can reduce the reconstruction error compared with training on real data samples alone. Moreover, this study demonstrates the potential for reduced latent dimensions while maintaining a reconstruction accuracy comparable to AEs trained exclusively on real data samples.

    Conclusion: This study emphasizes the prospects of harnessing generative AI to enhance ML tasks in the biomechanics domain.
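    The abstract names a Variational Autoencoder but gives no architecture details. As a hedged illustration only, the sketch below shows two standard VAE ingredients in plain Python: the reparameterization trick used to sample latent codes, and a loss that adds a KL penalty toward a standard normal prior to a reconstruction term. The encoder and decoder networks are omitted, mean squared error stands in for the reconstruction likelihood (a common simplification), and nothing here reflects the study's actual network or hyperparameters.

```python
import math
import random


def reparameterize(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(x: list[float], x_hat: list[float]) -> float:
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def vae_loss(x: list[float], x_hat: list[float], mu: list[float],
             log_var: list[float], beta: float = 1.0) -> float:
    """Reconstruction error plus a (beta-weighted) KL penalty toward the N(0, I) prior."""
    return mse(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


def sample_prior(latent_dim: int, rng: random.Random) -> list[float]:
    """Draw a latent code from N(0, I); a trained decoder would map it to a new sample."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
```

    Generating new synthetic postures would then amount to calling `sample_prior` and passing the result through the trained decoder, which is the step that produces data for augmentation.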

  6. Synthetic Data Solution Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 12, 2025
    Cite
    AMA Research & Media LLP (2025). Synthetic Data Solution Report [Dataset]. https://www.archivemarketresearch.com/reports/synthetic-data-solution-21817
    Available download formats: ppt, pdf, doc
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    AMA Research & Media
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Market Analysis for Synthetic Data Solution: The global synthetic data solution market is projected to reach USD XXX million by 2033, growing at a CAGR of XX% from 2025 to 2033. The increasing demand for synthetic data in various industries, such as financial services, retail, and healthcare, drives this growth. Synthetic data offers a privacy-preserving alternative to real-world data, enabling organizations to train and evaluate models without compromising sensitive information. The growing adoption of cloud-based solutions and the increasing need for data privacy and security further contribute to market growth. Market segments include deployment types (cloud-based and on-premises) and applications (financial services industry, retail industry, medical industry, and others). Key regional markets include North America, South America, Europe, Middle East & Africa, and Asia Pacific. Major companies operating in the market include LightWheel AI, Hanyi Innovation Technology, Haohan Data Technology, Haitian Ruisheng Science Technology, and Baidu. Trends such as the adoption of artificial intelligence (AI) and machine learning (ML) and the rising concern over data privacy and governance are expected to shape the market's future.

  7. Distribution of data used when developing AI products South Korea 2023

    • statista.com
    Updated Sep 19, 2024
    Cite
    Statista (2024). Distribution of data used when developing AI products South Korea 2023 [Dataset]. https://www.statista.com/statistics/1452827/south-korea-share-of-data-used-when-developing-artificial-intelligence-products/
    Dataset updated
    Sep 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2023 - Nov 2023
    Area covered
    South Korea
    Description

    According to a survey of artificial intelligence (AI) companies in South Korea carried out in 2023, nearly 66 percent of the data used when developing AI products and services was private data. On the other hand, public data comprised around 34 percent.

  8. A Study of the Synthetic Data Generation Market by Tabular Data and Direct...

    • futuremarketinsights.com
    pdf
    Updated Mar 8, 2024
    Cite
    A Study of the Synthetic Data Generation Market by Tabular Data and Direct Modeling from 2024 to 2034 [Dataset]. https://www.futuremarketinsights.com/reports/synthetic-data-generation-market
    Available download formats: pdf
    Dataset updated
    Mar 8, 2024
    Dataset authored and provided by
    Future Market Insights
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2024 - 2034
    Area covered
    Worldwide
    Description

    The synthetic data generation market is projected to be worth US$ 300 million in 2024. The market is anticipated to reach US$ 13.0 billion by 2034. The market is further expected to surge at a CAGR of 45.9% during the forecast period 2024 to 2034.

    Key Insights
    Synthetic Data Generation Market Estimated Size in 2024: US$ 300 million
    Projected Market Value in 2034: US$ 13.0 billion
    Value-based CAGR from 2024 to 2034: 45.9%

    Country-wise Insights (forecast CAGRs from 2024 to 2034)
    The United States: 46.2%
    The United Kingdom: 47.2%
    China: 46.8%
    Japan: 47.0%
    Korea: 47.3%

    Category-wise Insights (CAGR through 2034)
    Tabular Data: 45.7%
    Sandwich Assays: 45.5%

    Report Scope
    Estimated Market Size in 2024: US$ 0.3 billion
    Projected Market Valuation in 2034: US$ 13.0 billion
    Value-based CAGR 2024 to 2034: 45.9%
    Forecast Period: 2024 to 2034
    Historical Data Available for: 2019 to 2023
    Market Analysis: Value in US$ Billion
    Key Regions Covered
    • North America
    • Latin America
    • Western Europe
    • Eastern Europe
    • South Asia and Pacific
    • East Asia
    • The Middle East & Africa
    Key Market Segments Covered
    • Data Type
    • Modeling Type
    • Offering
    • Application
    • End Use
    • Region
    Key Countries Profiled
    • The United States
    • Canada
    • Brazil
    • Mexico
    • Germany
    • France
    • Spain
    • Italy
    • Russia
    • Poland
    • Czech Republic
    • Romania
    • India
    • Bangladesh
    • Australia
    • New Zealand
    • China
    • Japan
    • South Korea
    • GCC countries
    • South Africa
    • Israel
    Key Companies Profiled
    • Mostly AI
    • CVEDIA Inc.
    • Gretel Labs
    • Datagen
    • NVIDIA Corporation
    • Synthesis AI
    • Amazon.com, Inc.
    • Microsoft Corporation
    • IBM Corporation
    • Meta

  9. Artificial Intelligence Training Dataset Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 21, 2025
    Cite
    AMA Research & Media LLP (2025). Artificial Intelligence Training Dataset Report [Dataset]. https://www.archivemarketresearch.com/reports/artificial-intelligence-training-dataset-38645
    Available download formats: pdf, ppt, doc
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) Training Dataset market is projected to reach $1605.2 million by 2033, exhibiting a CAGR of 9.4% from 2025 to 2033. The surge in demand for AI training datasets is driven by the increasing adoption of AI and machine learning technologies in various industries such as healthcare, financial services, and manufacturing. Moreover, the growing need for reliable and high-quality data for training AI models is further fueling the market growth. Key market trends include the increasing adoption of cloud-based AI training datasets, the emergence of synthetic data generation, and the growing focus on data privacy and security. The market is segmented by type (image classification dataset, voice recognition dataset, natural language processing dataset, object detection dataset, and others) and application (smart campus, smart medical, autopilot, smart home, and others). North America is the largest regional market, followed by Europe and Asia Pacific. Key companies operating in the market include Appen, Speechocean, TELUS International, Summa Linguae Technologies, and Scale AI. Artificial Intelligence (AI) training datasets are critical for developing and deploying AI models. These datasets provide the data that AI models need to learn, and the quality of the data directly impacts the performance of the model. The AI training dataset market landscape is complex, with many different providers offering datasets for a variety of applications. The market is also rapidly evolving, as new technologies and techniques are developed for collecting, labeling, and managing AI training data.

  10. Synthetic Data Generation Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Dec 8, 2024
    Cite
    Market Research Forecast (2024). Synthetic Data Generation Market Report [Dataset]. https://www.marketresearchforecast.com/reports/synthetic-data-generation-market-1834
    Available download formats: pdf, doc, ppt
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Synthetic Data Generation Market size was valued at USD 288.5 million in 2023 and is projected to reach USD 1,920.28 million by 2032, exhibiting a CAGR of 31.1% during the forecast period. Synthetic data generation (SDG) is the creation of artificial datasets that resemble real datasets in their data distributions and patterns: data points are produced by algorithms or models rather than collected through observations or surveys. One of its core advantages is that it can preserve the statistical characteristics of the original data while avoiding the privacy risks of using real data. Furthermore, there is no limit to how much synthetic data can be created, so it can be used for extensive testing and training of machine learning models, unlike conventional data, which may be highly regulated or limited in availability. It also helps generate comprehensive datasets that include many examples of specific situations or contexts that may occur in practice, improving AI system performance. SDG significantly shortens the development cycle by requiring less time and effort for data collection and annotation, allowing researchers and developers to work more efficiently in domains such as healthcare and finance. Key drivers for this market are: Growing Demand for Data Privacy and Security to Fuel Market Growth. Potential restraints include: Lack of Data Accuracy and Realism Hinders Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.
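    The claim that SDG "can preserve the statistical characteristics of the original data" is easiest to see in its simplest parametric form: fit a distribution to a real column, then sample new values from the fitted distribution instead of releasing the originals. The sketch below assumes a single numeric column and a normal distribution; it is a deliberately minimal illustration, not the method of any vendor covered by the report.

```python
import random
import statistics


def fit_gaussian(values: list[float]) -> tuple[float, float]:
    """Estimate the mean and sample standard deviation of one numeric column."""
    return statistics.fmean(values), statistics.stdev(values)


def generate_synthetic(values: list[float], n: int, seed: int = 0) -> list[float]:
    """Draw n synthetic values from a normal distribution fitted to `values`.

    Only the mean and variance are preserved: no cross-column correlations,
    no non-normal shape, and no formal privacy guarantee.
    """
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

    Because the generator sees only two summary statistics, the number of synthetic rows is independent of the size of the real dataset, which is the "no limit to how much data can be created" property; richer generators (copulas, GANs, VAEs, LLMs) aim to keep multivariate structure that this sketch discards.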

  11. Synthetic Data Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 21, 2025
    Cite
    Archive Market Research (2025). Synthetic Data Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/synthetic-data-tool-38973
    Available download formats: pdf, doc, ppt
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global synthetic data tool market is projected to reach USD 10,394.0 million by 2033, exhibiting a CAGR of 34.8% during the forecast period. The growing adoption of AI and ML technologies, increasing demand for data privacy and security, and the rising need for data for training and testing machine learning models are the key factors driving market growth. Additionally, the availability of open-source synthetic data generation tools and the increasing adoption of cloud-based synthetic data platforms are further contributing to market growth. North America is expected to hold the largest market share during the forecast period due to the early adoption of AI and ML technologies and the presence of key vendors in the region. Europe is anticipated to witness significant growth due to increasing government initiatives to promote AI adoption and the growing data privacy concerns. The Asia Pacific region is projected to experience rapid growth due to government initiatives to develop AI capabilities and the increasing adoption of AI and ML technologies in various industries, namely healthcare, retail, and manufacturing.

  12. Extent of AI usage in data-driven efforts among marketers worldwide 2025

    • statista.com
    Updated Mar 19, 2025
    Cite
    Statista (2025). Extent of AI usage in data-driven efforts among marketers worldwide 2025 [Dataset]. https://www.statista.com/statistics/1487841/extent-ai-use-data-driven-marketing/
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 17, 2025 - Jan 20, 2025
    Area covered
    Worldwide
    Description

    According to a global survey among marketing professionals in January 2025, approximately 17 percent reported using artificial intelligence (AI) extensively in their data-driven marketing efforts. Around 39 percent said they integrated AI in select areas, whereas 26 percent were exploring AI but had not yet implemented the technology. Some 13 percent reported having no plans to use AI.

  13. U.S. AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Dec 11, 2024
    Cite
    Archive Market Research (2024). U.S. AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/us-ai-training-dataset-market-4957
    Available download formats: doc, ppt, pdf
    Dataset updated
    Dec 11, 2024
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    United States
    Variables measured
    Market Size
    Description

    The U.S. AI Training Dataset Market size was valued at USD 590.4 million in 2023 and is projected to reach USD 1,880.70 million by 2032, exhibiting a CAGR of 18.0% during the forecast period. The U.S. AI training dataset market deals with the generation, selection, and organization of datasets used to train artificial intelligence. These datasets contain the information that machine learning algorithms need to infer and learn from. Uses include the advancement and improvement of AI solutions in business fields such as transport, medical analysis, language processing, and financial measurement. Applications include training models for tasks such as image classification, predictive modeling, and natural language interfaces. Other emerging trends include a shift toward larger volumes of higher-quality, diverse, annotated data to improve model efficiency; synthetic data generation to address data shortages; and attention to data confidentiality and ethical issues in dataset management. Furthermore, emerging technologies in artificial intelligence and machine learning are driving noticeable development in how datasets are built and used. Recent developments include: In February 2024, Google struck a deal worth USD 60 million per year with Reddit that gives Google real-time access to Reddit's data and uses Google AI to enhance Reddit's search capabilities. In February 2024, Microsoft announced around USD 2.1 billion investment in Mistral AI to expedite the growth and deployment of large language models. Microsoft is expected to support Mistral AI with Azure AI supercomputing infrastructure to provide top-notch scale and performance for AI training and inference workloads.

  14. Data sources used by companies for training AI models South Korea 2023

    • statista.com
    Updated Sep 19, 2024
    Cite
    Statista (2024). Data sources used by companies for training AI models South Korea 2023 [Dataset]. https://www.statista.com/statistics/1452822/south-korea-data-sources-for-training-artificial-intelligence-models/
    Dataset updated
    Sep 19, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Sep 2023 - Nov 2023
    Area covered
    South Korea
    Description

    As of 2023, customer data was the leading source of information used to train artificial intelligence (AI) models in South Korea, cited by nearly 70 percent of surveyed companies. About 62 percent reported using existing in-house data to train their AI models.

  15. The Artificial Intelligence in Retail Market size was USD 4951.2 Million in...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jan 15, 2025
    Cite
    Cognitive Market Research (2025). The Artificial Intelligence in Retail Market size was USD 4951.2 Million in 2023 [Dataset]. https://www.cognitivemarketresearch.com/artificial-intelligence-in-retail-market-report
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jan 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Artificial Intelligence in Retail market size was USD 4951.2 million in 2023 and will expand at a compound annual growth rate (CAGR) of 39.50% from 2023 to 2030.

    Enhanced customer personalization to provide viable market output
    Demand for online remains higher in Artificial Intelligence in the Retail market.
    The machine learning and deep learning category held the highest Artificial Intelligence in Retail market revenue share in 2023.
    North American Artificial Intelligence In Retail will continue to lead, whereas the Asia-Pacific Artificial Intelligence In Retail market will experience the most substantial growth until 2030.
    

    Enhanced Customer Personalization to Provide Viable Market Output

    A primary driver of Artificial Intelligence in the Retail market is the pursuit of enhanced customer personalization. A.I. algorithms analyze vast datasets of customer behaviors, preferences, and purchase history to deliver highly personalized shopping experiences. Retailers leverage this insight to offer tailored product recommendations, targeted marketing campaigns, and personalized promotions. The drive for superior customer personalization not only enhances customer satisfaction but also increases engagement and boosts sales. This focus on individualized interactions through A.I. applications is a key driver shaping the dynamic landscape of A.I. in the retail market.

    January 2023 - Microsoft and digital start-up AiFi worked together to offer Smart Store Analytics. It is a cloud-based tracking solution that helps merchants with operational and shopper insights for intelligent, cashierless stores.

    Source: techcrunch.com/2023/01/10/aifi-microsoft-smart-store-analytics/

    Improved Operational Efficiency to Propel Market Growth
    

    Another pivotal driver is the quest for improved operational efficiency within the retail sector. A.I. technologies streamline various aspects of retail operations, from inventory management and demand forecasting to supply chain optimization and cashier-less checkout systems. By automating routine tasks and leveraging predictive analytics, retailers can enhance efficiency, reduce costs, and minimize errors. The pursuit of improved operational efficiency is a key motivator for retailers to invest in AI solutions, enabling them to stay competitive, adapt to dynamic market conditions, and meet the evolving demands of modern consumers in the highly competitive artificial intelligence (AI) retail market.

    January 2023 - EY introduced the EY Retail Intelligence solution, built on Microsoft Cloud, to give customers a safe and efficient shopping experience. To deliver insightful information, the solution uses Microsoft Cloud for Retail and its technologies, including image recognition, analytics, and artificial intelligence (A.I.).

    Source: www.ey.com/en_gl/news/2023/01/ey-announces-launch-of-retail-solution-that-builds-on-the-microsoft-cloud-to-help-achieve-seamless-consumer-shopping-experiences

    Market Dynamics of the Artificial Intelligence in the Retail market

    Data Security Concerns to Restrict Market Growth
    

    A prominent restraint in Artificial Intelligence in the Retail market is the pervasive concern over data security. As retailers increasingly rely on A.I. to process vast amounts of customer data for personalized experiences, there is a growing apprehension regarding the protection of sensitive information. The potential for data breaches and cyberattacks poses a significant challenge, as retailers must navigate the delicate balance between utilizing customer data for AI-driven initiatives and safeguarding it against potential security threats. Addressing these concerns is crucial to building and maintaining consumer trust in A.I. applications within the retail sector.

    Impact of COVID-19 on the Artificial Intelligence in Retail Market

    The COVID-19 pandemic significantly influenced the artificial intelligence in retail market, accelerating the adoption of AI technologies across the industry. With lockdowns, social distancing measures, and a surge in online shopping, retailers turned to AI to navigate the challenges posed by the pandemic. AI-powered solutions played a crucial role in optimizing supply chain management, predicting shifts in consumer behavior, and enhancing e-commerce experiences. Retailers lever...

  16. Data for Artificial Intelligence: Data-Centric AI for Transportation: Work...

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Nov 21, 2024
    + more versions
    Federal Highway Administration (2024). Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case Raw Maryland Incidents [Dataset]. https://catalog.data.gov/dataset/data-for-artificial-intelligence-data-centric-ai-for-transportation-work-zone-use-case-raw-c24f9
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Federal Highway Administration
    Description

    Data for Artificial Intelligence: Data-Centric AI for Transportation: Work Zone Use Case proposes a data integration pipeline that improves the use of work zone and traffic data from diverse platforms and introduces a novel deep learning model to predict traffic speed and traffic collision likelihood during planned work zone events. This dataset contains raw Maryland roadway incident data.

  17. MKAD Synthetic Data

    • data.nasa.gov
    • datasets.ai
    • +1 more
    application/rdf+xml +5
    Updated Jun 26, 2018
    (2018). MKAD Synthetic Data [Dataset]. https://data.nasa.gov/dataset/MKAD-Synthetic-Data/smca-ticq
    Available download formats: application/rss+xml, json, csv, application/rdf+xml, xml, tsv
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    MKAD
    Description

    Synthetic data used to demonstrate the effectiveness of the MKAD algorithm at detecting anomalies in both continuous numerical data and discrete binary data.
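    The description does not say how MKAD combines its two data types. As a rough illustration of kernel-based anomaly scoring over mixed continuous and binary features, the Python sketch below averages a Gaussian (RBF) kernel on the continuous features with a bit-matching kernel on the binary flags, then scores each query by one minus its mean similarity to a reference set. This is an assumption-laden stand-in, not NASA's MKAD implementation: the record layout, the equal kernel weights, gamma, the scoring rule, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (continuous features, binary flags).
Record = Tuple[Sequence[float], Sequence[int]]


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) similarity between two continuous feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def match_kernel(x: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of binary flags that agree (1.0 means identical)."""
    return sum(1 for a, b in zip(x, y) if a == b) / len(x)


def combined_kernel(r: Record, s: Record, weight: float = 0.5) -> float:
    """Convex combination of the continuous and binary kernels (assumed equal weights)."""
    return weight * rbf_kernel(r[0], s[0]) + (1 - weight) * match_kernel(r[1], s[1])


def anomaly_scores(reference: List[Record], queries: List[Record]) -> List[float]:
    """Score = 1 - mean similarity to the reference set; higher means more anomalous."""
    return [
        1 - sum(combined_kernel(q, r) for r in reference) / len(reference)
        for q in queries
    ]


reference = [
    ((1.0, 2.0), (0, 1, 1)),
    ((1.1, 1.9), (0, 1, 1)),
    ((0.9, 2.1), (0, 1, 0)),
]
queries = [
    ((1.0, 2.0), (0, 1, 1)),   # resembles the reference records
    ((5.0, -3.0), (1, 0, 0)),  # far off in both continuous and binary features
]
print(anomaly_scores(reference, queries))
```

    The second query is far from the reference records in both feature types, so it receives a much higher score than the first. The weight parameter controls how much the continuous and binary views each contribute.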

  18. Cloud Artificial Intelligence (AI) Market Analysis North America, Europe,...

    • technavio.com
    Updated Oct 1, 2002
    Technavio (2002). Cloud Artificial Intelligence (AI) Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/cloud-ai-market-industry-analysis
    Dataset updated
    Oct 1, 2002
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    United Kingdom, United States, Global
    Description

    Cloud Artificial Intelligence (AI) Market Size 2024-2028

    The cloud artificial intelligence (AI) market size is forecast to increase by USD 12.61 billion at a CAGR of 24.1% between 2023 and 2028.
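    The two headline figures can be linked with the compound-growth formula. As a rough cross-check, assuming the 24.1% CAGR compounds over five annual steps from 2023 to 2028 and that USD 12.61 billion is the total increase over that span (the report's exact base-year convention is not stated here), the implied starting size S satisfies S × (1.241^5 - 1) = 12.61. The function name below is illustrative.

```python
def implied_base(increment: float, cagr: float, years: int) -> float:
    """Solve base * ((1 + cagr) ** years - 1) = increment for the starting market size."""
    return increment / ((1 + cagr) ** years - 1)


increment_bn = 12.61  # USD billion, from the forecast above
cagr = 0.241          # 24.1% per year
years = 5             # 2023 -> 2028, assumed to be five compounding steps

base_bn = implied_base(increment_bn, cagr, years)
end_bn = base_bn + increment_bn
print(f"implied 2023 size: {base_bn:.2f} bn, implied 2028 size: {end_bn:.2f} bn")
```

    Under these assumptions the script prints an implied size of about USD 6.49 billion in 2023 and USD 19.10 billion in 2028; these are derived estimates, not figures from the report.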

    The market is experiencing significant growth, driven by the emergence of technologically advanced devices and by growing 5G adoption and mobile penetration. These factors enable the integration of AI technologies into various applications, leading to improved efficiency and productivity. However, the market also faces challenges from open-source platforms, which offer free AI solutions and make it difficult for market players to compete on price. Despite this, the market is expected to continue its growth trajectory, driven by increasing demand for AI solutions in industries including healthcare, finance, and retail. Organizations are leveraging cloud-based AI solutions to gain insights from their data, automate processes, and enhance customer experiences. The market analysis report provides a comprehensive overview of these trends and challenges, offering insights for stakeholders looking to capitalize on growth opportunities in the cloud AI market.

    What will be the Size of the Cloud Artificial Intelligence (AI) Market During the Forecast Period?

    The market is experiencing robust growth, driven by the increasing adoption of machine learning (ML), deep learning, neural networks, and generative AI technologies. These algorithms are transforming various industries by emulating human intelligence in speech recognition, digital media, diagnostics, cybersecurity, and business decision-making. Hyperscale cloud platforms are becoming the preferred infrastructure for AI applications because they can handle massive data processing requirements. Cloud AI solutions are transforming IT services by automating routine tasks, enhancing data analytics, and improving human capital management. They offer significant cost savings by eliminating the need for expensive hardware and maintenance. Moreover, AI-driven cloud management and data management solutions enable predictive analytics, personalization, and gains in productivity and security. In addition, AI plays a pivotal role in threat detection and cybersecurity, supporting business continuity and data protection. Overall, the cloud AI market is poised for exponential growth as organizations continue to leverage AI to gain a competitive edge in their respective industries.

    How is this Cloud Artificial Intelligence (AI) Industry segmented and which is the largest segment?

    The cloud artificial intelligence (AI) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in USD million for the period 2024-2028, as well as historical data from 2018 to 2022, for the following segments:

    • Component
      • Software
      • Services
    • Geography
      • North America
        • US
      • Europe
        • Germany
        • UK
      • APAC
        • China
        • Japan
      • South America
      • Middle East and Africa

    By Component Insights

    The software segment is estimated to witness significant growth during the forecast period.
    

    Artificial intelligence (AI) software replicates aspects of human learning and behavior and is transforming many business sectors. AI development involves creating new software or enhancing existing solutions to deliver analytics results and trigger actions based on them. Applications of AI include automating business processes, personalizing services, and generating industry-specific insights. The digitization trend has driven industrial transformations, with healthcare being a prime example: according to BDO's Healthcare Digital Transformation Survey, 93% of US healthcare organizations adopted digital transformation strategies in 2021, integrating AI, computing, and enterprise resource planning software. AI functionality encompasses speech recognition, machine learning (ML), deep learning, neural networks, generative AI, automation, decision-making, and more.

    Significant areas of AI application include hyperscale cloud platforms, IT services, infrastructure, data analytics, human capital management, cost savings, cloud management, data management, predictive analytics, personalization, productivity, security, threat detection, integration, the talent gap, and chatbots. AI tools process data, power business intelligence, and lower costs through ML-based models and GPUs. Enterprise datacenters, virtualization, public clouds, private clouds, and hybrid cloud solutions leverage AI for non-repetitive tasks. AI streamlines workloads, automates repetitive tasks, monitors and manages IT infrastructure, and offers dynamic cloud services. AI is transforming industries from retail inventory management to financial organizations, providing competitive advantages through cost savings and improved decision-making.

  19. SDNist v1.3: Temporal Map Challenge Environment

    • datasets.ai
    • data.nist.gov
    0, 23, 5, 8
    Updated Aug 6, 2024
    + more versions
    National Institute of Standards and Technology (2024). SDNist v1.3: Temporal Map Challenge Environment [Dataset]. https://datasets.ai/datasets/sdnist-benchmark-data-and-evaluation-tools-for-data-synthesizers
    Available download formats: 5, 23, 8, 0
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Description

    SDNist (v1.3) is a set of benchmark data and metrics for evaluating synthetic data generators on structured tabular data. This version (1.3) reproduces the challenge environment from Sprints 2 and 3 of the Temporal Map Challenge. The benchmarks are distributed as a simple open-source Python package to allow standardized, reproducible comparison of synthetic data generators on real-world data and use cases. These data and metrics were developed for and vetted through the NIST PSCR Differential Privacy Temporal Map Challenge, where the evaluation tools, k-marginal and Higher Order Conjunction, proved effective in distinguishing competing models in the competition environment. SDNist is available via pip (pip install sdnist==1.2.8, for Python >= 3.6) or from the USNIST GitHub. The sdnist Python module downloads data from NIST as necessary, so users do not need to download data manually.
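    To make the k-marginal idea concrete, the sketch below compares the joint distributions of every k-column subset in real and synthetic data and averages one minus the total variation distance, so 1.0 means every k-way marginal matches exactly. This is a simplified illustration written for this listing, not the sdnist package's scoring code, which may sample column subsets and report on a different scale; the function names and toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
Dist = Dict[Tuple[str, ...], float]


def marginal(rows: Sequence[Row], cols: Tuple[str, ...]) -> Dist:
    """Empirical joint distribution of the given columns."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    total = len(rows)
    return {key: n / total for key, n in counts.items()}


def total_variation(p: Dist, q: Dist) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def k_marginal_similarity(real: Sequence[Row], synthetic: Sequence[Row],
                          columns: List[str], k: int = 2) -> float:
    """Mean of (1 - TV distance) over all k-column subsets (requires len(columns) >= k)."""
    subsets = list(combinations(columns, k))
    scores = [1 - total_variation(marginal(real, s), marginal(synthetic, s)) for s in subsets]
    return sum(scores) / len(scores)


columns = ["age", "sex", "county"]
real = [
    {"age": "30s", "sex": "F", "county": "A"},
    {"age": "40s", "sex": "M", "county": "A"},
    {"age": "30s", "sex": "M", "county": "B"},
    {"age": "40s", "sex": "F", "county": "B"},
]
synthetic = [dict(real[0]) for _ in range(4)]  # degenerate generator: one record repeated

print(k_marginal_similarity(real, real, columns))       # 1.0
print(k_marginal_similarity(real, synthetic, columns))  # 0.25
```

    An exact copy of the real data scores 1.0, while the degenerate synthetic set scores 0.25 because each 2-way marginal puts all of its mass on one of four equally likely real combinations.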

  20. NEUDev_AI_as_code_evaluator_SyntheticDataset

    • huggingface.co
    Updated Mar 26, 2025
    Hannah (2025). NEUDev_AI_as_code_evaluator_SyntheticDataset [Dataset]. https://huggingface.co/datasets/Hananie/NEUDev_AI_as_code_evaluator_SyntheticDataset
    Dataset updated
    Mar 26, 2025
    Authors
    Hannah
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This synthetic dataset was generated using GPT-3.5 Turbo and contains programming challenges in Python, Java, and C#. Each entry in the dataset includes:

    • language: The programming language of the solution (Python, Java, or C#)
    • question: The coding problem or challenge description
    • solution: A model-generated solution to the problem
    • label: A quality label indicating whether the solution is efficient, inefficient, or buggy
    • comment: Model-generated feedback explaining the… See the full description on the dataset page: https://huggingface.co/datasets/Hananie/NEUDev_AI_as_code_evaluator_SyntheticDataset.
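    The record layout listed above maps naturally onto a small typed record. This Python sketch is hypothetical: the field names come from the description, but the class name, the validation rules, and the assumption that labels are stored as the lowercase strings efficient, inefficient, and buggy (and languages as Python, Java, and C#) are illustrative; check the dataset page for the actual schema and value spellings.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

ALLOWED_LANGUAGES = {"Python", "Java", "C#"}             # assumed spellings
ALLOWED_LABELS = {"efficient", "inefficient", "buggy"}   # assumed spellings


@dataclass(frozen=True)
class CodeEvalRecord:
    language: str  # programming language of the solution
    question: str  # coding problem or challenge description
    solution: str  # model-generated solution
    label: str     # quality label
    comment: str   # model-generated feedback

    def __post_init__(self) -> None:
        # Reject values outside the categories named in the dataset description.
        if self.language not in ALLOWED_LANGUAGES:
            raise ValueError(f"unexpected language: {self.language!r}")
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {self.label!r}")


def label_counts(records: Iterable[CodeEvalRecord]) -> Counter:
    """Tally how many solutions fall under each quality label."""
    return Counter(r.label for r in records)


records = [
    CodeEvalRecord("Python", "Reverse a string.", "def rev(s): return s[::-1]",
                   "efficient", "Uses slicing."),
    CodeEvalRecord("Java", "Sum an array.", "int s = 0; for (int x : a) s += x;",
                   "efficient", "Single pass."),
    CodeEvalRecord("C#", "Find the max.", "return a.Max() + 1;",
                   "buggy", "Off by one."),
]
print(label_counts(records))
```

    Validating in __post_init__ rejects malformed rows at load time, which keeps later tallies such as label_counts trustworthy.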

Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002

Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx

Available download formats: xlsx
Dataset updated
Feb 5, 2025
Dataset provided by
Frontiers
Authors
Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Background: Clinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.

Objective: This study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI's GPT-4o using zero-shot prompting, and to evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.

Methods: In Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.

Results: In Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs was observed in 6/7 (85.71%) continuous parameters.

Conclusion: Zero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.
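The Phase 1 cross-verification and the Phase 2 fidelity checks described above can be sketched in plain Python. This is an illustration, not the authors' analysis code. It assumes heights in centimetres and weights in kilograms, and it swaps the two-sample t-test for a large-sample normal (z) approximation, which is close to the t-test at sample sizes in the thousands. The tolerance, function names, and summary statistics in the example are hypothetical.

```python
import math
from typing import Tuple


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_consistent(weight_kg: float, height_cm: float, reported: float, tol: float = 0.1) -> bool:
    """Phase 1-style cross-check: does a reported BMI agree with height and weight?"""
    return abs(bmi(weight_kg, height_cm) - reported) <= tol


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def mean_diff_test(m1, s1, n1, m2, s2, n2) -> float:
    """Compare two means with an unpooled (Welch-style) standard error and a z approximation."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return normal_two_sided_p((m1 - m2) / se)


def proportion_test(x1, n1, x2, n2) -> float:
    """Two-sample proportion z-test using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return normal_two_sided_p((x1 / n1 - x2 / n2) / se)


def ci95(mean: float, sd: float, n: int) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval for a mean."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def cis_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical summary statistics: one continuous parameter (age) and one binary (sex).
print(bmi_consistent(70.0, 175.0, 22.9))  # 70 / 1.75**2 is about 22.86
p_age = mean_diff_test(57.0, 14.0, 6000, 57.3, 14.2, 6000)
p_sex = proportion_test(3100, 6000, 3050, 6000)
overlap = cis_overlap(ci95(57.0, 14.0, 6000), ci95(57.3, 14.2, 6000))
print(p_age > 0.05, p_sex > 0.05, overlap)
```

With these made-up statistics, neither test reaches significance at the 0.05 level and the two 95% CIs overlap. Overlapping CIs and a non-significant test are different criteria, so the sketch reports them separately.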
