88 datasets found
  1. i

    Dataset of article: Synthetic Datasets Generator for Testing Information...

    • ieee-dataport.org
    Updated Mar 13, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sandro Mendonça (2020). Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools [Dataset]. http://doi.org/10.21227/5aeq-rr34
    Explore at:
    Dataset updated
    Mar 13, 2020
    Dataset provided by
    IEEE Dataport
    Authors
    Sandro Mendonça
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

  2. f

    Data Sheet 2_Large language models generating synthetic clinical datasets: a...

    • frontiersin.figshare.com
    xlsx
    Updated Feb 5, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin (2025). Data Sheet 2_Large language models generating synthetic clinical datasets: a feasibility and comparative analysis with real-world perioperative data.xlsx [Dataset]. http://doi.org/10.3389/frai.2025.1533508.s002
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Austin A. Barr; Joshua Quan; Eddie Guo; Emre Sezgin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BackgroundClinical data is instrumental to medical research, machine learning (ML) model development, and advancing surgical care, but access is often constrained by privacy regulations and missing data. Synthetic data offers a promising solution to preserve privacy while enabling broader data access. Recent advances in large language models (LLMs) provide an opportunity to generate synthetic data with reduced reliance on domain expertise, computational resources, and pre-training.ObjectiveThis study aims to assess the feasibility of generating realistic tabular clinical data with OpenAI’s GPT-4o using zero-shot prompting, and evaluate the fidelity of LLM-generated data by comparing its statistical properties to the Vital Signs DataBase (VitalDB), a real-world open-source perioperative dataset.MethodsIn Phase 1, GPT-4o was prompted to generate a dataset with qualitative descriptions of 13 clinical parameters. The resultant data was assessed for general errors, plausibility of outputs, and cross-verification of related parameters. In Phase 2, GPT-4o was prompted to generate a dataset using descriptive statistics of the VitalDB dataset. Fidelity was assessed using two-sample t-tests, two-sample proportion tests, and 95% confidence interval (CI) overlap.ResultsIn Phase 1, GPT-4o generated a complete and structured dataset comprising 6,166 case files. The dataset was plausible in range and correctly calculated body mass index for all case files based on respective heights and weights. Statistical comparison between the LLM-generated datasets and VitalDB revealed that Phase 2 data achieved significant fidelity. Phase 2 data demonstrated statistical similarity in 12/13 (92.31%) parameters, whereby no statistically significant differences were observed in 6/6 (100.0%) categorical/binary and 6/7 (85.71%) continuous parameters. Overlap of 95% CIs were observed in 6/7 (85.71%) continuous parameters.ConclusionZero-shot prompting with GPT-4o can generate realistic tabular synthetic datasets, which can replicate key statistical properties of real-world perioperative data. This study highlights the potential of LLMs as a novel and accessible modality for synthetic data generation, which may address critical barriers in clinical data access and eliminate the need for technical expertise, extensive computational resources, and pre-training. Further research is warranted to enhance fidelity and investigate the use of LLMs to amplify and augment datasets, preserve multivariate relationships, and train robust ML models.

  3. Synthetic Data Generation Market Analysis North America, Europe, APAC,...

    • technavio.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Synthetic Data Generation Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, Germany, UK, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
    Explore at:
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global
    Description

    Snapshot img

    Synthetic Data Generation Market Size 2024-2028

    The synthetic data generation market size is forecast to increase by USD 2.88 billion at a CAGR of 60.02% between 2023 and 2028.

    The global synthetic data generation market is expanding steadily, driven by the growing need for privacy-compliant data solutions and advancements in AI technology. Key factors include the increasing demand for data to train machine learning models, particularly in industries like healthcare services and finance where privacy regulations are strict and the use of predictive analytics is critical, and the use of generative AI and machine learning algorithms, which create high-quality synthetic datasets that mimic real-world data without compromising security.
    This report provides a detailed analysis of the global synthetic data generation market, covering market size, growth forecasts, and key segments such as agent-based modeling and data synthesis. It offers practical insights for business strategy, technology adoption, and compliance planning. A significant trend highlighted is the rise of synthetic data in AI training, enabling faster and more ethical development of models. One major challenge addressed is the difficulty in ensuring data quality, as poorly generated synthetic data can lead to inaccurate outcomes.
    For businesses aiming to stay competitive in a data-driven global landscape, this report delivers essential data and strategies to leverage synthetic data trends and address quality challenges, ensuring they remain leaders in innovation while meeting regulatory demands
    

    What will be the Size of the Market During the Forecast Period?

    Request Free Sample

    Synthetic data generation offers a more time-efficient solution compared to traditional methods of data collection and labeling, making it an attractive option for businesses looking to accelerate their AI and machine learning projects. The market represents a promising opportunity for organizations seeking to overcome the challenges of data scarcity and privacy concerns while maintaining data diversity and improving the efficiency of their artificial intelligence and machine learning initiatives. By leveraging this technology, technology decision-makers can drive innovation and gain a competitive edge in their respective industries.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    End-user
    
      Healthcare and life sciences
      Retail and e-commerce
      Transportation and logistics
      IT and telecommunication
      BFSI and others
    
    
    Type
    
      Agent-based modelling
      Direct modelling
    
    
    Data
    
      Tabular Data
      Text Data
      Image & Video Data
      Others
    
    
    Offering Band
    
      Fully Synthetic Data
      Partially Synthetic Data
      Hybrid Synthetic Data
    
    
    Application
    
      Data Protection
      Data Sharing
      Predictive Analytics
      Natural Language Processing
      Computer Vision Algorithms
      Others
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        Germany
        UK
        France
        Italy
    
    
      APAC
    
        China
        Japan
        India
    
    
      Middle East and Africa
    
    
    
      South America
    

    By End-user Insights

    The healthcare and life sciences segment is estimated to witness significant growth during the forecast period. In the thriving healthcare and life sciences sector, synthetic data generation is gaining significant traction as a cost-effective and time-efficient alternative to utilizing real-world data. This market segment's rapid expansion is driven by the increasing demand for data-driven insights and the importance of safeguarding sensitive information. One noteworthy application of synthetic data generation is in the realm of computer vision, specifically with geospatial imagery and medical imaging.

    For instance, in healthcare, synthetic data can be generated to replicate medical imaging, such as MRI scans and X-rays, for research and machine learning model development without compromising patient privacy. Similarly, in the field of physical security, synthetic data can be employed to enhance autonomous vehicle simulation, ensuring optimal performance and safety without the need for real-world data. By generating artificial datasets, organizations can diversify their data sources and improve the overall quality and accuracy of their machine learning models.

    Get a glance at the share of various segments. Request Free Sample

    The healthcare and life sciences segment was valued at USD 12.60 million in 2018 and showed a gradual increase during the forecast period.

    Regional Insights

    North America is estimated to contribute 36% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the m

  4. Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035...

    • rootsanalysis.com
    Updated Oct 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roots Analysis (2024). Synthetic Data Generation Market Size, Share, Trends & Insights Report, 2035 [Dataset]. https://www.rootsanalysis.com/synthetic-data-generation-market
    Explore at:
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Authors
    Roots Analysis
    License

    https://www.rootsanalysis.com/privacy.htmlhttps://www.rootsanalysis.com/privacy.html

    Time period covered
    2021 - 2031
    Area covered
    Global
    Description

    The global synthetic data market size is projected to grow from USD 0.4 billion in the current year to USD 19.22 billion by 2035, representing a CAGR of 42.14%, during the forecast period till 2035

  5. M

    Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034

    • scoop.market.us
    Updated Mar 18, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market.us Scoop (2025). Synthetic Data Generation Market to Surpass USD 6,637.98 Mn By 2034 [Dataset]. https://scoop.market.us/synthetic-data-generation-market-news/
    Explore at:
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Market.us Scoop
    License

    https://scoop.market.us/privacy-policyhttps://scoop.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Synthetic Data Generation Market Size

    As per the latest insights from Market.us, the Global Synthetic Data Generation Market is set to reach USD 6,637.98 million by 2034, expanding at a CAGR of 35.7% from 2025 to 2034. The market, valued at USD 313.50 million in 2024, is witnessing rapid growth due to rising demand for high-quality, privacy-compliant, and AI-driven data solutions.

    North America dominated in 2024, securing over 35% of the market, with revenues surpassing USD 109.7 million. The region’s leadership is fueled by strong investments in artificial intelligence, machine learning, and data security across industries such as healthcare, finance, and autonomous systems. With increasing reliance on synthetic data to enhance AI model training and reduce data privacy risks, the market is poised for significant expansion in the coming years.

    https://market.us/wp-content/uploads/2025/03/Synthetic-Data-Generation-Market-Size.png" alt="Synthetic Data Generation Market Size" class="wp-image-143209">
  6. d

    Synthetic Document Dataset for AI - Jpeg, PNG & PDF formats

    • datarade.ai
    Updated Sep 17, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ainnotate (2022). Synthetic Document Dataset for AI - Jpeg, PNG & PDF formats [Dataset]. https://datarade.ai/data-products/synthetic-document-dataset-for-ai-jpeg-png-pdf-formats-ainnotate
    Explore at:
    Dataset updated
    Sep 17, 2022
    Dataset authored and provided by
    Ainnotate
    Area covered
    Tonga, Germany, Tokelau, Cabo Verde, Denmark, Ireland, Brazil, Syrian Arab Republic, Korea (Democratic People's Republic of), Canada
    Description

    Ainnotate’s proprietary dataset generation methodology based on large scale generative modelling and Domain randomization provides data that is well balanced with consistent sampling, accommodating rare events, so that it can enable superior simulation and training of your models.

    Ainnotate currently provides synthetic datasets in the following domains and use cases.

    Internal Services - Visa application, Passport validation, License validation, Birth certificates Financial Services - Bank checks, Bank statements, Pay slips, Invoices, Tax forms, Insurance claims and Mortgage/Loan forms Healthcare - Medical Id cards

  7. S

    Synthetic Data Generation Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Dec 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2024). Synthetic Data Generation Market Report [Dataset]. https://www.marketresearchforecast.com/reports/synthetic-data-generation-market-1834
    Explore at:
    pdf, doc, pptAvailable download formats
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Synthetic Data Generation Marketsize was valued at USD 288.5 USD Million in 2023 and is projected to reach USD 1920.28 USD Million by 2032, exhibiting a CAGR of 31.1 % during the forecast period.Synthetic data generation stands for the generation of fake datasets that resemble real datasets with reference to their data distribution and patterns. It refers to the process of creating synthetic data points utilizing algorithms or models instead of conducting observations or surveys. There is one of its core advantages: it can maintain the statistical characteristics of the original data and remove the privacy risk of using real data. Further, with synthetic data, there is no limitation to how much data can be created, and hence, it can be used for extensive testing and training of machine learning models, unlike the case with conventional data, which may be highly regulated or limited in availability. It also helps in the generation of datasets that are comprehensive and include many examples of specific situations or contexts that may occur in practice for improving the AI system’s performance. The use of SDG significantly shortens the process of the development cycle, requiring less time and effort for data collection as well as annotation. It basically allows researchers and developers to be highly efficient in their discovery and development in specific domains like healthcare, finance, etc. Key drivers for this market are: Growing Demand for Data Privacy and Security to Fuel Market Growth. Potential restraints include: Lack of Data Accuracy and Realism Hinders Market Growth. Notable trends are: Growing Implementation of Touch-based and Voice-based Infotainment Systems to Increase Adoption of Intelligent Cars.

  8. replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 21, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fabian Plum; Fabian Plum; René Bulla; Hendrik Beck; Hendrik Beck; Natalie Imirzian; Natalie Imirzian; David Labonte; David Labonte; René Bulla (2023). replicAnt - Plum2023 - Detection & Tracking Datasets and Trained Networks [Dataset]. http://doi.org/10.5281/zenodo.7849417
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 21, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Fabian Plum; Fabian Plum; René Bulla; Hendrik Beck; Hendrik Beck; Natalie Imirzian; Natalie Imirzian; David Labonte; David Labonte; René Bulla
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all recorded and hand-annotated as well as all synthetically generated data as well as representative trained networks used for detection and tracking experiments in the replicAnt - generating annotated images of animals in complex environments using Unreal Engine manuscript. Unless stated otherwise, all 3D animal models used in the synthetically generated data have been generated with the open-source photgrammetry platform scAnt peerj.com/articles/11155/. All synthetic data has been generated with the associated replicAnt project available from https://github.com/evo-biomech/replicAnt.

    Abstract:

    Deep learning-based computer vision methods are transforming animal behavioural research. Transfer learning has enabled work in non-model species, but still requires hand-annotation of example footage, and is only performant in well-defined conditions. To overcome these limitations, we created replicAnt, a configurable pipeline implemented in Unreal Engine 5 and Python, designed to generate large and variable training datasets on consumer-grade hardware instead. replicAnt places 3D animal models into complex, procedurally generated environments, from which automatically annotated images can be exported. We demonstrate that synthetic data generated with replicAnt can significantly reduce the hand-annotation required to achieve benchmark performance in common applications such as animal detection, tracking, pose-estimation, and semantic segmentation; and that it increases the subject-specificity and domain-invariance of the trained networks, so conferring robustness. In some applications, replicAnt may even remove the need for hand-annotation altogether. It thus represents a significant step towards porting deep learning-based computer vision tools to the field.

    Benchmark data

    Two video datasets were curated to quantify detection performance; one in laboratory and one in field conditions. The laboratory dataset consists of top-down recordings of foraging trails of Atta vollenweideri (Forel 1893) leaf-cutter ants. The colony was collected in Uruguay in 2014, and housed in a climate chamber at 25°C and 60% humidity. A recording box was built from clear acrylic, and placed between the colony nest and a box external to the climate chamber, which functioned as feeding site. Bramble leaves were placed in the feeding area prior to each recording session, and ants had access to the recording area at will. The recorded area was 104 mm wide and 200 mm long. An OAK-D camera (OpenCV AI Kit: OAK-D, Luxonis Holding Corporation) was positioned centrally 195 mm above the ground. While keeping the camera position constant, lighting, exposure, and background conditions were varied to create recordings with variable appearance: The “base” case is an evenly lit and well exposed scene with scattered leaf fragments on an otherwise plain white backdrop. A “bright” and “dark” case are characterised by systematic over- or underexposure, respectively, which introduces motion blur, colour-clipped appendages, and extensive flickering and compression artefacts. In a separate well exposed recording, the clear acrylic backdrop was substituted with a printout of a highly textured forest ground to create a “noisy” case. Last, we decreased the camera distance to 100 mm at constant focal distance, effectively doubling the magnification, and yielding a “close” case, distinguished by out-of-focus workers. All recordings were captured at 25 frames per second (fps).

    The field datasets consists of video recordings of Gnathamitermes sp. desert termites, filmed close to the nest entrance in the desert of Maricopa County, Arizona, using a Nikon D850 and a Nikkor 18-105 mm lens on a tripod at camera distances between 20 cm to 40 cm. All video recordings were well exposed, and captured at 23.976 fps.

    Each video was trimmed to the first 1000 frames, and contains between 36 and 103 individuals. In total, 5000 and 1000 frames were hand-annotated for the laboratory- and field-dataset, respectively: each visible individual was assigned a constant size bounding box, with a centre coinciding approximately with the geometric centre of the thorax in top-down view. The size of the bounding boxes was chosen such that they were large enough to completely enclose the largest individuals, and was automatically adjusted near the image borders. A custom-written Blender Add-on aided hand-annotation: the Add-on is a semi-automated multi animal tracker, which leverages blender’s internal contrast-based motion tracker, but also include track refinement options, and CSV export functionality. Comprehensive documentation of this tool and Jupyter notebooks for track visualisation and benchmarking is provided on the replicAnt and BlenderMotionExport GitHub repositories.

    Synthetic data generation

    Two synthetic datasets, each with a population size of 100, were generated from 3D models of \textit{Atta vollenweideri} leaf-cutter ants. All 3D models were created with the scAnt photogrammetry workflow. A “group” population was based on three distinct 3D models of an ant minor (1.1 mg), a media (9.8 mg), and a major (50.1 mg) (see 10.5281/zenodo.7849059)). To approximately simulate the size distribution of A. vollenweideri colonies, these models make up 20%, 60%, and 20% of the simulated population, respectively. A 33% within-class scale variation, with default hue, contrast, and brightness subject material variation, was used. A “single” population was generated using the major model only, with 90% scale variation, but equal material variation settings.

    A Gnathamitermes sp. synthetic dataset was generated from two hand-sculpted models; a worker and a soldier made up 80% and 20% of the simulated population of 100 individuals, respectively with default hue, contrast, and brightness subject material variation. Both 3D models were created in Blender v3.1, using reference photographs.

    Each of the three synthetic datasets contains 10,000 images, rendered at a resolution of 1024 by 1024 px, using the default generator settings as documented in the Generator_example level file (see documentation on GitHub). To assess how the training dataset size affects performance, we trained networks on 100 (“small”), 1,000 (“medium”), and 10,000 (“large”) subsets of the “group” dataset. Generating 10,000 samples at the specified resolution took approximately 10 hours per dataset on a consumer-grade laptop (6 Core 4 GHz CPU, 16 GB RAM, RTX 2070 Super).


    Additionally, five datasets which contain both real and synthetic images were curated. These “mixed” datasets combine image samples from the synthetic “group” dataset with image samples from the real “base” case. The ratio between real and synthetic images across the five datasets varied between 10/1 to 1/100.

    Funding

    This study received funding from Imperial College’s President’s PhD Scholarship (to Fabian Plum), and is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant agreement No. 851705, to David Labonte). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  9. d

    Data from: Generation of synthetic whole-slide image tiles of tumours from...

    • search-dev.test.dataone.org
    • search.dataone.org
    • +2more
    Updated Apr 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Francisco Carrillo-Perez; Marija Pizurica; Yuanning Zheng; Tarak Nath Nandi; Ravi Madduri; Jeanne Shen; Olivier Gevaert (2024). Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models [Dataset]. http://doi.org/10.5061/dryad.6djh9w174
    Explore at:
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Francisco Carrillo-Perez; Marija Pizurica; Yuanning Zheng; Tarak Nath Nandi; Ravi Madduri; Jeanne Shen; Olivier Gevaert
    Time period covered
    Jan 1, 2023
    Description

    Data scarcity presents a significant obstacle in the field of biomedicine, where acquiring diverse and sufficient datasets can be costly and challenging. Synthetic data generation offers a potential solution to this problem by expanding dataset sizes, thereby enabling the training of more robust and generalizable machine learning models. Although previous studies have explored synthetic data generation for cancer diagnosis, they have predominantly focused on single-modality settings, such as whole-slide image tiles or RNA-Seq data. To bridge this gap, we propose a novel approach, RNA-Cascaded-Diffusion-Model or RNA-CDM, for performing RNA-to-image synthesis in a multi-cancer context, drawing inspiration from successful text-to-image synthesis models used in natural images. In our approach, we employ a variational auto-encoder to reduce the dimensionality of a patient’s gene expression profile, effectively distinguishing between different types of cancer. Subsequently, we employ a cascad..., , , # RNA-CDM Generated One Million Synthetic Images

    https://doi.org/10.5061/dryad.6djh9w174

    One million synthetic digital pathology images were generated using the RNA-CDM model presented in the paper "RNA-to-image multi-cancer synthesis using cascaded diffusion models".

    Description of the data and file structure

    There are ten different h5 files per cancer type (TCGA-CESC, TCGA-COAD, TCGA-KIRP, TCGA-GBM, TCGA-LUAD). Each h5 file contains 20.000 images. The key is the tile number, ranging from 0-20,000 in the first file, and from 180,000-200,000 in the last file. The tiles are saved as numpy arrays.

    Code/Software

    The code used to generate this data is available under academic license in https://rna-cdm.stanford.edu .

    Manuscript citation

    Carrillo-Perez, F., Pizurica, M., Zheng, Y. et al. Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models...

  10. S

    Synthetic Data Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 21, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2025). Synthetic Data Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/synthetic-data-tool-38973
    Explore at:
    pdf, doc, pptAvailable download formats
    Dataset updated
    Feb 21, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global synthetic data tool market is projected to reach USD 10,394.0 million by 2033, exhibiting a CAGR of 34.8% during the forecast period. The growing adoption of AI and ML technologies, increasing demand for data privacy and security, and the rising need for data for training and testing machine learning models are the key factors driving market growth. Additionally, the availability of open-source synthetic data generation tools and the increasing adoption of cloud-based synthetic data platforms are further contributing to market growth. North America is expected to hold the largest market share during the forecast period due to the early adoption of AI and ML technologies and the presence of key vendors in the region. Europe is anticipated to witness significant growth due to increasing government initiatives to promote AI adoption and the growing data privacy concerns. The Asia Pacific region is projected to experience rapid growth due to government initiatives to develop AI capabilities and the increasing adoption of AI and ML technologies in various industries, namely healthcare, retail, and manufacturing.

  11. NOVA: Rendering Virtual Worlds with Humans for Computer Vision Tasks

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jul 3, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulrahman Kerim; Abdulrahman Kerim; Cem Aslan; Ufuk Celikcan; Ufuk Celikcan; Erkut Erdem; Erkut Erdem; Aykut Erdem; Aykut Erdem; Cem Aslan (2021). NOVA: Rendering Virtual Worlds with Humans for Computer Vision Tasks [Dataset]. http://doi.org/10.1111/cgf.14271
    Explore at:
    binAvailable download formats
    Dataset updated
    Jul 3, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Abdulrahman Kerim; Abdulrahman Kerim; Cem Aslan; Ufuk Celikcan; Ufuk Celikcan; Erkut Erdem; Erkut Erdem; Aykut Erdem; Aykut Erdem; Cem Aslan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Today, the cutting edge of computer vision research greatly depends on the availability of large datasets, which are critical for effectively training and testing new methods. Manually annotating visual data, however, is not only a labor-intensive process but also prone to errors. In this study, we present NOVA, a versatile framework to create realistic-looking 3D rendered worlds containing procedurally generated humans with rich pixel-level ground truth annotations. NOVA can simulate various environmental factors such as weather conditions or different times of day, and bring an exceptionally diverse set of humans to life, each having a distinct body shape, gender and age. To demonstrate NOVA's capabilities, we generate two synthetic datasets for person tracking. The first one includes 108 sequences, each with different levels of difficulty like tracking in crowded scenes or at nighttime and aims for testing the limits of current state-of-the-art trackers. A second dataset of 97 sequences with normal weather conditions is used to show how our synthetic sequences can be utilized to train and boost the performance of deep-learning based trackers. Our results indicate that the synthetic data generated by NOVA represents a good proxy of the real-world and can be exploited for computer vision tasks.

  12. U

    U.S. AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Archive Market Research (2024). U.S. AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/us-ai-training-dataset-market-4957
    Explore at:
    doc, ppt, pdfAvailable download formats
    Dataset updated
    Dec 11, 2024
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    United States
    Variables measured
    Market Size
    Description

    The U.S. AI Training Dataset Market size was valued at USD 590.4 million in 2023 and is projected to reach USD 1880.70 million by 2032, exhibiting a CAGR of 18.0 % during the forecasts period. The U. S. AI training dataset market deals with the generation, selection, and organization of datasets used in training artificial intelligence. These datasets contain the requisite information that the machine learning algorithms need to infer and learn from. Conducts include the advancement and improvement of AI solutions in different fields of business like transport, medical analysis, computing language, and money related measurements. The applications include training the models for activities such as image classification, predictive modeling, and natural language interface. Other emerging trends are the change in direction of more and better-quality, various and annotated data for the improvement of model efficiency, synthetic data generation for data shortage, and data confidentiality and ethical issues in dataset management. Furthermore, due to arising technologies in artificial intelligence and machine learning, there is a noticeable development in building and using the datasets. Recent developments include: In February 2024, Google struck a deal worth USD 60 million per year with Reddit that will give the former real-time access to the latter’s data and use Google AI to enhance Reddit’s search capabilities. , In February 2024, Microsoft announced around USD 2.1 billion investment in Mistral AI to expedite the growth and deployment of large language models. The U.S. giant is expected to underpin Mistral AI with Azure AI supercomputing infrastructure to provide top-notch scale and performance for AI training and inference workloads. .

  13. Bioinformatics Simulated

    • kaggle.com
    Updated Jan 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    willian oliveira gibin (2025). Bioinformatics Simulated [Dataset]. http://doi.org/10.34740/kaggle/dsv/10398445
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification. The dataset includes the following columns: ID_Protein, a unique identifier for each protein; Sequence, a string of amino acids; Molecular_Weight, molecular weight calculated from the sequence; Isoelectric_Point, estimated isoelectric point based on the sequence composition; Hydrophobicity, average hydrophobicity calculated from the sequence; Total_Charge, sum of the charges of the amino acids in the sequence; Polar_Proportion, percentage of polar amino acids in the sequence; Nonpolar_Proportion, percentage of nonpolar amino acids in the sequence; Sequence_Length, total number of amino acids in the sequence; and Class, the functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other. While this is a simulated dataset, it was inspired by patterns observed in real protein datasets such as UniProt, a comprehensive database of protein sequences and annotations; the Kyte-Doolittle Scale, calculations of hydrophobicity; and Biopython, a tool for analyzing biological sequences. This dataset is ideal for training classification models for proteins, exploratory analysis of physicochemical properties of proteins, and building machine learning pipelines in bioinformatics. The dataset was created through sequence generation, where amino acid chains were randomly generated with lengths between 50 and 300 residues, property calculation using the Biopython library, and class assignment with classes randomly assigned for classification purposes. However, the sequences and properties do not represent real proteins but follow patterns observed in natural proteins, and the functional classes are simulated and do not correspond to actual biological characteristics. The dataset is divided into two subsets: Training, which includes 16,000 samples (proteinas_train.csv), and Testing, which includes 4,000 samples (proteinas_test.csv). This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  14. g

    Synthetic datasets

    • generated.photos
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Generated Media, Inc. (2024). Synthetic datasets [Dataset]. https://generated.photos/datasets
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Generated Media, Inc.
    Description

    100% synthetic. Based on model-released photos. Can be used for any purpose except for the ones violating the law. Worldwide. Different backgrounds: colored, transparent, photographic. Diversity: ethnicity, demographics, facial expressions, and poses.

  15. Heat pump COP drop - synthetic faults

    • kaggle.com
    zip
    Updated Feb 28, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathieu Vallee (2023). Heat pump COP drop - synthetic faults [Dataset]. https://www.kaggle.com/datasets/mathieuvallee/ai-dhc-heatpump-cop
    Explore at:
    zip(68378018 bytes)Available download formats
    Dataset updated
    Feb 28, 2023
    Authors
    Mathieu Vallee
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains data generated in the AI DHC project.

    This dataset contains synthetic fault data for decrease of the COP of a heat pump

    The IEA DHC Annex XIII project “Artificial Intelligence for Failure Detection and Forecasting of Heat Production and Heat demand in District Heating Networks” is developing Artificial Intelligence (AI) methods for forecasting heat demand and heat production and is evaluating algorithms for detecting faults which can be used by interested stakeholders (operators, suppliers of DHC components and manufacturers of control devices).

    See https://github.com/mathieu-vallee/ai-dhc for the models and pythons scripts used to generate the dataset

    Please cite this dataset as: Vallee, M., Wissocq T., Gaoua Y., Lamaison N., Generation and Evaluation of a Synthetic Dataset to improve Fault Detection in District Heating and Cooling Systems, 2023 (under review at the Energy journal)

    Disclaimer notice (IEA DHC): This project has been independently funded by the International Energy Agency Technology Collaboration Programme on District Heating and Cooling including Combined Heat and Power (IEA DHC).

    Any views expressed in this publication are not necessarily those of IEA DHC.

    IEA DHC can take no responsibility for the use of the information within this publication, nor for any errors or omissions it may contain.

    Information contained herein have been compiled or arrived from sources believed to be reliable. Nevertheless, the authors or their organizations do not accept liability for any loss or damage arising from the use thereof. Using the given information is strictly your own responsibility.

    Disclaimer Notice (Authors):

    This publication has been compiled with reasonable skill and care. However, neither the authors nor the DHC Contracting Parties (of the International Energy Agency Technology Collaboration Programme on District Heating & Cooling) make any representation as to the adequacy or accuracy of the information contained herein, or as to its suitability for any particular application, and accept no responsibility or liability arising out of the use of this publication. The information contained herein does not supersede the requirements given in any national codes, regulations or standards, and should not be regarded as a substitute

    Copyright:

    All property rights, including copyright, are vested in IEA DHC. In particular, all parts of this publication may be reproduced, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise only by crediting IEA DHC as the original source. Republishing of this report in another format or storing the report in a public retrieval system is prohibited unless explicitly permitted by the IEA DHC Operating Agent in writing.

  16. A

    Artificial Intelligence Training Dataset Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AMA Research & Media LLP (2025). Artificial Intelligence Training Dataset Report [Dataset]. https://www.archivemarketresearch.com/reports/artificial-intelligence-training-dataset-38645
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Artificial Intelligence (AI) Training Dataset market is projected to reach $1605.2 million by 2033, exhibiting a CAGR of 9.4% from 2025 to 2033. The surge in demand for AI training datasets is driven by the increasing adoption of AI and machine learning technologies in various industries such as healthcare, financial services, and manufacturing. Moreover, the growing need for reliable and high-quality data for training AI models is further fueling the market growth. Key market trends include the increasing adoption of cloud-based AI training datasets, the emergence of synthetic data generation, and the growing focus on data privacy and security. The market is segmented by type (image classification dataset, voice recognition dataset, natural language processing dataset, object detection dataset, and others) and application (smart campus, smart medical, autopilot, smart home, and others). North America is the largest regional market, followed by Europe and Asia Pacific. Key companies operating in the market include Appen, Speechocean, TELUS International, Summa Linguae Technologies, and Scale AI. Artificial Intelligence (AI) training datasets are critical for developing and deploying AI models. These datasets provide the data that AI models need to learn, and the quality of the data directly impacts the performance of the model. The AI training dataset market landscape is complex, with many different providers offering datasets for a variety of applications. The market is also rapidly evolving, as new technologies and techniques are developed for collecting, labeling, and managing AI training data.

  17. P

    CIFAKE: Real and AI-Generated Synthetic Images Dataset

    • paperswithcode.com
    Updated Mar 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CIFAKE: Real and AI-Generated Synthetic Images Dataset [Dataset]. https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images
    Explore at:
    Dataset updated
    Mar 23, 2023
    Description

    The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.

    CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?

    Dataset details The dataset contains two classes - REAL and FAKE. For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4 There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class)

    References If you use this dataset, you must cite the following sources

    Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

    Bird, J.J., Lotfi, A. (2023). CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. arXiv preprint arXiv:2303.14126.

    Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2023). The Bird & Lotfi study is a preprint currently available on ArXiv and this description will be updated when the paper is published.

    License This dataset is published under the same MIT license as CIFAR-10:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  18. Data from: Dataset for the mapping study "What do we mean by GenAI?"

    • zenodo.org
    • produccioncientifica.usal.es
    • +1more
    Updated Jul 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    A. Vázquez-Ingelmo; A. Vázquez-Ingelmo; F. J. García-Peñalvo; F. J. García-Peñalvo (2023). Dataset for the mapping study "What do we mean by GenAI?" [Dataset]. http://doi.org/10.5281/zenodo.8162484
    Explore at:
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    A. Vázquez-Ingelmo; A. Vázquez-Ingelmo; F. J. García-Peñalvo; F. J. García-Peñalvo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports a literature mapping of AI-driven content generation, analyzing 631 solutions published over the last five years to better understand and characterize the Generative Artificial Intelligence landscape. Tools like ChatGPT, Dall-E, or Midjourney have democratized access to Large Language Models, enabling the creation of human-like content. However, the concept 'Generative Artificial Intelligence' lacks a universally accepted definition, leading to potential misunderstandings.

    The study has been published in International Journal of Interactive Multimedia and Artificial Intelligence.

    García-Peñalvo, F. J., & Vázquez-Ingelmo, A. (2023). What do we mean by GenAI? A systematic mapping of the evolution, trends, and techniques involved in Generative AI. International Journal of Interactive Multimedia and Artificial Intelligence, In Press.

  19. D

    Data Collection and Labelling Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 13, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AMA Research & Media LLP (2025). Data Collection and Labelling Report [Dataset]. https://www.marketresearchforecast.com/reports/data-collection-and-labelling-33030
    Explore at:
    ppt, doc, pdfAvailable download formats
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data collection and labeling market is experiencing robust growth, fueled by the escalating demand for high-quality training data in artificial intelligence (AI) and machine learning (ML) applications. The market, estimated at $15 billion in 2025, is projected to achieve a Compound Annual Growth Rate (CAGR) of 25% over the forecast period (2025-2033), reaching approximately $75 billion by 2033. This expansion is primarily driven by the increasing adoption of AI across diverse sectors, including healthcare (medical image analysis, drug discovery), automotive (autonomous driving systems), finance (fraud detection, risk assessment), and retail (personalized recommendations, inventory management). The rising complexity of AI models and the need for more diverse and nuanced datasets are significant contributing factors to this growth. Furthermore, advancements in data annotation tools and techniques, such as active learning and synthetic data generation, are streamlining the data labeling process and making it more cost-effective. However, challenges remain. Data privacy concerns and regulations like GDPR necessitate robust data security measures, adding to the cost and complexity of data collection and labeling. The shortage of skilled data annotators also hinders market growth, necessitating investments in training and upskilling programs. Despite these restraints, the market’s inherent potential, coupled with ongoing technological advancements and increased industry investments, ensures sustained expansion in the coming years. Geographic distribution shows strong concentration in North America and Europe initially, but Asia-Pacific is poised for rapid growth due to increasing AI adoption and the availability of a large workforce. This makes strategic partnerships and global expansion crucial for market players aiming for long-term success.

  20. Artificial Intelligence (AI) Text Generator Market Analysis North America,...

    • technavio.com
    Updated Jul 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2024). Artificial Intelligence (AI) Text Generator Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, UK, China, India, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ai-text-generator-market-analysis
    Explore at:
    Dataset updated
    Jul 15, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global, United States
    Description

    Snapshot img

    Artificial Intelligence Text Generator Market Size 2024-2028

    The artificial intelligence (AI) text generator market size is forecast to increase by USD 908.2 million at a CAGR of 21.22% between 2023 and 2028.

    The market is experiencing significant growth due to several key trends. One of these trends is the increasing popularity of AI generators in various sectors, including education for e-learning applications. Another trend is the growing importance of speech-to-text technology, which is becoming increasingly essential for improving productivity and accessibility. However, data privacy and security concerns remain a challenge for the market, as generators process and store vast amounts of sensitive information. It is crucial for market participants to address these concerns through strong data security measures and transparent data handling practices to ensure customer trust and compliance with regulations. Overall, the AI generator market is poised for continued growth as it offers significant benefits in terms of efficiency, accuracy, and accessibility.
    

    What will be the Size of the Artificial Intelligence (AI) Text Generator Market During the Forecast Period?

    Request Free Sample

    The market is experiencing significant growth as businesses and organizations seek to automate content creation across various industries. Driven by technological advancements in machine learning (ML) and natural language processing, AI generators are increasingly being adopted for downstream applications in sectors such as education, manufacturing, and e-commerce. 
    Moreover, these systems enable the creation of personalized content for global audiences in multiple languages, providing a competitive edge for businesses in an interconnected Internet economy. However, responsible AI practices are crucial to mitigate risks associated with biased content, misinformation, misuse, and potential misrepresentation.
    

    How is this Artificial Intelligence (AI) Text Generator Industry segmented and which is the largest segment?

    The artificial intelligence (AI) text generator industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Component
    
      Solution
      Service
    
    
    Application
    
      Text to text
      Speech to text
      Image/video to text
    
    
    Geography
    
      North America
    
        US
    
    
      Europe
    
        Germany
        UK
    
    
      APAC
    
        China
        India
    
    
      South America
    
    
    
      Middle East and Africa
    

    By Component Insights

    The solution segment is estimated to witness significant growth during the forecast period.
    

    Artificial Intelligence (AI) text generators have gained significant traction in various industries due to their efficiency and cost-effectiveness in content creation. These solutions utilize machine learning algorithms, such as Deep Neural Networks, to analyze and learn from vast datasets of human-written text. By predicting the most probable word or sequence of words based on patterns and relationships identified In the training data, AIgenerators produce personalized content for multiple languages and global audiences. The application spans across industries, including education, manufacturing, e-commerce, and entertainment & media. In the education industry, AI generators assist in creating personalized learning materials.

    Get a glance at the Artificial Intelligence (AI) Text Generator Industry report of share of various segments Request Free Sample

    The solution segment was valued at USD 184.50 million in 2018 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 33% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    For more insights on the market share of various regions, Request Free Sample

    The North American market holds the largest share in the market, driven by the region's technological advancements and increasing adoption of AI in various industries. AI text generators are increasingly utilized for content creation, customer service, virtual assistants, and chatbots, catering to the growing demand for high-quality, personalized content in sectors such as e-commerce and digital marketing. Moreover, the presence of tech giants like Google, Microsoft, and Amazon in North America, who are investing significantly in AI and machine learning, further fuels market growth. AI generators employ Machine Learning algorithms, Deep Neural Networks, and Natural Language Processing to generate content in multiple languages for global audiences.

    Market Dynamics

    Our researchers analyzed the data with 2023 as the base year, along with the key drivers, trends, and c

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Sandro Mendonça (2020). Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools [Dataset]. http://doi.org/10.21227/5aeq-rr34

Dataset of article: Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools

Explore at:
Dataset updated
Mar 13, 2020
Dataset provided by
IEEE Dataport
Authors
Sandro Mendonça
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Dataset used in the article entitled 'Synthetic Datasets Generator for Testing Information Visualization and Machine Learning Techniques and Tools'. These datasets can be used to test several characteristics in machine learning and data processing algorithms.

Search
Clear search
Close search
Google apps
Main menu