8 datasets found
  1. Data archive for paper "Copula-based synthetic data augmentation for...

    • zenodo.org
    zip
    Updated Mar 15, 2022
    Cite
    David Meyer (2022). Data archive for paper "Copula-based synthetic data augmentation for machine-learning emulators" [Dataset]. http://doi.org/10.5281/zenodo.5150327
    Available download formats
    zip
    Dataset updated
    Mar 15, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Meyer
    Description

    Overview

    This is the data archive for the paper "Copula-based synthetic data augmentation for machine-learning emulators". It contains the paper's data archive with model outputs (see results folder) and the Singularity image for (optionally) re-running experiments.

    For the Python tool used to generate synthetic data, please refer to Synthia.

    Requirements

    *Although PBS is not a strict requirement, it is required to run all helper scripts included in this repository. Note that, depending on your system settings and resource availability, you may need to modify the PBS parameters at the top of the submit scripts stored in the hpc directory (e.g. #PBS -lwalltime=72:00:00).

    Usage

    To reproduce the results from the experiments described in the paper, first fit all copula models to the reduced NWP-SAF dataset with:

    qsub hpc/fit.sh

    Then, to generate synthetic data, run all machine learning model configurations, and compute the relevant statistics, use:

    qsub hpc/stats.sh
    qsub hpc/ml_control.sh
    qsub hpc/ml_synth.sh

    Finally, to plot all artifacts included in the paper, use:

    qsub hpc/plot.sh
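
The submission order described above could also be enforced with PBS job dependencies rather than by waiting manually between steps. The following is a minimal Python sketch, not part of the archive: the helper names `build_command` and `submit_chain` are hypothetical, and it assumes that `qsub` prints the new job ID on stdout, that your scheduler accepts `-W depend=afterok:<id>`, and that the three middle scripts may run in parallel once fitting has finished.

```python
import subprocess

# Stages in the order given in the Usage text.
# Assumption: scripts inside one stage are independent of each other.
STAGES = [
    ["hpc/fit.sh"],
    ["hpc/stats.sh", "hpc/ml_control.sh", "hpc/ml_synth.sh"],
    ["hpc/plot.sh"],
]


def build_command(script, depends_on=()):
    """Return the qsub argument list, waiting on earlier job IDs if given."""
    cmd = ["qsub"]
    if depends_on:
        # afterok: start only once all listed jobs have finished successfully.
        cmd += ["-W", "depend=afterok:" + ":".join(depends_on)]
    return cmd + [script]


def submit_chain(stages=STAGES, run=subprocess.run):
    """Submit every stage, making each stage wait for the previous one."""
    previous_ids = []
    for scripts in stages:
        current_ids = []
        for script in scripts:
            result = run(
                build_command(script, previous_ids),
                check=True, capture_output=True, text=True,
            )
            # Assumption: qsub prints the new job ID on stdout.
            current_ids.append(result.stdout.strip())
        previous_ids = current_ids
    return previous_ids
```

Calling `submit_chain()` from the archive's root directory would queue all five scripts at once; check the dependency syntax against your site's PBS documentation before relying on it.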

    Licence

    Code released under MIT license. Data from the reduced NWP-SAF dataset released under CC BY 4.0.

  2. Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029:...

    • technavio.com
    Updated May 6, 2025
    Cite
    Technavio (2025). Synthetic Data Generation Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), APAC (China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/synthetic-data-generation-market-analysis
    Dataset updated
    May 6, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global, United States
    Description

    Synthetic Data Generation Market Size 2025-2029

    The synthetic data generation market size is forecast to increase by USD 4.39 billion, at a CAGR of 61.1% between 2024 and 2029.

    The market is experiencing significant growth, driven by the escalating demand for data privacy protection. With increasing concerns over data security and the potential risks associated with using real data, synthetic data is gaining traction as a viable alternative. Furthermore, the deployment of large language models is fueling market expansion, as these models can generate vast amounts of realistic and diverse data, reducing the reliance on real-world data sources. However, high costs associated with high-end generative models pose a challenge for market participants. These models require substantial computational resources and expertise to develop and implement effectively. Companies seeking to capitalize on market opportunities must navigate these challenges by investing in research and development to create more cost-effective solutions or partnering with specialists in the field. Overall, the market presents significant potential for innovation and growth, particularly in industries where data privacy is a priority and large language models can be effectively utilized.

    What will be the Size of the Synthetic Data Generation Market during the forecast period?

    The market continues to evolve, driven by the increasing demand for data-driven insights across various sectors. Data processing is a crucial aspect of this market, with a focus on ensuring data integrity, privacy, and security. Data privacy-preserving techniques, such as data masking and anonymization, are essential in maintaining confidentiality while enabling data sharing. Real-time data processing and data simulation are key applications of synthetic data, enabling predictive modeling and data consistency. Data management and workflow automation are integral components of synthetic data platforms, with cloud computing and model deployment facilitating scalability and flexibility. Data governance frameworks and compliance regulations play a significant role in ensuring data quality and security. Deep learning models, variational autoencoders (VAEs), and neural networks are essential tools for model training and optimization, while API integration and batch data processing streamline the data pipeline. Machine learning models and data visualization provide valuable insights, while edge computing enables data processing at the source. Data augmentation and data transformation are essential techniques for enhancing the quality and quantity of synthetic data. Data warehousing and data analytics provide a centralized platform for managing and deriving insights from large datasets. Synthetic data generation continues to unfold, with ongoing research and development in areas such as federated learning, homomorphic encryption, statistical modeling, and software development. The market's dynamic nature reflects the evolving needs of businesses and the continuous advancements in data technology.

    How is this Synthetic Data Generation Industry segmented?

    The synthetic data generation industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • End-user: Healthcare and life sciences; Retail and e-commerce; Transportation and logistics; IT and telecommunication; BFSI and others
    • Type: Agent-based modelling; Direct modelling
    • Application: AI and ML Model Training; Data privacy; Simulation and testing; Others
    • Product: Tabular data; Text data; Image and video data; Others
    • Geography: North America (US, Canada, Mexico); Europe (France, Germany, Italy, UK); APAC (China, India, Japan); Rest of World (ROW)

    By End-user Insights

    The healthcare and life sciences segment is estimated to witness significant growth during the forecast period.

    In the rapidly evolving data landscape, the market is gaining significant traction, particularly in the healthcare and life sciences sector. With a growing emphasis on data-driven decision-making and stringent data privacy regulations, synthetic data has emerged as a viable alternative to real data for various applications. This includes data processing, data preprocessing, data cleaning, data labeling, data augmentation, and predictive modeling, among others. Medical imaging data, such as MRI scans and X-rays, are essential for diagnosis and treatment planning. However, sharing real patient data for research purposes or training machine learning algorithms can pose significant privacy risks. Synthetic data generation addresses this challenge by producing realistic medical imaging data, ensuring data privacy while enabling research

  3. Synthetic Data Solution Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 3, 2025
    Cite
    Market Report Analytics (2025). Synthetic Data Solution Report [Dataset]. https://www.marketreportanalytics.com/reports/synthetic-data-solution-54761
    Available download formats
    ppt, doc, pdf
    Dataset updated
    Apr 3, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The synthetic data solution market is experiencing robust growth, driven by increasing demand for data privacy compliance (GDPR, CCPA), the need for large, diverse datasets for AI/ML model training, and the rising costs and difficulties associated with obtaining real-world data. The market, currently estimated at $2 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated $12 billion by 2033. This expansion is fueled by several key trends, including the maturation of synthetic data generation techniques, the increasing adoption of cloud-based solutions offering scalability and cost-effectiveness, and the growing recognition of synthetic data's crucial role in overcoming data bias and enhancing model accuracy. Key application areas driving this growth are financial services, where synthetic data helps in fraud detection and risk management, and the retail sector, benefiting from improved customer segmentation and personalized marketing strategies. The medical industry also presents a significant opportunity, with synthetic data enabling the development of innovative diagnostic tools and personalized treatments while protecting patient privacy. The competitive landscape is dynamic, with established players like Baidu competing alongside innovative startups such as LightWheel AI and Hanyi Innovation Technology. While the North American market currently holds a significant share, the Asia-Pacific region, particularly China and India, is poised for substantial growth due to increasing digitalization and the burgeoning AI market. Challenges remain, however, including the need to ensure the quality and realism of synthetic data and the ongoing development of robust validation and verification methods. Overcoming these hurdles will be crucial to unlocking the full potential of this rapidly evolving market. 
On-premises solutions are currently more prevalent, but the shift towards cloud-based solutions is expected to accelerate, driven by the benefits of scalability and accessibility.
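
The projection quoted above is ordinary compound growth, and it can be checked in a few lines. This is only an illustrative sketch: the function name `project_value` is made up here, and the base value, rate, and horizon are the figures quoted in the description.

```python
def project_value(base, cagr, years):
    """Compound a starting value at a constant annual growth rate."""
    return base * (1.0 + cagr) ** years


# $2 billion in 2025, growing at 25% per year for the 8 years to 2033.
value_2033 = project_value(2.0, 0.25, 2033 - 2025)
print(f"Projected 2033 market size: ${value_2033:.1f} billion")
```

Under these inputs the result lands close to the report's rounded figure of $12 billion.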

  4. Synthetic Data Generation Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 16, 2025
    Cite
    Data Insights Market (2025). Synthetic Data Generation Report [Dataset]. https://www.datainsightsmarket.com/reports/synthetic-data-generation-1124388
    Available download formats
    doc, pdf, ppt
    Dataset updated
    Jun 16, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The synthetic data generation market is experiencing explosive growth, driven by the increasing need for high-quality data in various applications, including AI/ML model training, data privacy compliance, and software testing. The market, currently estimated at $2 billion in 2025, is projected to experience a Compound Annual Growth Rate (CAGR) of 25% from 2025 to 2033, reaching an estimated $10 billion by 2033. This significant expansion is fueled by several key factors. Firstly, the rising adoption of artificial intelligence and machine learning across industries demands large, high-quality datasets, often unavailable due to privacy concerns or data scarcity. Synthetic data provides a solution by generating realistic, privacy-preserving datasets that mirror real-world data without compromising sensitive information. Secondly, stringent data privacy regulations like GDPR and CCPA are compelling organizations to explore alternative data solutions, making synthetic data a crucial tool for compliance. Finally, the advancements in generative AI models and algorithms are improving the quality and realism of synthetic data, expanding its applicability in various domains. Major players like Microsoft, Google, and AWS are actively investing in this space, driving further market expansion. The market segmentation reveals a diverse landscape with numerous specialized solutions. While large technology firms dominate the broader market, smaller, more agile companies are making significant inroads with specialized offerings focused on specific industry needs or data types. The geographical distribution is expected to be skewed towards North America and Europe initially, given the high concentration of technology companies and early adoption of advanced data technologies. However, growing awareness and increasing data needs in other regions are expected to drive substantial market growth in Asia-Pacific and other emerging markets in the coming years. 
The competitive landscape is characterized by a mix of established players and innovative startups, leading to continuous innovation and expansion of market applications. This dynamic environment indicates sustained growth in the foreseeable future, driven by an increasing recognition of synthetic data's potential to address critical data challenges across industries.
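
The growth rate implied by two endpoint values follows from the standard CAGR formula, (end / start)^(1 / years) - 1. Here is a brief sketch using the endpoints quoted in the description; `implied_cagr` is an illustrative helper name, not something taken from the report.

```python
def implied_cagr(start, end, years):
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# Endpoints quoted in the description: $2 billion (2025) to $10 billion (2033).
rate = implied_cagr(2.0, 10.0, 2033 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```

With these endpoints the implied rate comes out a little below the quoted 25%, which suggests the report's figures are rounded or computed over a slightly different period.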

  5. Supplementary file 1_Data augmented lung cancer prediction framework using...

    • frontiersin.figshare.com
    docx
    Updated Feb 25, 2025
    Cite
    Yifan Jiang; Venkata S. K. Manem (2025). Supplementary file 1_Data augmented lung cancer prediction framework using the nested case control NLST cohort.docx [Dataset]. http://doi.org/10.3389/fonc.2025.1492758.s001
    Available download formats
    docx
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    Frontiers
    Authors
    Yifan Jiang; Venkata S. K. Manem
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: In the context of lung cancer screening, the scarcity of well-labeled medical images poses a significant challenge to implementing supervised learning-based deep learning methods. While data augmentation is an effective technique for countering the difficulties caused by insufficient data, it has not been fully explored in the context of lung cancer screening. In this research study, we analyzed the state-of-the-art (SOTA) data augmentation techniques for lung cancer binary prediction.

    Methods: To comprehensively evaluate the efficiency of data augmentation approaches, we considered the nested case control National Lung Screening Trial (NLST) cohort comprising 253 individuals who had the commonly used CT scans without contrast. The CT scans were pre-processed into three-dimensional volumes based on the lung nodule annotations. Subsequently, we evaluated five basic (online) and two generative model-based offline data augmentation methods with ten state-of-the-art (SOTA) 3D deep learning-based lung cancer prediction models.

    Results: Our results demonstrated that the performance improvement by data augmentation was highly dependent on the approach used. The Cutmix method resulted in the highest average performance improvement across all three metrics: 1.07%, 3.29%, 1.19% for accuracy, F1 score and AUC, respectively. MobileNetV2 with a simple data augmentation approach achieved the best AUC of 0.8719 among all lung cancer predictors, demonstrating a 7.62% improvement compared to baseline. Furthermore, the MED-DDPM data augmentation approach was able to improve prediction performance by rebalancing the training set and adding moderately synthetic data.

    Conclusions: The effectiveness of online and offline data augmentation methods was highly sensitive to the prediction model, highlighting the importance of carefully selecting the optimal data augmentation method. Our findings suggest that certain traditional methods can provide more stable and higher performance compared to SOTA online data augmentation approaches. Overall, these results offer meaningful insights for the development and clinical integration of data augmented deep learning tools for lung cancer screening.
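
For readers unfamiliar with CutMix, the online augmentation reported above as giving the highest average improvement, the general idea can be sketched in 2D with NumPy. This is not the authors' 3D pipeline: the function name `cutmix_pair`, the `alpha` default, and the one-hot label format are assumptions made for illustration.

```python
import numpy as np


def cutmix_pair(img_a, label_a, img_b, label_b, alpha=1.0, rng=None):
    """Paste a random box from img_b into img_a and mix the labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img_a.shape[:2]

    # Target mixing ratio; the box covers roughly (1 - lam) of the image.
    lam = rng.beta(alpha, alpha)
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))

    # Random box centre, with the box clipped to the image bounds.
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = img_a.copy()
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]

    # Recompute lam from the clipped box so labels match the pixels kept.
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label
```

A 3D variant would cut a cuboid instead of a rectangle and scale each side by the cube root of (1 - lam), but the label mixing by retained volume stays the same.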

  6. Healthcare Data Collection and Labeling Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 2, 2025
    Cite
    Data Insights Market (2025). Healthcare Data Collection and Labeling Report [Dataset]. https://www.datainsightsmarket.com/reports/healthcare-data-collection-and-labeling-976710
    Available download formats
    doc, ppt, pdf
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global healthcare data collection and labeling market is experiencing robust growth, driven by the increasing adoption of artificial intelligence (AI) and machine learning (ML) in healthcare. The rising volume of patient data generated through electronic health records (EHRs), wearable devices, and medical imaging necessitates efficient and accurate data labeling for training sophisticated AI algorithms. This demand fuels the market's expansion. While precise market sizing figures require further details, a reasonable estimate, considering the current growth trajectory of related AI and healthcare sectors, would place the 2025 market value at approximately $2 billion, with a Compound Annual Growth Rate (CAGR) of 15-20% projected through 2033. Key drivers include the need for improved diagnostic accuracy, personalized medicine, and drug discovery, all heavily reliant on high-quality labeled datasets. Furthermore, regulatory compliance mandates around data privacy and security are indirectly driving the adoption of specialized data collection and labeling services, ensuring data integrity and patient confidentiality. The market is segmented based on data type (imaging, text, sensor data), labeling method (supervised, unsupervised, semi-supervised), service type (data annotation, data augmentation, model training), and end-user (hospitals, pharmaceutical companies, research institutions). Companies like Alegion, Appen, and iMerit are key players, offering a range of services to meet diverse healthcare data needs. However, challenges remain, including data heterogeneity, scalability concerns related to large datasets, and the potential for bias in labeled data. Addressing these challenges requires continuous innovation in data collection methodologies, advanced labeling techniques, and the development of robust quality control measures. 
Future market growth will hinge on the successful integration of advanced technologies like synthetic data generation and automated labeling tools, aiming to reduce costs and accelerate the development of AI-powered healthcare solutions.

  7. XIMAGENET-12: An Explainable AI Benchmark CVPR2024

    • kaggle.com
    Updated Sep 13, 2023
    Cite
    Anomly (2023). XIMAGENET-12: An Explainable AI Benchmark CVPR2024 [Dataset]. http://doi.org/10.34740/kaggle/ds/3123294
    Available formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anomly
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Introduction:

    🌟 XimageNet-12 🌟

    An Explainable Visual Benchmark Dataset for Robustness Evaluation. A Dataset for Image Background Exploration!

    Blur Background, Segmented Background, AI-generated Background, Bias of Tools During Annotation, Color in Background, Random Background with Real Environment

    +⭐ Follow Authors for project updates.

    Website: XimageNet-12

    Here, we try to understand how the image background affects computer vision ML models on tasks such as detection and classification. Building on the baseline work of Li et al. at ICLR 2022, "Explainable AI: Object Recognition With Help From Background", we are now enlarging the dataset and analyzing the following topics: Blur Background / Segmented Background / AI-generated Background / Bias of Tools During Annotation / Color in Background / Dependent Factor in Background / Latent Space Distance of Foreground / Random Background with Real Environment. Ultimately, we also define the mathematical equation for the Robustness Scores. If you are interested in how we built it, or would like to join this research project, please feel free to collaborate with us!

    In this paper, we propose an explainable visual dataset, XIMAGENET-12, to evaluate the robustness of visual models. XIMAGENET-12 consists of over 200K images with 15,410 manual semantic annotations. Specifically, we deliberately selected 12 categories from ImageNet, representing objects commonly encountered in practical life. To simulate real-world situations, we incorporated six diverse scenarios, such as overexposure, blurring, and color changes, etc. We further develop a quantitative criterion for robustness assessment, allowing for a nuanced understanding of how visual models perform under varying conditions, notably in relation to the background.

    Progress:

    • Blur Background -> Done! You can find the generated images in the corresponding folder.
    • Segmented Background -> Done! You can download each image and its corresponding transparent mask image.
    • Color in Background -> Done! You can now download the images with modified background colors and experiment with the differently colored versions.
    • Random Background with Real Environment -> Done! We generated images that use a photographer's real image as the background, removing the original background of the target object while keeping a similar style.
    • Bias of Tools During Annotation -> Done! This one does not produce new images, because it concerns the mathematical and statistical analysis of the data when different tools and annotators are applied.
    • AI-generated Background -> Done (12/12)! You can find one sample folder of images we uploaded; please take a look at how real they look, and guess which LLM we are using to generate the high-resolution backgrounds :)

    What tools did we use to generate those images?

    We employed a combination of tools and methodologies to generate the images in this dataset, ensuring both efficiency and quality in the annotation and synthesis processes.

    • IoG Net: Initially, we utilized the IoG Net, which played a foundational role in our image generation pipeline.
    • Polygon Faster Labeling Tool: To facilitate the annotation process, we developed a custom Polygon Faster Labeling Tool, streamlining the labeling of objects within the images.
    • AnyLabeling Open-source Project: We also experimented with the AnyLabeling open-source project, exploring its potential for our annotation needs.
    • V7 Lab Tool: Eventually, we found that the V7 Lab Tool provided the most efficient labeling speed and delivered high-quality annotations. As a result, we standardized the annotation process using this tool.
    • Data Augmentation: To generate synthetic images, we relied on a combination of libraries, including scikit-learn and OpenCV. These tools allowed us to augment and manipulate images effectively to create a diverse range of backgrounds and variations.
    • GenAI: Our dataset includes images generated using the Stable Diffusion XL model, along with versions 1.5 and 2.0 of the Stable Diffusion model. These generative models played a pivotal role in crafting realistic and varied backgrounds.

    For a detailed breakdown of our prompt engineering and hyperparameters, we invite you to consult our upcoming paper. This publication will provide comprehensive insights into our methodologies, enabling a deeper understanding of the image generation process.

    How to use our dataset?

    this dataset has been/could be downloaded via Kaggl...

  8. Ornamental Flower Plants Dataset Dataset

    • paperswithcode.com
    Updated Feb 25, 2025
    Cite
    (2025). Ornamental Flower Plants Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/ornamental-flower-plants-dataset
    Dataset updated
    Feb 25, 2025
    Description

    The Ornamental Flower Plants dataset is curated to assist in the development of accurate image classification models for various species of ornamental plants. Ideal for machine learning practitioners, researchers, and botanists, this dataset is structured to facilitate training and testing models that focus on plant recognition and classification. It contributes to enhancing biodiversity monitoring, plant taxonomy, and conservation through technological advancements.

    Dataset Features

    Image Format: The dataset contains high-quality images in JPEG format, all resized to 224×224 pixels to ensure uniformity in model training.

    Split Structure:

    Training Set: Contains approximately 700 images of ornamental flowers, capturing various angles, lighting conditions, and environments to simulate real-world data variability.

    Test Set: Comprises 150 images used for model evaluation, allowing for accurate performance measurement after training.

    Categorization: Each image in the dataset is categorized by flower species, ensuring diversity in floral morphology, color, size, and regional varieties.

    Context and Importance

    In the modern era, flower identification has numerous applications ranging from educational tools to horticultural management. With increasing environmental concerns, this dataset serves as a valuable resource for automating plant recognition systems. Whether integrated into mobile applications for hobbyists or leveraged in large-scale agricultural systems, this dataset can assist in identifying and cataloging plant species effortlessly.

    Data Enhancement Opportunities

    Extended Variety: Expanding the dataset by adding more species of flowers, including those from rare or endangered categories, would greatly enhance the classification scope.

    Additional Metadata: Adding extra information such as blooming seasons, geographic regions, and common uses (e.g., medicinal, decorative) would increase the dataset's applicability across different fields like environmental science and agriculture.

    Augmentation Techniques: To further improve model generalization, data augmentation (such as rotating, flipping, and varying brightness) could be applied to create synthetic variations of the original images.
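
The augmentations named above (rotating, flipping, and varying brightness) can be sketched with NumPy alone. This is a hedged illustration rather than anything shipped with the dataset: the function name `augment`, the 90-degree rotation steps, and the 0.8 to 1.2 brightness range are assumptions, and images are assumed to be 8-bit arrays of shape (height, width, channels).

```python
import numpy as np


def augment(image, rng=None):
    """Return a randomly rotated, flipped, and brightness-shifted copy of an image."""
    rng = np.random.default_rng() if rng is None else rng

    # Rotate by a random multiple of 90 degrees (keeps square images square).
    out = np.rot90(image, k=int(rng.integers(4)))

    # Flip left-right half of the time.
    if rng.random() < 0.5:
        out = np.fliplr(out)

    # Scale brightness by a factor in [0.8, 1.2) and clip to the 8-bit range.
    factor = rng.uniform(0.8, 1.2)
    out = np.clip(out.astype(np.float32) * factor, 0, 255)
    return out.astype(np.uint8)
```

Applying `augment` several times to each training image yields distinct synthetic variants while leaving the test set untouched.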

    Inspiration for Model Development

    The inspiration behind this dataset is to push the boundaries of ornamental plant identification by building models that outperform existing flower classification tools. The goal is to develop an intuitive, easy-to-use system capable of classifying multiple flower species with high precision. Such systems can be useful in conservation efforts, eco-tourism, educational tools, or gardening aids.

    Potential Applications

    Mobile Applications: Use the dataset to develop apps that allow users to snap a picture and identify flowers instantly.

    Agricultural Systems: Employ the dataset for AI-driven tools that assist farmers in monitoring ornamental plant health and identifying potential threats.

    Conservation Efforts: Aid botanists and environmentalists in cataloging and preserving endangered flower species through automated systems.

    Conclusion

    The Ornamental Flower Plants dataset is an excellent starting point for developing sophisticated image classification models tailored to plant recognition. Its potential to be expanded and its diverse applications across industries make it an invaluable resource for AI practitioners and researchers working on environmental and botanical projects.

    This dataset is sourced from Kaggle.

Data archive for paper "Copula-based synthetic data augmentation for machine-learning emulators"

3 scholarly articles cite this dataset.