https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides realistic stock market data generated using Geometric Brownian Motion for price movements and Markov Chains for trend prediction. It is designed for time-series forecasting, financial modeling, and algorithmic trading simulations.
Column Name | Description |
---|---|
Date | Trading date |
Company | Stock name (e.g., Apple, Tesla, JPMorgan, etc.) |
Sector | Industry classification |
Open | Opening price of the stock |
High | Highest price of the stock for the day |
Low | Lowest price of the stock for the day |
Close | Closing price of the stock |
Volume | Number of shares traded |
Market_Cap | Market capitalization (in USD) |
PE_Ratio | Price-to-Earnings ratio |
Dividend_Yield | Percentage of dividends relative to stock price |
Volatility | Measure of stock price fluctuation |
Sentiment_Score | Market sentiment (-1 to 1 scale) |
Trend | Stock market trend (Bullish, Bearish, or Stable) |
Time-Series Forecasting: Train models like LSTMs, Transformers, or ARIMA for stock price prediction.
Algorithmic Trading: Develop trading strategies based on trends and sentiment.
Feature Engineering: Explore correlations between financial metrics and stock movements.
Quantitative Finance Research: Analyze market trends using simulated yet realistic data.
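The dataset's actual generator is not published; the following minimal sketch (all parameters illustrative, not the dataset's real values) shows how Geometric Brownian Motion closing prices and a Markov-chain trend label can be produced together:

```python
import numpy as np

rng = np.random.default_rng(42)

def gbm_prices(s0, mu, sigma, n_days, dt=1 / 252):
    """Simulate daily closing prices with Geometric Brownian Motion."""
    shocks = rng.normal((mu - 0.5 * sigma**2) * dt, sigma * np.sqrt(dt), n_days)
    return s0 * np.exp(np.cumsum(shocks))

# Markov chain over trend states; each row of P sums to 1
# (transition probabilities are invented for illustration).
states = ["Bullish", "Bearish", "Stable"]
P = np.array([[0.70, 0.10, 0.20],
              [0.15, 0.65, 0.20],
              [0.25, 0.25, 0.50]])

def trend_sequence(n_days, start=2):
    seq, s = [], start
    for _ in range(n_days):
        s = rng.choice(3, p=P[s])  # next state depends only on the current one
        seq.append(states[s])
    return seq

close = gbm_prices(s0=150.0, mu=0.08, sigma=0.25, n_days=5)
trends = trend_sequence(5)
print(list(zip(np.round(close, 2), trends)))
```

A real generator would additionally derive Open/High/Low from intraday noise around these closes and couple the GBM drift to the current trend state.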
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data scarcity and discontinuity are common in healthcare and epidemiological datasets, which must often support informed decisions and forecasts of upcoming scenarios. To avoid these problems, such data are often processed as monthly/yearly aggregates, on which prevalent forecasting tools like Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis has proven to be a powerful tool for tackling these challenges. The paper proposes a novel algorithm, Stochastic Bayesian Downscaling (SBD), based on the Bayesian approach, which can regenerate downscaled time series of varying lengths from aggregated data while preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time series case studies from Bangladesh (dengue, COVID-19) to showcase the workflow of the algorithm. The case studies illustrate that the synthesized data agree with the original data in statistical properties, trend, seasonality, and residuals. Regarding forecasting performance, using the last 12 years of dengue infection data in Bangladesh, error terms decreased by up to 72.76% when training on synthetic rather than the actual aggregated data.
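The SBD algorithm itself is specified in the paper; as a toy illustration of the core idea only (sum-preserving Bayesian disaggregation, not the paper's actual method), a Dirichlet-multinomial split regenerates sub-period counts whose total exactly matches the aggregate:

```python
import numpy as np

rng = np.random.default_rng(0)

def downscale_counts(aggregate_total, n_periods, alpha=1.0):
    """Split an aggregated count into n_periods sub-counts whose sum is
    exactly preserved: sample period weights from a Dirichlet prior,
    then allocate the total with a single multinomial draw."""
    weights = rng.dirichlet(np.full(n_periods, alpha))
    return rng.multinomial(aggregate_total, weights)

# Regenerate a plausible monthly series from one yearly aggregate.
monthly = downscale_counts(aggregate_total=1200, n_periods=12)
assert monthly.sum() == 1200  # the aggregated sum is preserved by construction
print(monthly)
```

A seasonal prior (non-uniform `alpha`) would let the split reflect known within-year structure instead of being exchangeable across months.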
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accurate failure prediction is critical for the reliability of storage systems in HPC facilities and data centers. This study addresses data scarcity, privacy concerns, and class imbalance in HDD failure datasets by leveraging synthetic data generation. We propose an end-to-end framework to generate synthetic storage data using Generative Adversarial Networks and Diffusion models. We implement a data segmentation approach that accounts for temporal variation in disk access, generating high-fidelity synthetic data that replicates the nuanced temporal and feature-specific patterns of disk failures. Experimental results show that the synthetic data achieves similarity scores of 0.81–0.89 and enhances failure prediction performance, with up to 3% improvement in accuracy and 2% in ROC-AUC. With only minor performance drops versus real-data training, synthetically trained models prove viable for predictive maintenance.
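The study's exact similarity metric is not given in this summary; one common choice that yields scores in [0, 1] is one minus the two-sample Kolmogorov-Smirnov statistic per feature, sketched below on stand-in data:

```python
import numpy as np

rng = np.random.default_rng(1)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs, evaluated over all sample points."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def similarity(real, synthetic):
    """1.0 means the empirical distributions coincide."""
    return 1.0 - ks_statistic(real, synthetic)

real = rng.normal(0.0, 1.0, 5000)    # stand-in for a real SMART attribute
synth = rng.normal(0.05, 1.1, 5000)  # stand-in for its synthetic counterpart
print(round(similarity(real, synth), 3))
```

Averaging this score across SMART attributes gives a single per-dataset fidelity number comparable to the 0.81–0.89 range reported above, under the assumption that a KS-style metric was used.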
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains data generated in the AI DHC project.
This dataset contains synthetic fault data for a decrease in the COP (coefficient of performance) of a heat pump.
The IEA DHC Annex XIII project "Artificial Intelligence for Failure Detection and Forecasting of Heat Production and Heat Demand in District Heating Networks" is developing Artificial Intelligence (AI) methods for forecasting heat demand and heat production, and is evaluating fault-detection algorithms that can be used by interested stakeholders (operators, suppliers of DHC components, and manufacturers of control devices).
See https://github.com/mathieu-vallee/ai-dhc for the models and Python scripts used to generate the dataset.
Please cite this dataset as: Vallee, M., Wissocq T., Gaoua Y., Lamaison N., Generation and Evaluation of a Synthetic Dataset to improve Fault Detection in District Heating and Cooling Systems, 2023 (under review at the Energy journal)
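The actual generator models live in the repository linked above; purely as a hypothetical illustration (the nominal COP, noise level, and degradation rate below are invented, not the project's values), a "decrease of COP" fault can be injected into a clean signal like this:

```python
import numpy as np

rng = np.random.default_rng(7)

def cop_series(n_hours, cop_nominal=4.0, fault_start=None, drop_rate=0.001):
    """Nominal COP with measurement noise; after fault_start the COP
    degrades linearly, mimicking a 'decrease of COP' fault."""
    cop = np.full(n_hours, cop_nominal) + rng.normal(0, 0.05, n_hours)
    if fault_start is not None:
        t = np.arange(n_hours - fault_start)
        cop[fault_start:] -= drop_rate * t  # linear degradation after onset
    return cop

healthy = cop_series(1000)
faulty = cop_series(1000, fault_start=500)
print(healthy[:3].round(2), faulty[-1].round(2))
```

Labeling each hour as faulty when `fault_start` has passed yields the kind of ground truth a fault-detection benchmark needs.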
Disclaimer notice (IEA DHC): This project has been independently funded by the International Energy Agency Technology Collaboration Programme on District Heating and Cooling including Combined Heat and Power (IEA DHC).
Any views expressed in this publication are not necessarily those of IEA DHC.
IEA DHC can take no responsibility for the use of the information within this publication, nor for any errors or omissions it may contain.
Information contained herein has been compiled from, or arrived at via, sources believed to be reliable. Nevertheless, the authors and their organizations do not accept liability for any loss or damage arising from its use. Use of the given information is strictly your own responsibility.
Disclaimer Notice (Authors):
This publication has been compiled with reasonable skill and care. However, neither the authors nor the DHC Contracting Parties (of the International Energy Agency Technology Collaboration Programme on District Heating & Cooling) make any representation as to the adequacy or accuracy of the information contained herein, or as to its suitability for any particular application, and accept no responsibility or liability arising out of the use of this publication. The information contained herein does not supersede the requirements given in any national codes, regulations or standards, and should not be regarded as a substitute for them.
Copyright:
All property rights, including copyright, are vested in IEA DHC. In particular, all parts of this publication may be reproduced, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise only by crediting IEA DHC as the original source. Republishing of this report in another format or storing the report in a public retrieval system is prohibited unless explicitly permitted by the IEA DHC Operating Agent in writing.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides synthetic data designed to analyze and predict power load (in MW) in Delhi, incorporating a variety of influencing factors such as weather, holidays, festivals, and real estate development levels. With over a year of hourly data, this dataset is ideal for researchers, students, and practitioners working on energy systems, urban planning, and time-series forecasting.
1. Power Load Forecasting: Build machine learning models to predict future electricity demand.
2. Weather Impact Studies: Analyze how weather conditions influence power consumption patterns.
3. Urban Development Insights: Explore the correlation between area development levels and energy usage.
4. Policy Planning: Assist policymakers in understanding energy demand trends during holidays, festivals, and extreme weather.
5. Time Series Analysis: Practice and research advanced time-series forecasting techniques.
6. Renewable Energy Integration: Develop models to optimize energy distribution and reduce reliance on non-renewable sources.
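For the forecasting use cases above, the standard starting point is calendar and lag features. The column names in this sketch are illustrative stand-ins (check the actual CSV header before reusing them), and the series itself is simulated so the snippet is self-contained:

```python
import numpy as np
import pandas as pd

# Stand-in for the dataset: one hourly load column with a daily cycle.
idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")
rng = np.random.default_rng(3)
load = 3000 + 800 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 50, len(idx))
df = pd.DataFrame({"load_mw": load}, index=idx)

# Classic forecasting features: calendar fields and lagged loads.
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
df["lag_24h"] = df["load_mw"].shift(24)    # same hour yesterday
df["lag_168h"] = df["load_mw"].shift(168)  # same hour last week
df = df.dropna()  # the first week has no 168-hour lag
print(df.head(2))
```

With the real dataset, weather, holiday, and festival columns would join this frame as additional regressors.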
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WARNING
This version of the dataset is not recommended for the anomaly-detection use case. We discovered discrepancies in the anomalous sequences. A new version will be released; in the meantime, please ignore all sequences marked as anomalous.
CONTEXT
Testing hardware to qualify it for Spaceflight is critical to model and verify performances. Hot fire tests (also known as life-tests) are typically run during the qualification campaigns of satellite thrusters, but results remain proprietary data, hence making it difficult for the machine learning community to develop suitable data-driven predictive models. This synthetic dataset was generated partially based on the real-world physics of monopropellant chemical thrusters, to foster the development and benchmarking of new data-driven analytical methods (machine learning, deep-learning, etc.).
The PDF document "STFT Dataset Description" describes in detail the structure, context, use cases, and domain knowledge about thrusters so that ML practitioners can use the dataset.
PROPOSED TASKS
- Supervised
- Unsupervised / anomaly detection
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 7.98 (USD Billion) |
MARKET SIZE 2024 | 9.55 (USD Billion) |
MARKET SIZE 2032 | 40.0 (USD Billion) |
SEGMENTS COVERED | Type, Application, Deployment Mode, Organization Size, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | Growing demand for data privacy and security; advancement in Artificial Intelligence (AI) and Machine Learning (ML); increasing need for faster and more efficient data generation; growing adoption of synthetic data in various industries; government regulations and compliance |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | MostlyAI, Gretel.ai, H2O.ai, Scale AI, UNchart, Anomali, Replica, Big Syntho, Owkin, DataGenix, Synthesized, Verisart, Datumize, Deci, Datasaur |
MARKET FORECAST PERIOD | 2025 - 2032 |
KEY MARKET OPPORTUNITIES | Data privacy compliance; improved data availability; enhanced data quality; reduced data bias; cost-effectiveness |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 19.61% (2025 - 2032) |
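The reported CAGR can be sanity-checked from the table's own 2024 and 2032 market sizes:

```python
def cagr(begin, end, years):
    """Compound annual growth rate as a percentage."""
    return ((end / begin) ** (1 / years) - 1) * 100

# 9.55 USD billion in 2024 growing to 40.0 USD billion by 2032 (8 years)
print(round(cagr(9.55, 40.0, 8), 2))  # → 19.61, matching the reported figure
```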
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This comprehensive financial dataset combines transaction records, customer information, and card data from a banking institution, spanning the 2010s. The dataset is designed for multiple analytical purposes, including synthetic fraud detection, customer behavior analysis, and expense forecasting.
The dataset comprises the following files:
- transactions_data.csv
- cards_dat.csv
- mcc_codes.json
- train_fraud_labels.json
- users_data

Dataset created by Caixabank Tech for the 2024 AI Hackathon.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides a detailed time-series estimate of the monthly cost of living across 20 different areas in Nairobi, Kenya from 2019 to 2024. It covers essential expenses such as rent, food, transport, utilities, and miscellaneous costs, allowing for comprehensive cost-of-living analysis.
This dataset is useful for:
- Individuals planning to move to Nairobi
- Researchers analyzing long-term cost trends
- Businesses assessing salary benchmarks based on inflation
- Data scientists developing predictive models for cost forecasting
Columns:
- Area: The residential area in Nairobi
- Rent: Estimated monthly rent (KES)
- Food: Grocery and dining expenses (KES)
- Transport: Public and private transport costs (KES)
- Utilities: Water, electricity, and internet bills (KES)
- Misc: Entertainment, personal care, and leisure expenses (KES)
- Total: Sum of all expenses
- Date: Monthly timestamp from January 2019 to December 2024

This dataset provides cost estimates for 20+ residential areas, including:
- High-End Areas: Kileleshwa, Westlands, Karen
- Mid-Range Areas: South B, Langata, Ruaka
- Affordable Areas: Embakasi, Kasarani, Githurai, Ruiru, Umoja
- Satellite Towns: Ngong, Rongai, Thika, Kitengela, Kikuyu
This dataset was synthetically generated using Python, incorporating realistic market variations. The process includes:
- Inflation Modeling: a 2% annual increase in costs over time.
- Seasonal Effects: higher food and transport costs in December & January (holiday season), and rent spikes in June & July.
- Economic Shocks: a 5% chance per record of external economic effects (e.g., fuel price hikes, supply chain issues).
- Random Fluctuations: expenses vary slightly month-to-month to simulate real-world spending behavior.
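The four generation steps above can be sketched for a single cost component as follows (the exact boost factors and noise magnitudes are assumptions for illustration, not the original script's values):

```python
import numpy as np

rng = np.random.default_rng(11)

def monthly_cost(base, year, month, seasonal_months, seasonal_boost=1.10):
    """One cost component following the documented recipe: inflation,
    seasonal boost, occasional shock, and small monthly noise."""
    cost = base * (1.02 ** (year - 2019))        # 2% annual inflation since 2019
    if month in seasonal_months:
        cost *= seasonal_boost                    # holiday / mid-year spike
    if rng.random() < 0.05:
        cost *= rng.uniform(1.05, 1.20)           # 5% chance of an economic shock
    return cost * rng.normal(1.0, 0.03)           # month-to-month fluctuation

# Example: food in December 2023 for a mid-range area (base 15,000 KES;
# December and January are the seasonal months for food).
print(round(monthly_cost(15000, 2023, 12, seasonal_months={12, 1})))
```

Running this per area, per component, per month over 2019-2024 and summing the components into Total reproduces the dataset's overall structure.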
nairobi_cost_of_living_time_series.csv: 60,000 records in CSV format (time-series structured).

This dataset was generated for research and educational purposes. If you find it useful, consider citing it in your work.