11 datasets found
  1. Stock Market Simulation Dataset

    • kaggle.com
    Updated Mar 12, 2025
    Cite
    Samay Ashar (2025). Stock Market Simulation Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11010423
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samay Ashar
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset provides realistic stock market data generated using Geometric Brownian Motion for price movements and Markov Chains for trend prediction. It is designed for time-series forecasting, financial modeling, and algorithmic trading simulations.

    Key Features

    • 1000 days of synthetic stock market data (from January 1, 2022, onwards).
    • Multiple companies from diverse industries (Technology, Finance, Healthcare, Energy, Consumer Goods, Automotive, Aerospace, etc.).
    • Stock price details: Open, High, Low, Close prices.
    • Trading volume and market capitalization.
    • Financial metrics: P/E Ratio, Dividend Yield, Volatility.
    • Sentiment Score: A measure of market sentiment (-1 to 1 scale).
    • Trend Labeling: Bullish, Bearish, or Stable, based on Markov Chain modeling.
    Column Name      Description
    Date             Trading date
    Company          Stock name (e.g., Apple, Tesla, JPMorgan)
    Sector           Industry classification
    Open             Opening price of the stock
    High             Highest price of the stock for the day
    Low              Lowest price of the stock for the day
    Close            Closing price of the stock
    Volume           Number of shares traded
    Market_Cap       Market capitalization (in USD)
    PE_Ratio         Price-to-Earnings ratio
    Dividend_Yield   Percentage of dividends relative to stock price
    Volatility       Measure of stock price fluctuation
    Sentiment_Score  Market sentiment (-1 to 1 scale)
    Trend            Stock market trend (Bullish, Bearish, or Stable)
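    The page names the two generating mechanisms (Geometric Brownian Motion for prices, a Markov chain for trend labels) but not their parameters. A minimal sketch of how such data could be produced; the drift `mu`, volatility `sigma`, and transition matrix `P` below are illustrative assumptions, not the dataset's actual values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Geometric Brownian Motion: S_{t+1} = S_t * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z)
def gbm_path(s0, mu, sigma, n_days, dt=1.0 / 252):
    z = rng.standard_normal(n_days)
    log_returns = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_returns))

# Markov chain over trend states; the transition matrix is an assumption
states = ["Bullish", "Bearish", "Stable"]
P = np.array([[0.60, 0.10, 0.30],
              [0.10, 0.60, 0.30],
              [0.25, 0.25, 0.50]])  # each row sums to 1

def trend_sequence(n_days, start_state=2):
    seq, state = [], start_state
    for _ in range(n_days):
        seq.append(states[state])
        state = rng.choice(3, p=P[state])  # sample next state from row of P
    return seq

prices = gbm_path(100.0, mu=0.08, sigma=0.20, n_days=1000)
trends = trend_sequence(1000)
```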

    Usage Scenarios

    🔹 Time-Series Forecasting: Train models like LSTMs, Transformers, or ARIMA for stock price prediction.
    🔹 Algorithmic Trading: Develop trading strategies based on trends and sentiment.
    🔹 Feature Engineering: Explore correlations between financial metrics and stock movements.
    🔹 Quantitative Finance Research: Analyze market trends using simulated yet realistic data.

    PS: If you find this dataset helpful, please consider upvoting :)

  2. S2 Data -

    • plos.figshare.com
    txt
    Updated Dec 14, 2023
    Cite
    Mahadee Al Mobin; Md. Kamrujjaman (2023). S2 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0295803.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mahadee Al Mobin; Md. Kamrujjaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data scarcity and discontinuity are common in healthcare and epidemiological datasets, which are often needed to make informed decisions and forecast upcoming scenarios. To avoid these problems, such data are often processed as monthly or yearly aggregates, on which prevalent forecasting tools like the Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis methods have proven to be a powerful tool for tackling these challenges. The paper proposes a novel algorithm, the Stochastic Bayesian Downscaling (SBD) algorithm, based on the Bayesian approach, which can regenerate downscaled time series of varying lengths from aggregated data, preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time-series case studies from Bangladesh (dengue, COVID-19) to showcase the workflow of the algorithm. The case studies illustrate that the synthesized data agree with the original data in statistical properties, trend, seasonality, and residuals. Regarding forecasting performance, using the last 12 years of dengue infection data in Bangladesh, the authors were able to decrease error terms by up to 72.76% using synthetic data over the actual aggregated data.
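    The full SBD algorithm is Bayesian and defined in the paper; its core constraint, that downscaled values must sum back to the reported aggregate, can be illustrated with a toy Dirichlet-weighted split (this sketches only the sum-preservation property, not the authors' procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def downscale(aggregate, n_points):
    """Split one aggregated value into n_points sub-period values whose
    sum equals the aggregate exactly (random Dirichlet weights)."""
    weights = rng.dirichlet(np.ones(n_points))  # non-negative, sums to 1
    return aggregate * weights

monthly_total = 310.0            # e.g. one month's aggregated case count
daily = downscale(monthly_total, 31)
```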

  3. Synthetic Data Generation for Hard Drive Failure Prediction in Large-scale...

    • figshare.com
    zip
    Updated Apr 27, 2025
    Cite
    Chandranil Chakraborttii (2025). Synthetic Data Generation for Hard Drive Failure Prediction in Large-scale Systems [Dataset]. http://doi.org/10.6084/m9.figshare.28878830.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 27, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chandranil Chakraborttii
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate failure prediction is critical for the reliability of HPC facilities' and data centers' storage systems. This study addresses data scarcity, privacy concerns, and class imbalance in HDD failure datasets by leveraging synthetic data generation. We propose an end-to-end framework for generating synthetic storage data using Generative Adversarial Networks and diffusion models. We implement a data segmentation approach that accounts for the temporal variation of disk access, generating high-fidelity synthetic data that replicates the nuanced temporal and feature-specific patterns of disk failures. Experimental results show that the synthetic data achieves similarity scores of 0.81–0.89 and enhances failure prediction performance, with up to a 3% improvement in accuracy and 2% in ROC-AUC. With only minor performance drops versus real-data training, synthetically trained models prove viable for predictive maintenance.

  4. Heat pump COP drop - synthetic faults

    • kaggle.com
    zip
    Updated Feb 28, 2023
    Cite
    Mathieu Vallee (2023). Heat pump COP drop - synthetic faults [Dataset]. https://www.kaggle.com/datasets/mathieuvallee/ai-dhc-heatpump-cop
    Explore at:
    Available download formats: zip (68,378,018 bytes)
    Dataset updated
    Feb 28, 2023
    Authors
    Mathieu Vallee
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was generated in the AI DHC project. It contains synthetic fault data representing a drop in the coefficient of performance (COP) of a heat pump.

    The IEA DHC Annex XIII project β€œArtificial Intelligence for Failure Detection and Forecasting of Heat Production and Heat demand in District Heating Networks” is developing Artificial Intelligence (AI) methods for forecasting heat demand and heat production and is evaluating algorithms for detecting faults which can be used by interested stakeholders (operators, suppliers of DHC components and manufacturers of control devices).

    See https://github.com/mathieu-vallee/ai-dhc for the models and Python scripts used to generate the dataset.
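    The actual generation models live in the linked repository. As a hedged illustration of what a synthetic COP-drop fault can look like, the toy sketch below injects a gradual 20% COP degradation into a nominal hourly series and trips a naive rolling-mean alarm; all magnitudes (nominal COP, noise level, drop size, threshold) are assumptions, not AI-DHC values:

```python
import numpy as np

rng = np.random.default_rng(5)

# Nominal hourly COP time series, then a synthetic degradation fault
hours = 24 * 30
cop = rng.normal(3.5, 0.1, hours)            # assumed nominal COP ~3.5
fault_start = 24 * 20
cop[fault_start:] *= np.linspace(1.0, 0.8, hours - fault_start)  # gradual 20% drop

# Naive detector: flag when a 24 h rolling mean falls below a threshold
window = 24
rolling = np.convolve(cop, np.ones(window) / window, mode="valid")
alarm = np.argmax(rolling < 3.2) + window - 1  # first hour the alarm trips
```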

    Please cite this dataset as: Vallee, M., Wissocq T., Gaoua Y., Lamaison N., Generation and Evaluation of a Synthetic Dataset to improve Fault Detection in District Heating and Cooling Systems, 2023 (under review at the Energy journal)

    Disclaimer notice (IEA DHC): This project has been independently funded by the International Energy Agency Technology Collaboration Programme on District Heating and Cooling including Combined Heat and Power (IEA DHC).

    Any views expressed in this publication are not necessarily those of IEA DHC.

    IEA DHC can take no responsibility for the use of the information within this publication, nor for any errors or omissions it may contain.

    The information contained herein has been compiled from sources believed to be reliable. Nevertheless, the authors and their organizations do not accept liability for any loss or damage arising from its use. Using the given information is strictly your own responsibility.

    Disclaimer Notice (Authors):

    This publication has been compiled with reasonable skill and care. However, neither the authors nor the DHC Contracting Parties (of the International Energy Agency Technology Collaboration Programme on District Heating & Cooling) make any representation as to the adequacy or accuracy of the information contained herein, or as to its suitability for any particular application, and accept no responsibility or liability arising out of the use of this publication. The information contained herein does not supersede the requirements given in any national codes, regulations or standards, and should not be regarded as a substitute

    Copyright:

    All property rights, including copyright, are vested in IEA DHC. In particular, all parts of this publication may be reproduced, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise only by crediting IEA DHC as the original source. Republishing of this report in another format or storing the report in a public retrieval system is prohibited unless explicitly permitted by the IEA DHC Operating Agent in writing.

  5. Delhi Power Load with Weather & Development

    • kaggle.com
    Updated Jan 12, 2025
    Cite
    Pratik Chougule (2025). Delhi Power Load with Weather & Development [Dataset]. https://www.kaggle.com/datasets/pratikyuvrajchougule/delhi-datset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 12, 2025
    Dataset provided by
    Kaggle
    Authors
    Pratik Chougule
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Delhi
    Description

    This dataset provides synthetic data designed to analyze and predict power load (in MW) in Delhi, incorporating a variety of influencing factors such as weather, holidays, festivals, and real estate development levels. With over a year of hourly data, this dataset is ideal for researchers, students, and practitioners working on energy systems, urban planning, and time-series forecasting.

    Key Features:

    • Weather Data: Temperature, humidity, wind speed, and rainfall measurements for each hour.
    • Socio-Economic Indicators: Information on public holidays, weekly holidays, and festival days.
    • Urban Development: Classification of areas into low, medium, and high development zones with respective percentages.
    • Power Load (MW): Target variable representing hourly electricity consumption in megawatts.

    Purpose:

    This dataset is intended for the following use cases:

    1. Power Load Forecasting: Build machine learning models to predict future electricity demand.
    2. Weather Impact Studies: Analyze how weather conditions influence power consumption patterns.
    3. Urban Development Insights: Explore the correlation between area development levels and energy usage.
    4. Policy Planning: Assist policymakers in understanding energy demand trends during holidays, festivals, and extreme weather.
    5. Time Series Analysis: Practice and research advanced time-series forecasting techniques.
    6. Renewable Energy Integration: Develop models to optimize energy distribution and reduce reliance on non-renewable sources.
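    Before reaching for the machine learning models above, a naive hour-of-day baseline is a useful yardstick. Since the dataset's exact column names are not given on this page, the sketch below uses synthetic stand-in values (all magnitudes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for two weeks of hourly load in MW (illustrative values)
hours = np.arange(24 * 14) % 24
load_mw = 3000 + 800 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 50, hours.size)

# Naive seasonal baseline: predict each hour with the mean load at that hour of day
hourly_mean = np.array([load_mw[hours == h].mean() for h in range(24)])
pred = hourly_mean[hours]
mae = np.abs(load_mw - pred).mean()  # any real model should beat this error
```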

    Potential Applications:

    • Building intelligent power grid systems.
    • Analyzing the impact of climate change on energy demand.
    • Supporting smart city initiatives with energy-efficient planning.
    • Creating educational tools for data science and machine learning learners.
  6. Coefficients of ARIMA(7,0,7).

    • plos.figshare.com
    xls
    Updated Dec 14, 2023
    Cite
    Mahadee Al Mobin; Md. Kamrujjaman (2023). Coefficients of ARIMA(7,0,7). [Dataset]. http://doi.org/10.1371/journal.pone.0295803.t010
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mahadee Al Mobin; Md. Kamrujjaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary table from the same PLOS ONE article as dataset 2 (Stochastic Bayesian Downscaling); the shared abstract is reproduced under that entry.

  7. Spacecraft Thruster Firing Test Dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv, zip
    Updated Jul 16, 2024
    Cite
    Patrick Fleith; Patrick Fleith (2024). Spacecraft Thruster Firing Test Dataset [Dataset]. http://doi.org/10.5281/zenodo.7137930
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Fleith; Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    WARNING

    This version of the dataset is not recommended for anomaly-detection use cases. We discovered discrepancies in the anomalous sequences; a new version will be released. In the meantime, please ignore all sequences marked as anomalous.

    CONTEXT

    Testing hardware to qualify it for spaceflight is critical for modelling and verifying performance. Hot-fire tests (also known as life-tests) are typically run during the qualification campaigns of satellite thrusters, but the results remain proprietary, making it difficult for the machine learning community to develop suitable data-driven predictive models. This synthetic dataset was generated partially based on the real-world physics of monopropellant chemical thrusters, to foster the development and benchmarking of new data-driven analytical methods (machine learning, deep learning, etc.).

    The PDF document "STFT Dataset Description" describes in detail the structure, context, use cases, and domain knowledge about the thruster, so that ML practitioners can use the dataset.

    PROPOSED TASKS

    Supervised:

    • Performance Modelling: Prediction of thruster performance (the target can be thrust, mass flow rate, and/or average specific impulse)
    • Acceptance Test for Individualised Performance Model refinement: Taking each thruster's acceptance test into account may help generate individualised predictive models
    • Uncertainty Quantification for thruster-to-thruster reproducibility verification, i.e., evaluating the prediction variability between several thrusters in order to construct uncertainty bounds (predictive intervals) around the predicted thrust and mass flow rate of future thrusters that may be used during an actual space mission

    Unsupervised / Anomaly Detection

    • Anomaly Detection: Anomalies can be detected in an unsupervised setting (outlier detection) or a semi-supervised setting (novelty detection). The dataset includes a total of 270 anomalies. A simple approach is to predict whether a firing test sequence is anomalous or nominal; a more advanced approach is to predict which portion of a time series is anomalous. The dataset also provides detailed information about whether each time point is anomalous or nominal. In case of an anomaly, a code is provided that allows diagnosing the detection system's performance on the different types of anomalies contained in the dataset.
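    A minimal unsupervised baseline for the sequence-level task can be sketched with a robust z-score over per-sequence summary features. The feature definitions and threshold below are illustrative assumptions; the dataset's real features are described in the STFT description PDF:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy per-sequence feature vectors (e.g. mean thrust, thrust variance);
# values and anomaly shift are assumptions for illustration only.
nominal = rng.normal([1.0, 0.05], [0.02, 0.005], size=(300, 2))
anomalous = rng.normal([0.85, 0.15], [0.02, 0.005], size=(5, 2))
X = np.vstack([nominal, anomalous])

# Robust outlier score: MAD-based z-score distance from the median
med = np.median(X, axis=0)
mad = np.median(np.abs(X - med), axis=0) * 1.4826  # MAD -> sigma estimate
score = np.abs((X - med) / mad).max(axis=1)
flagged = score > 5.0  # threshold is an assumption
```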

  8. Global Synthetic Data Tool Market Research Report: By Type (Image...

    • wiseguyreports.com
    Updated Aug 10, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Synthetic Data Tool Market Research Report: By Type (Image Generation, Text Generation, Audio Generation, Time-Series Generation, User-Generated Data Marketplace), By Application (Computer Vision, Natural Language Processing, Predictive Analytics, Healthcare, Retail), By Deployment Mode (Cloud-Based, On-Premise), By Organization Size (Small and Medium Enterprises (SMEs), Large Enterprises) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/synthetic-data-tool-market
    Explore at:
    Dataset updated
    Aug 10, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 8, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 7.98 (USD Billion)
    MARKET SIZE 2024: 9.55 (USD Billion)
    MARKET SIZE 2032: 40.0 (USD Billion)
    SEGMENTS COVERED: Type, Application, Deployment Mode, Organization Size, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Growing demand for data privacy and security; advancement in Artificial Intelligence (AI) and Machine Learning (ML); increasing need for faster and more efficient data generation; growing adoption of synthetic data in various industries; government regulations and compliance
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: MostlyAI, Gretel.ai, H2O.ai, Scale AI, UNchart, Anomali, Replica, Big Syntho, Owkin, DataGenix, Synthesized, Verisart, Datumize, Deci, Datasaur
    MARKET FORECAST PERIOD: 2025 - 2032
    KEY MARKET OPPORTUNITIES: Data privacy compliance; improved data availability; enhanced data quality; reduced data bias; cost-effective
    COMPOUND ANNUAL GROWTH RATE (CAGR): 19.61% (2025 - 2032)
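    The reported CAGR can be sanity-checked against the table's own market-size figures, assuming it compounds from the 2024 base to the 2032 forecast over 8 years:

```python
# Check the report's CAGR against its stated market sizes
# (assumption: compounding from the 2024 base to the 2032 forecast).
size_2024 = 9.55   # USD billion
size_2032 = 40.0   # USD billion
years = 2032 - 2024
cagr = (size_2032 / size_2024) ** (1 / years) - 1
print(f"{cagr:.2%}")  # ~19.61%, matching the reported figure
```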
  9. Selection of best model based on criteria.

    • plos.figshare.com
    xls
    Updated Dec 14, 2023
    Cite
    Mahadee Al Mobin; Md. Kamrujjaman (2023). Selection of best model based on criteria. [Dataset]. http://doi.org/10.1371/journal.pone.0295803.t009
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Mahadee Al Mobin; Md. Kamrujjaman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary table from the same PLOS ONE article as dataset 2 (Stochastic Bayesian Downscaling); the shared abstract is reproduced under that entry.

  10. 💳 Financial Transactions Dataset: Analytics

    • kaggle.com
    Updated Oct 31, 2024
    Cite
    ComputingVictor (2024). 💳 Financial Transactions Dataset: Analytics [Dataset]. https://www.kaggle.com/datasets/computingvictor/transactions-fraud-datasets/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ComputingVictor
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview

    This comprehensive synthetic financial dataset combines transaction records, customer information, and card data from a banking institution, spanning the 2010s. The dataset is designed for multiple analytical purposes, including fraud detection, customer behavior analysis, and expense forecasting.

    Dataset Components

    1. Transaction Data (transactions_data.csv)

    • Detailed transaction records including amounts, timestamps, and merchant details
    • Covers transactions throughout the 2010s
    • Features transaction types, amounts, and merchant information
    • Perfect for analyzing spending patterns and building fraud detection models

    2. Card Information (cards_dat.csv)

    • Credit and debit card details
    • Includes card limits, types, and activation dates
    • Links to customer accounts via card_id
    • Essential for understanding customer financial profiles

    3. Merchant Category Codes (mcc_codes.json)

    • Standard classification codes for business types
    • Enables transaction categorization and spending analysis
    • Industry-standard MCC codes with descriptions

    4. Fraud Labels (train_fraud_labels.json)

    • Binary classification labels for transactions
    • Indicates fraudulent vs. legitimate transactions
    • Ideal for training supervised fraud detection models

    5. User Data (users_data)

    • Demographic information about customers
    • Account-related details
    • Enables customer segmentation and personalized analysis
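    A typical first step with these components is joining the transaction table to the fraud labels. Since the exact schemas are not documented on this page, the field names and values below are placeholders:

```python
import json
import pandas as pd

# Toy stand-ins; the real files are transactions_data.csv and
# train_fraud_labels.json, whose exact schemas are not shown here.
transactions = pd.DataFrame({
    "transaction_id": ["t1", "t2", "t3"],
    "amount": [12.50, 980.00, 43.10],
})
labels = json.loads('{"t1": "No", "t2": "Yes", "t3": "No"}')  # id -> fraud flag

# Map each transaction to its label and derive a boolean target column
transactions["is_fraud"] = transactions["transaction_id"].map(labels).eq("Yes")
fraud_rate = transactions["is_fraud"].mean()
```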

    Use Cases and Applications

    1. Fraud Detection and Security

    • Build real-time fraud detection systems
    • Develop anomaly detection algorithms
    • Create risk scoring models
    • Implement transaction monitoring systems
    • Design security alert systems

    2. Customer Analytics

    • Analyze customer lifetime value
    • Create customer segmentation models
    • Develop churn prediction systems
    • Build recommendation engines
    • Study customer acquisition patterns

    3. Financial Planning and Forecasting

    • Develop expense forecasting models
    • Create budget planning tools
    • Build cash flow prediction systems
    • Design financial health indicators
    • Implement savings recommendation systems

    4. Business Intelligence

    • Analyze merchant performance
    • Study market trends
    • Create sales forecasting models
    • Develop competitive analysis tools
    • Build market segmentation models

    5. Machine Learning Projects

    • Practice supervised learning with fraud detection
    • Implement time series forecasting
    • Develop clustering algorithms for customer segmentation
    • Create deep learning models for pattern recognition
    • Build reinforcement learning systems for automated decision making

    Technical Details

    • Format: CSV, JSON
    • Time Period: 2010s decade

    Citation

    Dataset created by Caixabank Tech for the 2024 AI Hackathon

  11. Cost of Living in Nairobi

    • kaggle.com
    Updated Feb 15, 2025
    Cite
    Yacooti (2025). Cost of Living in Nairobi [Dataset]. https://www.kaggle.com/datasets/yacooti/cost-of-living-in-nairobi/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yacooti
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Nairobi
    Description

    🏑 Cost of Living in Nairobi, Kenya

    📌 Overview

    This dataset provides a detailed time-series estimate of the monthly cost of living across 20 different areas in Nairobi, Kenya from 2019 to 2024. It covers essential expenses such as rent, food, transport, utilities, and miscellaneous costs, allowing for comprehensive cost-of-living analysis.

    This dataset is useful for:
    ✅ Individuals planning to move to Nairobi
    ✅ Researchers analyzing long-term cost trends
    ✅ Businesses assessing salary benchmarks based on inflation
    ✅ Data scientists developing predictive models for cost forecasting

    📊 Data Summary

    • Total Records: 60,000 (5 years of monthly data)
    • Columns:
      • 🏠 Area: The residential area in Nairobi
      • 💰 Rent: Estimated monthly rent (KES)
      • 🍽️ Food: Grocery and dining expenses (KES)
      • 🚕 Transport: Public and private transport costs (KES)
      • ⚡ Utilities: Water, electricity, and internet bills (KES)
      • 🎭 Misc: Entertainment, personal care, and leisure expenses (KES)
      • 🏷️ Total: Sum of all expenses
      • 📆 Date: Monthly timestamp from January 2019 to December 2024

    πŸ“ Areas Covered

    This dataset provides cost estimates for 20+ residential areas, including:
    - High-End Areas 🏑: Kileleshwa, Westlands, Karen
    - Mid-Range Areas 🏙️: South B, Langata, Ruaka
    - Affordable Areas 🏠: Embakasi, Kasarani, Githurai, Ruiru, Umoja
    - Satellite Towns 🌿: Ngong, Rongai, Thika, Kitengela, Kikuyu

    πŸ› οΈ How the Data Was Generated

    This dataset was synthetically generated using Python, incorporating realistic market variations. The process includes:

    ✔ Inflation Modeling 📈 – A 2% annual increase in costs over time.
    ✔ Seasonal Effects 📅 – Higher food and transport costs in December & January (holiday season), rent spikes in June & July.
    ✔ Economic Shocks ⚠️ – A 5% chance per record of external economic effects (e.g., fuel price hikes, supply chain issues).
    ✔ Random Fluctuations 🔄 – Expenses vary slightly month-to-month to simulate real-world spending behavior.
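    The four mechanisms above can be sketched as a toy generator. The 2% annual inflation and 5% shock probability come from the list; the baseline amounts and the seasonal, shock, and noise magnitudes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

base = {"Rent": 30000, "Food": 15000, "Transport": 5000,
        "Utilities": 4000, "Misc": 6000}  # illustrative KES baselines

def monthly_cost(year, month, category, base_value):
    value = base_value * (1.02 ** (year - 2019))      # 2% annual inflation
    if category in ("Food", "Transport") and month in (12, 1):
        value *= 1.10                                  # holiday bump (assumed 10%)
    if category == "Rent" and month in (6, 7):
        value *= 1.05                                  # mid-year rent spike (assumed 5%)
    if rng.random() < 0.05:
        value *= 1.15                                  # 5% shock chance (assumed 15% size)
    return value * rng.normal(1.0, 0.02)               # small month-to-month noise

row = {cat: monthly_cost(2024, 12, cat, v) for cat, v in base.items()}
row["Total"] = sum(row.values())
```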

    πŸ” Potential Use Cases

    • 📊 Cost of Living Analysis – Compare affordability across different Nairobi areas.
    • 💵 Salary & Real Estate Benchmarking – Businesses can analyze salary expectations by location.
    • 📉 Time-Series Forecasting – Train predictive models (ARIMA, Prophet, LSTM) to estimate future living costs.
    • 📈 Inflation Impact Studies – Measure how economic conditions influence cost variations over time.

    ⚠️ Limitations

    • Synthetic Data – The dataset is not based on real survey data but follows market trends.
    • No Lifestyle Adjustments – Differences in household size or spending habits are not factored in.
    • Inflation Approximation – While inflation is simulated at 2% annually, actual inflation rates may differ.

    πŸ“ File Format & Access

    • nairobi_cost_of_living_time_series.csv – 60,000 records in CSV format (time-series structured).

    📒 Acknowledgments

    This dataset was generated for research and educational purposes. If you find it useful, consider citing it in your work. 🚀

    📥 Download and Explore the Data Now!


