15 datasets found

Stock Market Simulation Dataset

kaggle.com

Updated Mar 12, 2025

Cite

Samay Ashar (2025). Stock Market Simulation Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11010423

Unique identifier

https://doi.org/10.34740/kaggle/dsv/11010423

Dataset updated

Mar 12, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Samay Ashar

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset provides realistic synthetic stock market data, generated using Geometric Brownian Motion for price movements and Markov chains for trend prediction. It is designed for time-series forecasting, financial modeling, and algorithmic trading simulations.

Key Features

1000 days of synthetic stock market data (from January 1, 2022, onwards).
Multiple companies from diverse industries (Technology, Finance, Healthcare, Energy, Consumer Goods, Automotive, Aerospace, etc.).
Stock price details: Open, High, Low, Close prices.
Trading volume and market capitalization.
Financial metrics: P/E Ratio, Dividend Yield, Volatility.
Sentiment Score: A measure of market sentiment (-1 to 1 scale).
Trend Labeling: Bullish, Bearish, or Stable, based on Markov Chain modeling.
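
The two generators named in the description can be sketched as follows. This is a minimal illustration, not the author's generation script: the drift `mu`, volatility `sigma`, starting price, time step, and the transition probabilities in `TRANSITIONS` are all assumed values chosen only for the example.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, n_days, rng, dt=1 / 252):
    """Simulate daily prices with Geometric Brownian Motion (log-normal steps)."""
    prices = [s0]
    for _ in range(n_days - 1):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        log_return = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(log_return))
    return prices

# Hypothetical transition probabilities; each row sums to 1.
TRANSITIONS = {
    "Bullish": {"Bullish": 0.6, "Stable": 0.3, "Bearish": 0.1},
    "Stable": {"Bullish": 0.25, "Stable": 0.5, "Bearish": 0.25},
    "Bearish": {"Bullish": 0.1, "Stable": 0.3, "Bearish": 0.6},
}

def simulate_trend(start, n_days, rng):
    """Sample one trend label per day from a first-order Markov chain."""
    states = [start]
    for _ in range(n_days - 1):
        row = TRANSITIONS[states[-1]]
        next_state = rng.choices(list(row.keys()), weights=list(row.values()))[0]
        states.append(next_state)
    return states

rng = random.Random(42)  # fixed seed so the sketch is reproducible
closes = simulate_gbm(s0=100.0, mu=0.08, sigma=0.25, n_days=1000, rng=rng)
trends = simulate_trend("Stable", n_days=1000, rng=rng)
print(len(closes), len(trends))
```

Because each GBM step multiplies the previous price by a positive factor, simulated prices never go negative, which is one reason the model is a common choice for synthetic price paths.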

Column Name	Description
Date	Trading date
Company	Stock name (e.g., Apple, Tesla, JPMorgan, etc.)
Sector	Industry classification
Open	Opening price of the stock
High	Highest price of the stock for the day
Low	Lowest price of the stock for the day
Close	Closing price of the stock
Volume	Number of shares traded
Market_Cap	Market capitalization (in USD)
PE_Ratio	Price-to-Earnings ratio
Dividend_Yield	Percentage of dividends relative to stock price
Volatility	Measure of stock price fluctuation
Sentiment_Score	Market sentiment (-1 to 1 scale)
Trend	Stock market trend (Bullish, Bearish, or Stable)
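
Given this schema, a quick sanity check is that High and Low should bracket Open and Close, that Sentiment_Score should stay within the documented range, and that Trend should take one of the three listed labels. The sketch below assumes rows have already been loaded as dictionaries keyed by the column names above; the sample rows are made up for illustration and are not taken from the dataset.

```python
VALID_TRENDS = {"Bullish", "Bearish", "Stable"}

def validate_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    if row["High"] < max(row["Open"], row["Close"]):
        problems.append("High is below Open or Close")
    if row["Low"] > min(row["Open"], row["Close"]):
        problems.append("Low is above Open or Close")
    if not -1.0 <= row["Sentiment_Score"] <= 1.0:
        problems.append("Sentiment_Score outside [-1, 1]")
    if row["Trend"] not in VALID_TRENDS:
        problems.append("Unknown Trend label")
    return problems

# Made-up rows for illustration only.
good_row = {"Open": 100.0, "High": 104.5, "Low": 98.2, "Close": 103.1,
            "Sentiment_Score": 0.35, "Trend": "Bullish"}
bad_row = {"Open": 100.0, "High": 99.0, "Low": 98.2, "Close": 103.1,
           "Sentiment_Score": 1.4, "Trend": "Sideways"}

good_problems = validate_row(good_row)
bad_problems = validate_row(bad_row)
print(good_problems)
print(bad_problems)
```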

Usage Scenarios

🔹 Time-Series Forecasting: Train models like LSTMs, Transformers, or ARIMA for stock price prediction.
🔹 Algorithmic Trading: Develop trading strategies based on trends and sentiment.
🔹 Feature Engineering: Explore correlations between financial metrics and stock movements.
🔹 Quantitative Finance Research: Analyze market trends using simulated yet realistic data.

S2 Data -
plos.figshare.com
txt
Updated Dec 14, 2023
Cite
Mahadee Al Mobin; Md. Kamrujjaman (2023). S2 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0295803.s002
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0295803.s002
Dataset updated
Dec 14, 2023
Dataset provided by
PLOS ONE
Authors
Mahadee Al Mobin; Md. Kamrujjaman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data scarcity and discontinuity are common in healthcare and epidemiological datasets, yet such data are often needed to make informed decisions and forecast upcoming scenarios. To avoid these problems, the data are often processed as monthly or yearly aggregates, on which prevalent forecasting tools such as Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and TBATS often fail to provide satisfactory results. Artificial data synthesis methods have proven to be a powerful tool for tackling these challenges. The paper proposes a novel algorithm, Stochastic Bayesian Downscaling (SBD), based on the Bayesian approach, that can regenerate downscaled time series of varying lengths from aggregated data while preserving most of the statistical characteristics and the aggregated sum of the original data. The paper presents two epidemiological time series case studies from Bangladesh (Dengue and Covid-19) to showcase the workflow of the algorithm. The case studies illustrate that the synthesized data agree with the original data in their statistical properties, trend, seasonality, and residuals. In terms of forecasting performance, using the last 12 years of Dengue infection data in Bangladesh, the authors decreased error terms by up to 72.76% using synthetic data instead of the actual aggregated data.
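The requirement that a downscaled series preserve the aggregated sum can be illustrated with a simple stand-in. The sketch below is not the SBD algorithm from the paper: it splits each aggregate total across finer periods using random proportional weights, an assumed technique chosen only to demonstrate the sum-preservation property. The monthly totals and the choice of four sub-periods are made-up values.

```python
import random

def downscale_preserving_sum(totals, parts, rng):
    """Split each aggregate total into `parts` positive values that sum back to it."""
    series = []
    for total in totals:
        # Random positive weights; rescaling them makes the pieces sum to `total`.
        weights = [rng.gammavariate(2.0, 1.0) for _ in range(parts)]
        scale = total / sum(weights)
        series.append([w * scale for w in weights])
    return series

rng = random.Random(7)  # fixed seed so the sketch is reproducible
monthly_totals = [120.0, 340.0, 95.0]  # hypothetical monthly case counts
weekly = downscale_preserving_sum(monthly_totals, parts=4, rng=rng)

for total, pieces in zip(monthly_totals, weekly):
    print(total, round(sum(pieces), 6))
```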
Coefficients of ARIMA(7,0,7).
plos.figshare.com
xls
Updated Dec 14, 2023
Cite
Mahadee Al Mobin; Md. Kamrujjaman (2023). Coefficients of ARIMA(7,0,7). [Dataset]. http://doi.org/10.1371/journal.pone.0295803.t010
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0295803.t010
Dataset updated
Dec 14, 2023
Dataset provided by
PLOS ONE
Authors
Mahadee Al Mobin; Md. Kamrujjaman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Delhi Power Load with Weather & Development
kaggle.com
Updated Jan 12, 2025
Cite
Pratik Chougule (2025). Delhi Power Load with Weather & Development [Dataset]. https://www.kaggle.com/datasets/pratikyuvrajchougule/delhi-datset
Dataset updated
Jan 12, 2025
Dataset provided by
Kaggle
Authors
Pratik Chougule
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
Delhi
Description
This dataset provides synthetic data designed to analyze and predict power load (in MW) in Delhi, incorporating a variety of influencing factors such as weather, holidays, festivals, and real estate development levels. With over a year of hourly data, this dataset is ideal for researchers, students, and practitioners working on energy systems, urban planning, and time-series forecasting.

Key Features:

Weather Data: Temperature, humidity, wind speed, and rainfall measurements for each hour.

Socio-Economic Indicators: Information on public holidays, weekly holidays, and festival days.

Urban Development: Classification of areas into low, medium, and high development zones with respective percentages.

Power Load (MW): Target variable representing hourly electricity consumption in megawatts.

Purpose:

This dataset is intended for the following use cases:

1. Power Load Forecasting: Build machine learning models to predict future electricity demand.
2. Weather Impact Studies: Analyze how weather conditions influence power consumption patterns.
3. Urban Development Insights: Explore the correlation between area development levels and energy usage.
4. Policy Planning: Assist policymakers in understanding energy demand trends during holidays, festivals, and extreme weather.
5. Time Series Analysis: Practice and research advanced time-series forecasting techniques.
6. Renewable Energy Integration: Develop models to optimize energy distribution and reduce reliance on non-renewable sources.

Potential Applications:

Building intelligent power grid systems.

Analyzing the impact of climate change on energy demand.

Supporting smart city initiatives with energy-efficient planning.

Creating educational tools for data science and machine learning learners.
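
For hourly load forecasting of the kind described above, a common first baseline is the seasonal-naive forecast, which predicts each hour with the load observed 24 hours earlier. The sketch below is a minimal illustration under stated assumptions: the load series is generated on the spot (a daily sine cycle plus noise, with made-up magnitudes) and is not drawn from this dataset.

```python
import math
import random

def seasonal_naive(load, season=24):
    """Forecast each hour as the load observed `season` hours earlier.

    The returned list aligns with load[season:].
    """
    return load[:-season]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
hours = 24 * 14         # two weeks of hypothetical hourly readings
load = [
    3000 + 800 * math.sin(2 * math.pi * (h % 24) / 24) + rng.gauss(0, 50)
    for h in range(hours)
]

actual = load[24:]
predicted = seasonal_naive(load)
naive_mae = mean_absolute_error(actual, predicted)

# Reference point: always predicting the overall mean load.
mean_load = sum(load) / len(load)
mean_mae = mean_absolute_error(actual, [mean_load] * len(actual))
print(round(naive_mae, 1), round(mean_mae, 1))
```

Any learned model trained on the weather, holiday, and development features should beat this baseline before its added complexity is worth keeping.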
Selection of best model based on criteria.
plos.figshare.com
xls
Updated Dec 14, 2023
Cite
Mahadee Al Mobin; Md. Kamrujjaman (2023). Selection of best model based on criteria. [Dataset]. http://doi.org/10.1371/journal.pone.0295803.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0295803.t009
Dataset updated
Dec 14, 2023
Dataset provided by
PLOS ONE
Authors
Mahadee Al Mobin; Md. Kamrujjaman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aerospace Artificial Intelligence (AI) Market Analysis North America,...
technavio.com
Updated Feb 28, 2025
Cite
Technavio (2025). Aerospace Artificial Intelligence (AI) Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, Canada, UK, China, Germany, France, Italy, India, Japan, South Korea - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/aerospace-artificial-intelligence-market-industry-analysis
Dataset updated
Feb 28, 2025
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description
Aerospace Artificial Intelligence Market Size 2025-2029

The aerospace artificial intelligence (AI) market size is forecast to increase by USD 7.24 billion at a CAGR of 45.9% between 2024 and 2029.
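
The forecast above gives an absolute increase and a growth rate but no starting value. Under the assumptions that the growth compounds annually over five years (2024 to 2029) and that the increase is measured from the 2024 market size, the implied base can be backed out as a rough estimate. Neither assumption is stated in the report, so the result is illustrative only.

```python
def implied_base(increase, cagr, years):
    """Starting value B such that B * (1 + cagr) ** years - B equals `increase`."""
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)

# Figures from the report summary; the 5-year span is an assumption.
base_2024 = implied_base(increase=7.24, cagr=0.459, years=5)
end_2029 = base_2024 * (1 + 0.459) ** 5
print(round(base_2024, 2), round(end_2029, 2))
```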

Artificial Intelligence (AI) is revolutionizing the aerospace industry with its application in various domains, including software for flight simulation and virtual assistants for cockpit interaction. The rising trend of digital transformation in aviation is driving market growth, as AI enables automation in aircraft maintenance, threat detection systems, and additive manufacturing. The increasing use of drones equipped with sensors and data analytics capabilities is another significant trend, offering opportunities for real-time data collection and analysis. However, concerns surrounding data security and privacy are major challenges, necessitating strong cybersecurity measures. Machine learning algorithms, image recognition, and natural language processing are key technologies enabling AI in the aerospace sector, enhancing travel experiences and optimizing operational efficiency. The adoption of AI is set to continue, with the market expected to grow significantly in the coming years.

What will be the Size of the Aerospace Artificial Intelligence (AI) Market During the Forecast Period?

The market encompasses the application of AI models, including machine learning, computer vision, and natural language processing, to enhance various aspects of the aerospace sector. AI technologies are increasingly being integrated into flight operations for predictive maintenance, optimization of fuel consumption, and improving pilot training through computer vision and voice recognition. In customer service, virtual assistants and voice recognition systems facilitate efficient communication between airlines and passengers. Air traffic control benefits from AI's ability to analyze big data and identify data patterns for improved safety and efficiency. AI is also employed for observation tasks, such as analyzing time series data for anomaly detection and predictive maintenance in aircraft components. The aerospace AI market is poised for significant growth, as human intelligence is augmented by AI software to address complex challenges and optimize processes.

How is this Aerospace Artificial Intelligence (AI) Industry segmented and which is the largest segment?

The aerospace artificial intelligence (AI) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Component: Software, Hardware, Services
End-user: Defense and military, Commercial aviation, Aircraft manufacturers, Space exploration, Airports
Application: Machine learning, Natural language processing, Computer vision, Context awareness computing
Geography: North America (Canada, US); Europe (Germany, UK, France, Italy); APAC (China, India, Japan, South Korea); Middle East and Africa; South America

By Component Insights

The software segment is estimated to witness significant growth during the forecast period.

Aerospace Artificial Intelligence (AI) software plays a crucial role in the development and operation of autonomous systems for UAVs, drones, and spacecraft. AI algorithms, including machine learning, computer vision, and neural networks, enable navigation, obstacle detection, and real-time decision-making. For instance, Airbus SE's Air Superiority Tactical Assistance Real-Time Execution System (ASTares) digitizes human-level experience to support tactical coordination in the Future Combat Air System (FCAS). In the aerospace sector, AI software optimizes flight control systems by analyzing data from sensors and adjusting flight parameters in real-time. This leads to improved fuel efficiency, reduced emissions, and enhanced safety. AI models are also integrated into customer service applications, such as virtual assistants and chatbots, to streamline airline industry processes and improve customer satisfaction.

The software segment was valued at USD 141.10 million in 2019 and showed a gradual increase during the forecast period.

Regional Analysis

North America is estimated to contribute 35% to the growth of the global market during the forecast period.

Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

The aerospace industry is embracing Artificial Intelligence (AI) to enhance operational efficiency and automate processes in North America. Machine le

Global Synthetic Data Tool Market Research Report: By Type (Image...

wiseguyreports.com

Updated Aug 10, 2024

Cite

Wiseguy Research Consultants Pvt Ltd (2024). Global Synthetic Data Tool Market Research Report: By Type (Image Generation, Text Generation, Audio Generation, Time-Series Generation, User-Generated Data Marketplace), By Application (Computer Vision, Natural Language Processing, Predictive Analytics, Healthcare, Retail), By Deployment Mode (Cloud-Based, On-Premise), By Organization Size (Small and Medium Enterprises (SMEs), Large Enterprises) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/cn/reports/synthetic-data-tool-market

Dataset updated

Aug 10, 2024

Dataset authored and provided by

Wiseguy Research Consultants Pvt Ltd

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Jan 8, 2024

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2024
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2023	7.98 (USD Billion)
MARKET SIZE 2024	9.55 (USD Billion)
MARKET SIZE 2032	40.0 (USD Billion)
SEGMENTS COVERED	Type, Application, Deployment Mode, Organization Size, Regional
COUNTRIES COVERED	North America, Europe, APAC, South America, MEA
KEY MARKET DYNAMICS	Growing Demand for Data Privacy and Security; Advancement in Artificial Intelligence (AI) and Machine Learning (ML); Increasing Need for Faster and More Efficient Data Generation; Growing Adoption of Synthetic Data in Various Industries; Government Regulations and Compliance
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	MostlyAI, Gretel.ai, H2O.ai, Scale AI, UNchart, Anomali, Replica, Big Syntho, Owkin, DataGenix, Synthesized, Verisart, Datumize, Deci, Datasaur
MARKET FORECAST PERIOD	2025 - 2032
KEY MARKET OPPORTUNITIES	Data privacy compliance; Improved data availability; Enhanced data quality; Reduced data bias; Cost-effective
COMPOUND ANNUAL GROWTH RATE (CAGR)	19.61% (2025 - 2032)
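
The reported growth rate can be cross-checked against the market sizes in the table with the standard CAGR formula, (end / start) ** (1 / years) - 1. The sketch below assumes compounding from the 2024 market size to the 2032 market size, that is, eight annual periods; the table labels the rate as 2025 - 2032, so the choice of span is an interpretation rather than something the report states.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1

# Market sizes in USD billion, taken from the table; the 8-year span is assumed.
rate = cagr(start_value=9.55, end_value=40.0, years=8)
print(f"{rate:.2%}")
```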

Heat pump COP drop - synthetic faults
kaggle.com
zip
Updated Feb 28, 2023
Cite
Mathieu Vallee (2023). Heat pump COP drop - synthetic faults [Dataset]. https://www.kaggle.com/datasets/mathieuvallee/ai-dhc-heatpump-cop
Available download formats: zip (68378018 bytes)
Dataset updated
Feb 28, 2023
Authors
Mathieu Vallee
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset contains data generated in the AI DHC project.

This dataset contains synthetic fault data for a decrease in the coefficient of performance (COP) of a heat pump.
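
For orientation, the COP of a heat pump is the ratio of heat delivered to electrical energy consumed, and a fault of the kind described shows up as a sustained drop in that ratio. The sketch below is a minimal illustration and does not reflect the AI DHC models: the readings, the baseline COP, and the 15% tolerance are hypothetical values.

```python
def cop(heat_output_kw, electrical_input_kw):
    """Coefficient of performance: heat delivered per unit of electrical input."""
    if electrical_input_kw <= 0:
        raise ValueError("electrical input must be positive")
    return heat_output_kw / electrical_input_kw

def flag_cop_drop(cop_values, baseline, tolerance=0.15):
    """Mark readings whose COP falls more than `tolerance` below the baseline."""
    threshold = baseline * (1 - tolerance)
    return [value < threshold for value in cop_values]

# Hypothetical (heat output kW, electrical input kW) readings.
readings = [(350.0, 100.0), (340.0, 100.0), (280.0, 100.0), (260.0, 100.0)]
cop_values = [cop(heat, power) for heat, power in readings]
flags = flag_cop_drop(cop_values, baseline=3.5)
print(cop_values)
print(flags)
```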

The IEA DHC Annex XIII project “Artificial Intelligence for Failure Detection and Forecasting of Heat Production and Heat demand in District Heating Networks” is developing Artificial Intelligence (AI) methods for forecasting heat demand and heat production and is evaluating algorithms for detecting faults which can be used by interested stakeholders (operators, suppliers of DHC components and manufacturers of control devices).

See https://github.com/mathieu-vallee/ai-dhc for the models and Python scripts used to generate the dataset.

Please cite this dataset as: Vallee, M., Wissocq T., Gaoua Y., Lamaison N., Generation and Evaluation of a Synthetic Dataset to improve Fault Detection in District Heating and Cooling Systems, 2023 (under review at the Energy journal)

Disclaimer notice (IEA DHC): This project has been independently funded by the International Energy Agency Technology Collaboration Programme on District Heating and Cooling including Combined Heat and Power (IEA DHC).

Any views expressed in this publication are not necessarily those of IEA DHC.

IEA DHC can take no responsibility for the use of the information within this publication, nor for any errors or omissions it may contain.

Information contained herein have been compiled or arrived from sources believed to be reliable. Nevertheless, the authors or their organizations do not accept liability for any loss or damage arising from the use thereof. Using the given information is strictly your own responsibility.

Disclaimer Notice (Authors):

This publication has been compiled with reasonable skill and care. However, neither the authors nor the DHC Contracting Parties (of the International Energy Agency Technology Collaboration Programme on District Heating & Cooling) make any representation as to the adequacy or accuracy of the information contained herein, or as to its suitability for any particular application, and accept no responsibility or liability arising out of the use of this publication. The information contained herein does not supersede the requirements given in any national codes, regulations or standards, and should not be regarded as a substitute

Copyright:

All property rights, including copyright, are vested in IEA DHC. In particular, all parts of this publication may be reproduced, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise only by crediting IEA DHC as the original source. Republishing of this report in another format or storing the report in a public retrieval system is prohibited unless explicitly permitted by the IEA DHC Operating Agent in writing.
Artificial Intelligence-As-A-Service (AIaaS) Market Analysis, Size, and...
technavio.com
Cite
Technavio, Artificial Intelligence-As-A-Service (AIaaS) Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), APAC (China, India, Japan, South Korea), Europe (France, Germany, Italy, UK), Middle East and Africa, and South America [Dataset]. https://www.technavio.com/report/artificial-intelligence-as-a-service-market-industry-analysis
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description
Artificial Intelligence-As-A-Service (AIaaS) Market Size 2025-2029

The artificial intelligence-as-a-service (AIaaS) market size is forecast to increase by USD 60.24 billion at a CAGR of 42.6% between 2024 and 2029.

The market is experiencing significant growth, driven by increasing investment in research and development and the integration of AIaaS with emerging technologies like Blockchain. These advancements enable organizations to harness the power of AI to streamline operations, enhance customer experiences, and gain competitive advantages. However, the market faces challenges, including data privacy concerns, as businesses grapple with securing sensitive information in a cloud-based environment. As AIaaS continues to evolve, it's crucial for businesses to stay informed about these trends and address the associated challenges to fully leverage the potential of AI technology.

What will be the Size of the Artificial Intelligence-As-A-Service (AIaaS) Market During the Forecast Period?

In this dynamic and evolving market, various advanced technologies are shaping the future of business intelligence. NoSQL databases are increasingly being adopted for their flexibility in handling large, complex datasets. Human-computer interaction is advancing with the integration of Virtual Reality (VR) and Mixed Reality (MR), enhancing user experiences. Reinforcement learning, deep learning, and transfer learning are revolutionizing decision-making processes, providing insights from vast datasets. Time series analysis and unsupervised learning are essential for predictive analytics and pattern recognition. Data warehousing and serverless computing optimize storage and processing capabilities, while cognitive computing and machine translation streamline business operations through automation and multilingual understanding. Sentiment analysis and text summarization are transforming customer engagement and market research, enabling businesses to gain valuable insights from unstructured data. Neural networks and quantum computing are pushing the boundaries of AI, offering unprecedented processing power and efficiency. The integration of AI technologies like semi-supervised learning, reinforcement learning, and deep learning in various applications, including VR, MR, and AR, is redefining industries and creating new opportunities for businesses. In the realm of big data, edge computing and serverless computing are becoming essential components, enabling real-time processing and analysis, while AI continues to drive innovation and growth.

How is this Artificial Intelligence-As-A-Service (AIaaS) Industry segmented and which is the largest segment?

The artificial intelligence-as-a-service (AIaaS) industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

End-user: Retail and healthcare, BFSI, Telecommunication, Government and defense, Others
Type: Software, Services
Deployment: Public cloud, Private cloud, Hybrid cloud
Source: Large enterprises, SMEs
Technology: Machine learning, Natural language processing, Computer vision, Others
Geography: North America (US, Canada); Europe (France, Germany, Italy, UK); APAC (China, India, Japan, South Korea); Rest of World (ROW)

By End-user Insights

The retail and healthcare segment is estimated to witness significant growth during the forecast period.

The market is experiencing significant growth as businesses seek to enhance their enterprise resource planning software with AI capabilities. Retail organizations, in particular, are modernizing their IT infrastructure to accommodate new technologies and meet evolving customer expectations. With the increasing competition in retail industries driven by the demand for convenient web and mobile shopping platforms, traditional businesses are expanding into e-commerce. Local retailers are also investing in IT solutions, including AIaaS, to remain competitive and generate additional revenue through online channels. AIaaS is being integrated into various applications, such as marketing automation, cost optimization, predictive analytics, security audits, virtual assistants, recommendation engines, performance optimization, and user interface/experience enhancement. Industry-specific solutions, mobile applications, agile development, and API integration are also gaining popularity. Businesses are leveraging AIaaS for data mining, technical support, natural language processing, business intelligence, content personalization, machine learning models, data visualization, customer service, and more. Additionally, AIaaS is being used for data analysis, fraud detection, computer vision, process automation, data security, training, and documentation, and software-as-a-service (SaaS) offerings. Cloud computing and open-source technologies are enabling the ado
Comparative analysis of existing literature.
plos.figshare.com
xls
Updated Feb 5, 2025
Cite
Ghulam Mustafa; Muhammad Ali Moazzam; Asif Nawaz; Tariq Ali; Deema Mohammed Alsekait; Ahmed Saleh Alattas; Diaa Salama AbdElminaam (2025). Comparative analysis of existing literature. [Dataset]. http://doi.org/10.1371/journal.pone.0316682.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0316682.t001
Dataset updated
Feb 5, 2025
Dataset provided by
PLOS ONE
Authors
Ghulam Mustafa; Muhammad Ali Moazzam; Asif Nawaz; Tariq Ali; Deema Mohammed Alsekait; Ahmed Saleh Alattas; Diaa Salama AbdElminaam
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Accurate crop yield forecasting is vital for ensuring food security and making informed decisions. With a growing population and global warming, addressing food security has become a priority. Artificial Intelligence (AI) has significantly increased yield prediction accuracy. Existing Machine Learning (ML) methods use statistical measures such as regression, correlation, and the chi-square test to predict crop yield; all such models lead to low accuracy when the number of factors (variables), such as weather and soil conditions, wind, fertilizer quantity, seed quality, and climate, increases. The proposed methodology consists of several stages: Data Collection, Preprocessing, Feature Extraction with Support Vector Machine (SVM), correlation with Normalized Google Distance (NGD), and feature ranking with rising star. This study combines a Bidirectional Gated Recurrent Unit (Bi-GRU) and a Time Series CNN to predict crop yield and then makes recommendations for further improvement. The proposed model showed very good results on all datasets and a significant improvement over baseline models. The ECP-IEM achieved an accuracy of 96.34%, precision of 94.56%, and recall of 95.23% on different datasets. Moreover, the proposed model was also evaluated on MAE, MSE, and RMSE, which produced values of 0.191, 0.0674, and 0.238, respectively. This will help improve crop production by giving an early estimate of yield, which will then help farmers improve it.
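The evaluation measures named in this abstract have standard definitions, sketched below for reference. The inputs are made-up toy values, not results from the study, and the functions are generic textbook formulas rather than the authors' evaluation code.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)  # RMSE is the square root of MSE

def classification_scores(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Toy values for illustration only.
mae, mse, rmse = regression_errors([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0])
accuracy, precision, recall = classification_scores([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(mae, mse, round(rmse, 4))
print(accuracy, round(precision, 4), round(recall, 4))
```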
Data from: From Chaos to Harmony: Addressing Data De-Noising, Complexity and...
curate.nd.edu
pdf
Updated Apr 28, 2025
Cite
Qianlong Wen (2025). From Chaos to Harmony: Addressing Data De-Noising, Complexity and Adaptability in Graph Machine Learning [Dataset]. http://doi.org/10.7274/28786127.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.7274/28786127.v1
Dataset updated
Apr 28, 2025
Dataset provided by
University of Notre Dame
Authors
Qianlong Wen
License
https://www.law.cornell.edu/uscode/text/17/106
Description
Graph representation learning—especially via graph neural networks (GNNs)—has demonstrated considerable promise in modeling intricate interaction systems, such as social networks and molecular structures. However, the deployment of GNN-based frameworks in industrial settings remains challenging due to the inherent complexity and noise in real-world graph data. This dissertation systematically addresses these challenges by advancing novel methodologies to improve the comprehensiveness and robustness of graph representation learning, with a dual focus on resolving data complexity and denoising across diverse graph-learning scenarios. In addressing graph data denoising, we design auxiliary self-supervised optimization objectives that disentangle noisy topological structures and misinformation while preserving the representational sufficiency of critical graph features. These tasks operate synergistically with primary learning objectives to enhance robustness against data corruption. The efficacy of these techniques is demonstrated through their application to real-world opioid prescription time series data for predicting potential opioid over-prescription. To mitigate data complexity, the study investigates two complementary approaches: (1) multimodal fusion, which employs attentive integration of graph data with features from other modalities, and (2) hierarchical substructure mining, which extracts semantic patterns at multiple granularities to enhance model generalization in demanding contexts. Finally, the dissertation explores the adaptability of graph data in a range of practical applications, including E-commerce demand forecasting and recommendations, to further enhance prediction and reasoning capabilities.
KPI prediction dataset
ieee-dataport.org
Updated Jun 20, 2024
Cite
Hu Zhang (2024). KPI prediction dataset [Dataset]. https://ieee-dataport.org/documents/kpi-prediction-dataset
Dataset updated
Jun 20, 2024
Authors
Hu Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
KPI prediction
Spacecraft Thruster Firing Test Dataset
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Jul 16, 2024
Cite
Patrick Fleith (2024). Spacecraft Thruster Firing Test Dataset [Dataset]. http://doi.org/10.5281/zenodo.7137930
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.7137930
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Patrick Fleith
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
WARNING

This version of the dataset is not recommended for the anomaly detection use case. We discovered discrepancies in the anomalous sequences. A new version will be released. In the meantime, please ignore all sequences marked as anomalous.

CONTEXT

Testing hardware to qualify it for spaceflight is critical for modelling and verifying its performance. Hot fire tests (also known as life-tests) are typically run during the qualification campaigns of satellite thrusters, but the results remain proprietary, which makes it difficult for the machine learning community to develop suitable data-driven predictive models. This synthetic dataset was generated partially based on the real-world physics of monopropellant chemical thrusters, to foster the development and benchmarking of new data-driven analytical methods (machine learning, deep learning, etc.).

The PDF document "STFT Dataset Description" describes in detail the structure, context, use cases and domain knowledge about thrusters that ML practitioners need in order to use the dataset.

PROPOSED TASKS

Supervised:

Performance Modelling: Prediction of thruster performance (the target can be thrust, mass flow rate, and/or the average specific impulse).

Acceptance Test for Individualised Performance Model Refinement: Taking the acceptance test of an individual thruster into account may help to build an individualised predictive model for that thruster.

Uncertainty Quantification for Thruster-to-Thruster Reproducibility Verification: Evaluate the prediction variability between several thrusters in order to construct uncertainty bounds (prediction intervals) around the predicted thrust and mass flow rate of future thrusters that may be used during an actual space mission.

Unsupervised / Anomaly Detection

Anomaly Detection: Anomalies can be detected in an unsupervised setting (outlier detection) or in a semi-supervised setting (novelty detection). The dataset includes a total of 270 anomalies. A simple approach is to predict whether a firing test sequence is anomalous or nominal. A more advanced approach is to predict which portion of a time series is anomalous. The dataset also provides detailed information on whether each time point is anomalous or nominal. For each anomaly, a code is provided that makes it possible to diagnose the detection system's performance on the different types of anomalies contained in the dataset.
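
The sequence-level framing (anomalous vs. nominal) could be approached in many ways. The sketch below is one simple outlier-detection baseline, a modified z-score built on the median absolute deviation. It is not the dataset author's method. The function name, the per-sequence mean-thrust feature, the sample values and the 3.5 threshold are all assumptions made for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less sensitive to the outliers themselves than mean and std-dev.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged reliably.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical per-sequence mean thrust values (NOT taken from the dataset).
mean_thrust = [1.00, 1.02, 0.99, 1.01, 0.98, 1.00, 1.65, 1.01]
flagged = robust_outliers(mean_thrust)
```

Here `flagged` holds the positions of the sequences that would be treated as outliers. A point-level variant would apply the same idea per time step rather than per sequence.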
Giant Mud Crab Molting Visual Dataset
data.mendeley.com
Updated Dec 16, 2024
Share
Dany Eka Saputra (2024). Giant Mud Crab Molting Visual Dataset [Dataset]. http://doi.org/10.17632/4kc36yjhdy.1
Unique identifier
https://doi.org/10.17632/4kc36yjhdy.1
Dataset updated
Dec 16, 2024
Authors
Dany Eka Saputra
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains images of Giant Mud Crab growth before molting. It is time-series data showing the growth of several crabs before they molt. The data was collected as a basis for developing an AI model that can predict the time to molt of a crab, specifically the Giant Mud Crab (Scylla serrata). The hypothesis behind the collection is that a crab's molting time can be predicted from visual cues (e.g. growth of limbs) on the crab. The dataset contains images of 6 different crabs taken over the same time period. Because the crabs molt at different times, the dataset includes time-to-molt data for each image, indicating how long until the crab in the picture molts. The dataset was gathered in November 2024 at a vertical crab farm in Surabaya, Indonesia.
Cost of Living in Nairobi
kaggle.com
Updated Feb 15, 2025
Cite
Yacooti (2025). Cost of Living in Nairobi [Dataset]. https://www.kaggle.com/datasets/yacooti/cost-of-living-in-nairobi/code
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 15, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yacooti
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Nairobi
Description
🏡 Cost of Living in Nairobi, Kenya

📌 Overview

This dataset provides a detailed time-series estimate of the monthly cost of living across 20 different areas in Nairobi, Kenya from 2019 to 2024. It covers essential expenses such as rent, food, transport, utilities, and miscellaneous costs, allowing for comprehensive cost-of-living analysis.

This dataset is useful for:
✅ Individuals planning to move to Nairobi
✅ Researchers analyzing long-term cost trends
✅ Businesses assessing salary benchmarks based on inflation
✅ Data scientists developing predictive models for cost forecasting

📊 Data Summary

Total Records: 60,000 (5 years of monthly data)

Columns:

🏠 Area: The residential area in Nairobi

💰 Rent: Estimated monthly rent (KES)

🍽️ Food: Grocery and dining expenses (KES)

🚕 Transport: Public and private transport costs (KES)

⚡ Utilities: Water, electricity, and internet bills (KES)

🎭 Misc: Entertainment, personal care, and leisure expenses (KES)

🏷️ Total: Sum of all expenses

📆 Date: Monthly timestamp from January 2019 to December 2024
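
Because `Total` is described as the sum of the expense columns, a quick consistency check on each record is easy to sketch. The column names come from the list above, but the exact header spelling in the CSV, the sample row and the tolerance are assumptions for illustration.

```python
# Expense columns as named in the column list; actual CSV headers may differ.
EXPENSE_COLUMNS = ["Rent", "Food", "Transport", "Utilities", "Misc"]


def total_matches(row, tolerance=0.01):
    """Return True if row['Total'] equals the sum of the expense columns."""
    expected = sum(float(row[col]) for col in EXPENSE_COLUMNS)
    return abs(expected - float(row["Total"])) <= tolerance


# Made-up record for illustration only (values are not from the dataset).
sample_row = {
    "Area": "Westlands", "Rent": 60000, "Food": 20000, "Transport": 8000,
    "Utilities": 6000, "Misc": 5000, "Total": 99000, "Date": "2019-01",
}
is_consistent = total_matches(sample_row)
```

The same function works on rows produced by `csv.DictReader`, since it converts each field with `float` before summing.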

📍 Areas Covered

This dataset provides cost estimates for 20+ residential areas, including:
- High-End Areas 🏡: Kileleshwa, Westlands, Karen
- Mid-Range Areas 🏙️: South B, Langata, Ruaka
- Affordable Areas 🏠: Embakasi, Kasarani, Githurai, Ruiru, Umoja
- Satellite Towns 🌿: Ngong, Rongai, Thika, Kitengela, Kikuyu

🛠️ How the Data Was Generated

This dataset was synthetically generated using Python, incorporating realistic market variations. The process includes:

✔ Inflation Modeling 📈 – A 2% annual increase in costs over time.
✔ Seasonal Effects 📅 – Higher food and transport costs in December & January (holiday season), rent spikes in June & July.
✔ Economic Shocks ⚠️ – A 5% chance per record of external economic effects (e.g., fuel price hikes, supply chain issues).
✔ Random Fluctuations 🔄 – Expenses vary slightly month-to-month to simulate real-world spending behavior.
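
The four ingredients above can be combined in a few lines. The following is a minimal sketch of how such a generator might look, not the author's actual script. Only the 2% annual inflation, the December/January seasonality and the 5% per-record shock probability come from the description. The function name, the compounding form, and the sizes of the seasonal bump, the shock and the noise are assumptions.

```python
import random


def simulate_cost(base, year_index, month, rng,
                  inflation=0.02,            # from the description: 2% per year
                  seasonal_months=(12, 1),   # from the description: Dec and Jan
                  seasonal_bump=0.05,        # ASSUMED size of the seasonal effect
                  shock_prob=0.05,           # from the description: 5% per record
                  shock_size=0.10,           # ASSUMED size of an economic shock
                  noise=0.02):               # ASSUMED month-to-month variation
    """Return one synthetic monthly cost for a single expense category."""
    # Inflation, compounded once per elapsed year (compounding is assumed).
    cost = base * (1 + inflation) ** year_index
    # Seasonal uplift in the configured months.
    if month in seasonal_months:
        cost *= 1 + seasonal_bump
    # Occasional external shock, drawn independently for each record.
    if rng.random() < shock_prob:
        cost *= 1 + shock_size
    # Small symmetric random fluctuation.
    cost *= 1 + rng.uniform(-noise, noise)
    return round(cost, 2)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
food_december = simulate_cost(20000, year_index=3, month=12, rng=rng)
```

Passing `seasonal_months=(6, 7)` would model the June and July rent spikes in the same way.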

🔍 Potential Use Cases

📊 Cost of Living Analysis – Compare affordability across different Nairobi areas.

💵 Salary & Real Estate Benchmarking – Businesses can analyze salary expectations by location.

📉 Time-Series Forecasting – Train predictive models (ARIMA, Prophet, LSTM) to estimate future living costs.

📈 Inflation Impact Studies – Measure how economic conditions influence cost variations over time.
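
Before fitting any of the forecasting models named above, it can help to have a trivial reference to beat. The sketch below is a seasonal-naive baseline, which repeats the value from 12 months earlier, together with a mean absolute error helper. It is not one of the listed models, and the function names and the toy series are assumptions for illustration.

```python
def seasonal_naive(history, horizon, season=12):
    """Forecast by repeating the last full season of observations."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy monthly series (24 made-up values, not taken from the dataset).
toy_history = list(range(1, 25))
toy_forecast = seasonal_naive(toy_history, horizon=3)
```

A model trained on the real series is only worth keeping if its error on held-out months is lower than that of this baseline.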

⚠️ Limitations

Synthetic Data – The dataset is not based on real survey data but follows market trends.

No Lifestyle Adjustments – Differences in household size or spending habits are not factored in.

Inflation Approximation – While inflation is simulated at 2% annually, actual inflation rates may differ.

📁 File Format & Access

nairobi_cost_of_living_time_series.csv – 60,000 records in CSV format (time-series structured).

📢 Acknowledgments

This dataset was generated for research and educational purposes. If you find it useful, consider citing it in your work. 🚀

📥 Download and Explore the Data Now!
