License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The "Stock Market Dataset for AI-Driven Prediction and Trading Strategy Optimization" is designed to simulate real-world stock market data for training and evaluating machine learning models. This dataset includes a combination of technical indicators, market metrics, sentiment scores, and macroeconomic factors, providing a comprehensive foundation for developing and testing AI models for stock price prediction and trading strategy optimization.
Key Features

Market Metrics:
- Open, High, Low, Close Prices: daily stock price movement.
- Volume: trading activity during the day.

Technical Indicators:
- RSI (Relative Strength Index): a momentum oscillator measuring the speed and change of price movements.
- MACD (Moving Average Convergence Divergence): an indicator revealing changes in the strength, direction, momentum, and duration of a trend.
- Bollinger Bands: upper and lower bands around the stock price, measuring volatility.

Sentiment Analysis:
- Sentiment Score: simulated sentiment derived from financial news and social media, ranging from -1 (negative) to 1 (positive).

Macroeconomic Factors:
- GDP Growth: indicates the overall health and growth of the economy.
- Inflation Rate: reflects changes in purchasing power and economic stability.

Target Variable:
- Buy/Sell Signal: binary classification (1 = Buy, 0 = Sell) based on price-movement thresholds, simulating actionable trading decisions.

Use Cases
- AI Model Training: ideal for building stock prediction models using LSTM, Gradient Boosting, Random Forest, etc.
- Trading Strategy Optimization: enables testing of trading algorithms and strategies in a simulated environment.
- Sentiment Analysis Research: useful for understanding how sentiment influences stock movements.
- Feature Engineering and Selection: provides a diverse set of features for experimentation with advanced techniques like PCA and LDA.

Dataset Highlights
- Synthetic Yet Realistic: carefully designed to mimic real-world financial data trends and relationships.
- Comprehensive Coverage: includes key indicators and metrics used by traders and analysts.
- Scalable: suitable for both small-scale academic projects and larger AI-driven trading platforms.
- Accessible for All Levels: the intuitive structure lets even beginners use this dataset for financial machine learning applications.

File Format
The dataset is provided in CSV format, where:
- Rows represent individual trading days.
- Columns represent features (technical indicators, market metrics, etc.) and the target variable.

Acknowledgments
This dataset is synthetically generated and intended for research and educational purposes. It is not based on real market data and should not be used for actual trading.
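The technical indicators described above can all be derived from raw price columns. A minimal pandas sketch follows; the column name `Close` and the synthetic price series are assumptions for illustration, not the dataset's actual contents:

```python
import numpy as np
import pandas as pd

# Hypothetical close-price series standing in for the dataset's close column.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum(), name="Close")

# RSI (14-day): ratio of average gains to average losses, scaled to 0-100.
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rsi = 100 - 100 / (1 + gain / loss)

# MACD: 12-day EMA minus 26-day EMA, with a 9-day signal line.
macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
signal = macd.ewm(span=9, adjust=False).mean()

# Bollinger Bands: 20-day moving average +/- 2 standard deviations.
ma20 = close.rolling(20).mean()
std20 = close.rolling(20).std()
upper, lower = ma20 + 2 * std20, ma20 - 2 * std20

features = pd.DataFrame({"Close": close, "RSI": rsi, "MACD": macd,
                         "BB_Upper": upper, "BB_Lower": lower})
print(features.dropna().head())
```

The same recipes work on the dataset's own price columns once it is loaded with `pd.read_csv`.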
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The final dataset used for the publication "Investigating Reinforcement Learning Approaches In Stock Market Trading" was produced by downloading and combining data from multiple reputable sources to suit the specific needs of this project. Raw data were retrieved using a Python finance API; Python and NumPy were then used to combine and normalise the data into the final dataset.

The raw data were sourced as follows:
- Stock prices of NVIDIA & AMD, financial indexes, and commodity prices: retrieved from Yahoo Finance.
- Economic indicators: collected from the US Federal Reserve.

The dataset was normalised to minute intervals, and the stock prices were adjusted to account for stock splits.

This dataset was used to explore the application of reinforcement learning in stock market trading. After creation, it was used in a reinforcement learning environment to train several reinforcement learning algorithms, including deep Q-learning, policy networks, policy networks with baselines, actor-critic methods, and time-series incorporation. The performance of these algorithms was then compared based on profit made and other financial evaluation metrics.

The attached 'README.txt' contains methodological information and a glossary of all the variables in the .csv file.
Overview
This dataset is a collection of 10,000+ high-quality images of supermarket and store display shelves, ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia-Pacific region, offering fully managed services, high-quality content and data, and powerful tools that enable businesses and organisations to carry out their creative and machine learning projects.
Use case
The dataset can be used for various AI and computer vision models: Store Management, Stock Monitoring, Customer Experience, Sales Analysis, Cashierless Checkout, and more. Each dataset is supported by both an AI and a human review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.
About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform, providing data, content, tools, and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology to manage, curate, and process over 100M visual materials, serving global leading brands' creative and data demands. Visit us at https://www.pixta.ai/ or contact us via email at admin.bi@pixta.co.jp.
License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It includes key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, the launch of GPT by OpenAI, and the launch of Gemini by Google. The data covers how much each company spends on R&D for its AI products and services, and how much revenue it generates. The data spans January 1, 2015 to December 31, 2024 and includes information for several companies: OpenAI, Google, and Meta.
The data is available as a CSV file, which we analyze in this project using a pandas DataFrame. The analysis will be helpful for those working in the finance or share-market domain. From this dataset, we extract various insights using Python:
1) R&D spending by each company
2) Revenue earned by each company
3) Date-wise impact on the stock
4) Events when maximum stock impact was observed
5) AI revenue growth of each company
6) Correlation between the columns
7) Expenditure vs. revenue, year by year
8) Event impact analysis
9) Change in the index with respect to year and company
These are the main features/columns available in the dataset:
1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.
2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".
3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.
4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.
5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.
6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.
7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
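A few of the listed insights can be sketched with pandas against the documented schema. Only the column names below come from the dataset description; the sample rows are invented for illustration:

```python
import pandas as pd

# Tiny in-memory sample mimicking the dataset's schema (values are made up).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-03"]),
    "Company": ["OpenAI", "Meta", "OpenAI", "Meta"],
    "R&D_Spending_USD_Mn": [120.0, 300.0, 125.0, 310.0],
    "AI_Revenue_USD_Mn": [80.0, 150.0, 90.0, 155.0],
    "Event": [None, "Llama launch", "GPT launch", None],
    "Stock_Impact_%": [0.5, 2.1, 3.4, -0.2],
})

# Insight 1: total R&D spending per company.
rnd = df.groupby("Company")["R&D_Spending_USD_Mn"].sum()

# Insight 4: events with the largest stock impact.
events = df.dropna(subset=["Event"]).nlargest(2, "Stock_Impact_%")

# Insight 6: correlation between the numeric columns.
corr = df[["R&D_Spending_USD_Mn", "AI_Revenue_USD_Mn", "Stock_Impact_%"]].corr()
print(rnd, events[["Event", "Stock_Impact_%"]], corr, sep="\n\n")
```

With the real file, replace the inline DataFrame with `pd.read_csv(...)` on the dataset's CSV.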
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides daily historical stock price data for The Coca-Cola Company (ticker: KO) from January 2, 1962 to April 6, 2025. It captures Coca-Cola’s stock performance through decades of economic cycles, technological shifts, and global events — making it a rich resource for time-series analysis, investment research, and machine learning projects.
| Column Name | Description |
|---|---|
| date | Date of trading |
| open | Opening price of the day |
| high | Highest price of the day |
| low | Lowest price of the day |
| close | Closing price of the day |
| adj_close | Adjusted closing price (accounts for splits/dividends) |
| volume | Total shares traded on the day |
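As a starting point for time-series work, daily returns and a moving average can be computed from the adjusted close. The rows below are an inline stand-in for the real CSV (same headers, illustrative values):

```python
import io
import pandas as pd

# Small inline sample in the dataset's column layout; with the real file,
# pass its path to pd.read_csv instead of this StringIO buffer.
csv = io.StringIO("""date,open,high,low,close,adj_close,volume
1962-01-02,0.26,0.27,0.26,0.26,0.05,10240
1962-01-03,0.26,0.26,0.25,0.26,0.05,8960
1962-01-04,0.26,0.27,0.26,0.27,0.05,12800
""")
df = pd.read_csv(csv, parse_dates=["date"], index_col="date")

# Daily simple returns from the adjusted close, plus a 2-day moving average.
df["return"] = df["adj_close"].pct_change()
df["ma2"] = df["adj_close"].rolling(2).mean()
print(df[["adj_close", "return", "ma2"]])
```

Using `adj_close` rather than `close` keeps returns comparable across the splits and dividends noted in the table.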
This dataset is for educational and research purposes only. For financial trading or commercial use, always consult a licensed data provider.
This dataset was compiled to support learning in data science, finance, and AI. Feel free to use it in your projects, and if you do, share your work! 📬 Contact info:
You can contact me for more datasets of any type you want.
-X
Overview
This dataset is a collection of high-view traffic images across multiple scenes, backgrounds, and lighting conditions, ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia-Pacific region, offering fully managed services, high-quality content and data, and powerful tools that enable businesses and organisations to carry out their creative and machine learning projects.
Use case
This dataset is used for training and testing AI solutions in various cases: Traffic Monitoring, Traffic Camera Systems, Vehicle Flow Estimation, and more. Each dataset is supported by both an AI and a human review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.
About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform, providing data, content, tools, and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology to manage, curate, and process over 100M visual materials, serving global leading brands' creative and data demands. Visit us at https://www.pixta.ai/ for more details.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
Continental AI Stock Count is a dataset for object detection tasks - it contains Tyres On Trolleys annotations for 524 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Note: This dataset has been superseded by the dataset found at "End-Use Load Profiles for the U.S. Building Stock" (submission 4520; linked in the submission resources), which is a comprehensive and validated representation of hourly load profiles in the U.S. commercial and residential building stock. The End-Use Load Profiles project website includes links to data viewers for this new dataset. For documentation of dataset validation, model calibration, and uncertainty quantification, see Wilson et al. (2022).
These data were first created around 2012 as a byproduct of various analyses of solar photovoltaics and solar water heating (see the references below for two examples). This dataset contains several errors and limitations, and users are encouraged to transition to the updated version of the dataset posted in the resources. This dataset contains weather data, commercial load profile data, and residential load profile data.
Weather
The Typical Meteorological Year 3 (TMY3) dataset provides one year of hourly data for around 1,000 locations. The TMY weather represents 30-year normals, which are typical weather conditions over a 30-year period.
Commercial
The commercial load profiles included are the 16 ASHRAE 90.1-2004 DOE Commercial Prototype Models simulated in all TMY3 locations, with building insulation levels changing based on ASHRAE 90.1-2004 requirements in each climate zone. The folder names within each resource represent the weather station location of the profiles, whereas the file names represent the building type and the representative city for the ASHRAE climate zone that was used to determine code-compliance insulation levels. As indicated by the file names, all building models represent construction that complied with the ASHRAE 90.1-2004 building energy code requirements. No older or newer vintages of buildings are represented.
Residential
The BASE residential load profiles are five EnergyPlus models (one per climate region) representing 2009 IECC construction single-family detached homes simulated in all TMY3 locations. No older or newer vintages of buildings are represented. Each of the five climate regions includes only one heating fuel type; electric heating is only found in the Hot-Humid climate. Air conditioning is not found in the Marine climate region.
One major issue with the residential profiles is that for each of the five climate zones, certain location-specific algorithms from one city were applied to entire climate zones. For example, in the Hot-Humid files, the heating season calculated for Tampa, FL (December 1 - March 31) was unknowingly applied to all other locations in the Hot-Humid zone, which restricts heating operation outside of those days (for example, heating is disabled in Dallas, TX during cold weather in November). This causes the heating energy to be artificially low in colder parts of that climate zone, and conversely the cooling season restriction leads to artificially low cooling energy use in hotter parts of each climate zone. Additionally, the ground temperatures for the representative city were used across the entire climate zone. This affects water heating energy use (because inlet cold water temperature depends on ground temperature) and heating/cooling energy use (because of ground heat transfer through foundation walls and floors). Representative cities were Tampa, FL (Hot-Humid), El Paso, TX (Mixed-Dry/Hot-Dry), Memphis, TN (Mixed-Humid), Arcata, CA (Marine), and Billings, MT (Cold/Very-Cold).
The residential dataset includes a HIGH building load profile that was intended to provide a rough approximation of older home vintages, but it combines poor thermal insulation with larger house size, tighter thermostat setpoints, and less efficient HVAC equipment. Conversely, the LOW building combines excellent thermal insulation with smaller house size, wider thermostat setpoints, and more efficient HVAC equipment. However, it is not known how well these HIGH and LOW permutations represent the range of energy use in the housing stock.
Note that on July 2nd, 2013, the Residential High and Low load files were updated from 366 days in a year for leap years to the more general 365 days in a normal year. The archived residential load data is included from prior to this date.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Activities of Daily Living Object Dataset

Overview
The ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.

Data Sources and Licensing
The dataset comprises images and annotations sourced from four publicly available datasets:

COCO Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.

Open Images Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.

LVIS Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.

Roboflow Universe
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: The following repositories from Roboflow Universe were used in compiling this dataset:
- Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).
- Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).
- Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).
- Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).
- New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).
- Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).
- Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).
- Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).
- One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).
- jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).
- jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).
- Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).
- Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).
- Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).
- TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).
- Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024).
- Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).
- Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).
- Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).
- Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).
- MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).
- Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).
- Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).
- Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).
- Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).
- Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).
- in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).
- Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).
- Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).
- University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).
- EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).
- Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).
- workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).
- Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).
- WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).
- AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).
- Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).

All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.

Redistribution Permission
As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset. We have complied with the license terms by:
- Providing appropriate attribution to the original creators.
- Including links to the CC BY 4.0 license.
- Indicating any changes made to the original material.

Dataset Structure
The dataset includes:
- Images: high-quality images featuring ADL objects suitable for robotic manipulation.
- Annotations: bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.

Classes
The dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.

Format
- Images: JPEG format.
- Annotations: text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.

How to Use the Dataset
Download the dataset, then unpack it:
unzip ADL_Object_Dataset.zip

How to Cite This Dataset
If you use this dataset in your research, please cite our paper:
@article{shahria2024activities, title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset}, author={Shahria, Md Tanzil and Rahman, Mohammad H.}, journal={Sensors}, volume={24}, number={23}, pages={7566}, year={2024}, publisher={MDPI}}

License
This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
License Link: https://creativecommons.org/licenses/by/4.0/
By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.

Acknowledgments
We gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:
- COCO Dataset: we thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.
- Open Images Dataset: we express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images under the CC BY 4.0 license.
- LVIS Dataset: we appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.
- Roboflow Universe: the contributing repositories listed above.
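Since the annotations use the YOLO Darknet format, a small helper for converting a normalised label line back to pixel coordinates may be useful. The conversion below follows the standard YOLO convention; the example image size is an assumption:

```python
# A YOLO Darknet label file holds one line per object:
#   <class_id> <x_center> <y_center> <width> <height>
# with coordinates normalised to [0, 1] relative to the image size.
def yolo_to_pixel_bbox(line, img_w, img_h):
    """Convert one YOLO annotation line to (class_id, x_min, y_min, x_max, y_max)."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), round(xc - w / 2), round(yc - h / 2), round(xc + w / 2), round(yc + h / 2)

# Example annotation for a hypothetical 640x480 image: a centred box
# covering a quarter of the width and half of the height.
print(yolo_to_pixel_bbox("3 0.5 0.5 0.25 0.5", 640, 480))
```

The class index maps into the order of names in classes.txt.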
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This project will investigate the link between nutrients, food availability, and the survival of Atlantic bluefin tuna larvae, which can be used to improve stock assessments for this commercially and recreationally important species.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset organizes stock market data from 01-04-2023 to 05-13-2024, sourced from Yahoo Finance. The dataset is intended for machine learning tasks including, but not limited to:
1. Simple linear regression
2. Multiple linear regression
3. Multivariate regression test (implemented in this project and analyzed in the report)
4. PCA (implemented in this project and analyzed in the report)
5. Factor analysis (implemented in this project and analyzed in the report)
6. ARIMA
The key idea is to use the historical stock market data to predict the next-day adjusted close price from various variables. The predictive power and importance of each variable will be evaluated using PCA and VIF scores.
The project aims to assess the feasibility of predicting the adjusted closing price of the trendy AI stock NVIDIA and to filter out the most important indicators for stock price prediction.
Features included in this dataset:
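The PCA and VIF screening mentioned above can be sketched with NumPy alone. The feature matrix below is synthetic and stands in for the dataset's indicator columns (two columns are deliberately made collinear so the VIF has something to flag):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical feature matrix: x2 is nearly a linear function of x1.
n = 200
x1 = rng.normal(size=n)
x2 = x1 * 0.9 + rng.normal(scale=0.3, size=n)   # collinear with x1
x3 = rng.normal(size=n)                         # independent
X = np.column_stack([x1, x2, x3])

# PCA via SVD on the standardised matrix: explained-variance ratios.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
s = np.linalg.svd(Z, compute_uv=False)
explained = s**2 / (s**2).sum()

# VIF for column j: 1 / (1 - R^2) of regressing it on the other columns.
def vif(X, j):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])  # add intercept
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print("explained variance:", explained.round(3))
print("VIF per column:", np.round(vifs, 2))
```

A common rule of thumb treats VIF above roughly 5-10 as a sign of problematic multicollinearity, which is what the collinear pair triggers here.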
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The project is for automated processing of home video camera feeds. This dataset includes both daytime and nighttime (IR) images, shot from the perspective of a typical home camera.
I suggest splitting the dataset and training two models: one for daytime and one for nighttime. The nighttime pictures have a single channel while the daytime ones have three channels, which results in significantly different features being trained. I identify whether an image has one or three channels using the following shell command:
identify -colorspace HSL -verbose "$f" | egrep -q "(Channel 0: 1-bit|red: 1-bit)"
The images are full size, so models of different sizes can be created; I've been training at 608x608. The dataset includes many null images which have in the past triggered false positives.
The classes are simply the things of interest I've seen from my house. In general this is more useful than the standard YOLO classes, such as Zebra. However, you may want to add bear or some other wildlife. I've found squirrels are too small for my cameras to reliably pick up and detect. The perspective and framing of the content is quite different from typical stock photos, so I think it makes a lot of sense to train the model using only images from IP cams.
Ideally, I will make models available for the many different tools people are already using for AI, including: Deepstack / BlueIris, MotionEye, and Frigate.
1) The purpose of this project is to document juvenile salmon habitat occurrence in the Lower Columbia River and estuary, and to examine how habitat conditions influence their distribution, health, and abundance. We also want to monitor habitat conditions and indicators of salmon health in these environments. Parameters measured include habitat conditions such as vegetation, water temperature, and dissolved oxygen; salmon diet and prey availability; and weight, length, growth rate, lipid content, genetic stock, and chemical contaminant exposure.
2) Lyndal Johnson (NWFSC FTE) is the project lead; other primary staff involved are Sean Sol and Paul Olson (NWFSC FTEs) and Kate Macneale (NWFSC term employee). The project also involves other NWFSC FTEs, other term employees, contractors, and staff from other programs (Environmental Chemistry) and Divisions (FE, CB), as well as staff from collaborating agencies (i.e., the Lower Columbia River Estuary Partnership, USGS, PNNL, OHSU).
3) The project involves field surveys in which parameters measured include habitat conditions such as vegetation, water temperature, and dissolved oxygen; salmon diet and prey availability; weight, length, growth rate, lipid content, genetic stock, and chemical contaminant exposure.
4) Specific products include annual reports for the Lower Columbia Estuary Partnership, and manuscripts in peer-reviewed journals.
5) Specific audiences include (but are not limited to) the Bonneville Power Administration and other federal, state, and local agencies involved with salmon recovery and environmental management in the Columbia Basin (e.g., EPA, Washington Department of Ecology, Oregon Department of Environmental Quality, the City of Portland); the NMFS regional office, and other agency and academic scientists.
6) This is a stand-alone project, but it is also a component of a larger monitoring program overseen by the Estuary Partnership in which other tasks are conducted by collaborators in USGS, PNNL, and OHSU.
7) This is an ongoing project with a soft completion deadline; there are no final deadlines, and specific tasks are completed on a yearly basis.
Genetic stock information for chinook salmon from Lower Columbia River sites.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales each day, and sales data are available for 45 Walmart stores. The business faces a challenge: unforeseen demand sometimes causes stock-outs, partly because the existing machine learning algorithm is inadequate. An ideal ML algorithm would predict demand accurately while accounting for factors like economic conditions, including CPI, the Unemployment Index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
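The five-times holiday weighting described above corresponds to a weighted mean absolute error; a minimal sketch (the function name and example values are illustrative, not from the dataset):

```python
def wmae(y_true, y_pred, is_holiday, holiday_weight=5):
    """Weighted MAE: holiday weeks carry 5x the weight of normal weeks."""
    weights = [holiday_weight if h else 1 for h in is_holiday]
    total = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)

# Toy example: both forecasts are off by 10, but the second week is a holiday
print(wmae([100, 200], [110, 190], [False, True]))  # → 10.0
```

Because the weights cancel when all errors are equal, the toy example still yields 10; the weighting only changes the score when holiday-week errors differ from the rest.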
The dataset is taken from Kaggle.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically.
Stock Price Time Series for Eurotech S.p.A. Eurotech S.p.A. provides edge computing and industrial Internet of Things (IIoT) solutions in Italy, Europe, North America, and Asia. Its product portfolio includes integrated hardware and software, and edge hardware, software, and AI appliances. The company offers certified cybersecurity, an integrated hardware and software solution for IoT and edge AI projects; and plug-and-play edge, a ready-to-use edge solution that includes southbound protocols, a no-code/low-code programming model, and certified cloud connectors with digital twins. In addition, it provides edge software integration and customization, designed to support third-party edge platforms such as Azure IoT Edge and AWS IoT Greengrass; and edge AI enablement, an open platform for deploying any AI software at the edge, comprising the Everyware Software Framework (ESF) for building edge appliances that run containerized AI applications, and pre-trained models for edge AI inference. Further, the company offers a life-cycle management solution to support the product life cycle, and configuration management, including hardware and software order-code and SKU management, to meet the customer's application needs and security compliance. Eurotech S.p.A. serves the industrial automation, transportation and off-road, energy and utilities, and medical and pharmaceutical industries. The company was founded in 1992 and is headquartered in Amaro, Italy.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides synthetic yet realistic data for analyzing and forecasting retail store inventory demand. It contains over 73,000 rows of daily data across multiple stores and products, including attributes like sales, inventory levels, pricing, weather, promotions, and holidays.
The dataset is ideal for practicing machine learning tasks such as demand forecasting, dynamic pricing, and inventory optimization. It allows data scientists to explore time series forecasting techniques, study the impact of external factors like weather and holidays on sales, and build advanced models to optimize supply chain performance.
Challenge 1: Time Series Demand Forecasting Predict daily product demand across stores using historical sales and inventory data. Can you build an LSTM-based forecasting model that outperforms classical methods like ARIMA?
Challenge 2: Inventory Optimization Optimize inventory levels by analyzing sales trends and minimizing stockouts while reducing overstock situations.
Challenge 3: Dynamic Pricing Develop a pricing strategy based on demand, competitor pricing, and discounts to maximize revenue.
Date: Daily records from [start_date] to [end_date].
Store ID & Product ID: Unique identifiers for stores and products.
Category: Product categories like Electronics, Clothing, Groceries, etc.
Region: Geographic region of the store.
Inventory Level: Stock available at the beginning of the day.
Units Sold: Units sold during the day.
Demand Forecast: Predicted demand based on past trends.
Weather Condition: Daily weather impacting sales.
Holiday/Promotion: Indicators for holidays or promotions.
Exploratory Data Analysis (EDA): Analyze sales trends, visualize data, and identify patterns. Time Series Forecasting: Train models like ARIMA, Prophet, or LSTM to predict future demand. Pricing Analysis: Study how discounts and competitor pricing affect sales.
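Before training ARIMA, Prophet, or LSTM models, a simple walk-forward baseline is useful for comparison; a pure-Python sketch (the daily sales numbers below are invented):

```python
def moving_average_forecast(series, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    tail = series[-window:]
    return sum(tail) / len(tail)

def mae(y_true, y_pred):
    """Mean absolute error between actuals and forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Walk-forward evaluation on invented daily unit sales:
# each day is forecast using only the history strictly before it.
sales = [20, 22, 19, 25, 24, 23, 26, 28, 27, 25, 24, 26]
preds = [moving_average_forecast(sales[:i]) for i in range(7, len(sales))]
print(round(mae(sales[7:], preds), 2))
```

Any LSTM or ARIMA model trained on this dataset should beat this kind of naive baseline before its forecasts are taken seriously.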
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
💬 About Dataset

This dataset contains the Analysts Q&A portion of Meta Platforms, Inc. (META) earnings call transcripts.
Content: It is a collection of question-and-answer exchanges between financial analysts and Meta executives (such as Mark Zuckerberg, Sheryl Sandberg, and David Wehner) during quarterly earnings calls.
Files: The data is separated into CSV files for specific quarters, for example: META_Q1_2021_qna.csv, META_Q3_2024_qna.csv, etc., allowing for time-series analysis of analyst interest and company responses.
Columns: The files typically include columns for the questioner_name, questioner_organization, the analyst's question, and the corresponding answers from Meta executives.
Topics: The content covers a range of topics discussed during the calls, including financial performance, advertising, social networks, the creator economy, and investment in AR/VR and the Metaverse.
Use Cases: This data is valuable for financial analysis, natural language processing (NLP) tasks, sentiment analysis, and for training Large Language Models (LLMs) to answer finance- or company-specific questions.
The current dataset version contains 22 files and covers earnings call Q&A sessions from Q3 2020 up to Q2 2025.
💡 Project Ideas (NLP & Data Science)

Executive Tone and Sentiment Analysis:
Project: Develop a model to classify the sentiment (e.g., optimistic, cautious, concerned) in the answers provided by Meta executives (Mark Zuckerberg, Sheryl Sandberg, etc.) over time.
Goal: Track how the company's tone changes across quarters and correlate it with the stock price (alpha generation) or market events.
Analyst Concern and Topic Modeling:
Project: Use Topic Modeling (like LDA or BERT-based methods) on the analysts' questions to identify the most pressing concerns each quarter (e.g., competition, Metaverse investment, advertising changes, regulatory risk).
Goal: Visualize the evolution of Wall Street's focus on Meta over the years.
Question Answering (QA) System / Financial Chatbot:
Project: Fine-tune a Large Language Model (LLM) like BERT or a transformer model (e.g., FinBERT) on the Q&A pairs to create a specialized Financial AI Agent that can answer questions about Meta's historical priorities and statements, using the dataset as its knowledge base.
Goal: Build a RAG (Retrieval-Augmented Generation) system to quickly extract specific facts and quotes from the transcripts.
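The retrieval half of such a RAG system can be sketched with plain TF-IDF scoring (pure Python; the answer snippets below are invented placeholders, not actual transcript text):

```python
import math
from collections import Counter

# Invented placeholder answers; real ones come from the META_*_qna.csv files.
docs = [
    "We continue to invest heavily in AI infrastructure and data centers.",
    "Reels monetization is improving but is not yet at Feed rates.",
    "Reality Labs operating losses will increase year over year.",
]

def tf_idf(texts):
    """Very small TF-IDF: one dict of term weights per text."""
    tokenized = [t.lower().split() for t in texts]
    df = Counter(tok for doc in tokenized for tok in set(doc))
    n = len(texts)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "How is Reels monetization going?"
# Vectorize corpus and query together so they share document frequencies
vecs = tf_idf(docs + [query])
doc_vecs, query_vec = vecs[:-1], vecs[-1]
best = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))
print(docs[best])
```

In a full RAG pipeline the retrieved passage would then be fed to the LLM as context; production systems would swap this toy scorer for dense embeddings.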
Executive Response Style Analysis:
Project: Analyze linguistic features—like readability, word choice, and use of contrastive words—to measure the evasiveness or directness of executive answers when responding to tough analyst questions.
Goal: Predict which questions are most likely to receive a less direct answer.
📈 Use Cases (Finance & Research)

Investment Research and Signal Generation:
Use Case: Extract meaningful investment signals. For example, quantifying the frequency of discussion around key topics (like "Reels monetization" or "AI infrastructure") can indicate future company focus and capital allocation.
Benefit: Provides a systematic, data-driven way to supplement traditional financial models.
Competitive and Trend Analysis:
Use Case: Compare the language and themes discussed in Meta's calls with those of key competitors (e.g., Google, Amazon) to understand industry-wide priorities or areas where Meta is gaining or losing ground.
Quantifying Financial Risk:
Use Case: Use NLP to identify and track the mentions and sentiment around negative or uncertain terms (e.g., "headwinds," "supply chain," "regulatory risk") to build an early warning system for potential downside risk management.
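A minimal sketch of such a term tracker (the term list and the example answer are illustrative assumptions; a real study would use a curated finance lexicon):

```python
import re
from collections import Counter

# Illustrative risk lexicon, not a vetted finance word list
RISK_TERMS = {"headwinds", "uncertainty", "regulatory", "supply", "macroeconomic"}

def risk_term_counts(answer_text):
    """Count occurrences of risk-related terms in one executive answer."""
    tokens = re.findall(r"[a-z]+", answer_text.lower())
    return Counter(t for t in tokens if t in RISK_TERMS)

# Invented example answer, not an actual transcript quote
answer = ("We are seeing some advertising headwinds from macroeconomic "
          "uncertainty and ongoing regulatory changes.")
print(risk_term_counts(answer))
```

Aggregating these counts per quarter yields the time series that an early-warning indicator would monitor.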
Training Domain-Specific LLMs:
Use Case: The structured Q&A data is highly valuable for pre-training or fine-tuning LLMs on financial domain-specific language to improve their ability to understand and generate text in a corporate finance context.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically.
Here are a few use cases for this project:
Automated Vending Machine Inventory: The "first-data" model can be used to keep track of inventory in vending machines. For instance, an AI system could periodically scan the arrangement and quantity of the drinks to identify whether the machine is out of stock of Sprite, Cola, or Fanta.
Intelligent Recycling Bins: Computer vision can be deployed in smart recycling bins to identify the type of drink containers being discarded. By distinguishing between Sprite, Cola, and Fanta cans or bottles, the system can provide more precise data for recycling studies, or apply specific recycling processes.
Brand Market Analysis: In a retail environment, insights about the popularity of different drink brands can be drawn by using security footage to identify the purchase of Sprite, Cola, or Fanta drinks.
Advertising Efficacy Metrics: Brands can use such a model in assessing the effectiveness of their in-store advertising or placements by using security cameras to analyze the selection preference of customers for Sprite, Cola, and Fanta.
Health and Nutrition Studies: The model can be used in nutritional studies to track the consumption of different types of sodas. This data could be utilized to understand people's drinking habits and to plan better public health policies or interventions.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/); license information was derived automatically.
Winning the lottery has always been a dream for a lot of people. Because of that, a lot of work has been done in the past to try to tackle the challenge. We saw the rise of different approaches, from computer software that optimizes your lottery picks to numerical and statistical analysis, as well as esoteric approaches.
More recently, with the rise of machine learning (hereafter: ML), people have tried to tackle the problem by having an ML model predict the next combination. However, most of those who undertook this approach had a poor understanding of ML basics such as data preprocessing and signal processing.
As an example of what went wrong, some projects built an ML model that took raw draw results and output N numbers, hoping those N numbers would be the correct combination of the next draw. By analogy, it would be like feeding the raw bytes of an audio file as the sole input of a model and hoping it outputs a concept like BPM or music genre. Of course, this resulted in extremely bad results and ML models that didn't learn a single thing. Similarly, it would be like trying to predict stock prices using only the price as the sole input variable: bound to fail without crafting higher-level features.
Because machine-learning-based approaches were bound to fail before even beginning unless something was done about data and signal processing, I decided to make my contribution by crafting higher-level features (abstract concepts) from historic lottery data.
I leave aside discussions about the mathematical theory of probability (which I explain further in the repository of Lofea, the project I created to generate this dataset) and about why mathematicians say prediction would be theoretically impossible; those specifics and other questions can be discussed in the comments section. To let people still dream enough to tackle the problem, I'd like to point out that stock market prices (which are also numerical time-series data) are said to be unpredictable due to the Efficient Market Hypothesis. Regardless, firms and individuals have been trying their best to predict the evolution of stock prices. Although theory tells us something is impossible, there might be a practical implementation flaw that can be exploited if studied carefully enough. Who knows, unless they try?
Preprocessed historical results.
Tackling a big and complex task often requires problem-solving methodologies such as divide and conquer. This is why, instead of tackling a regular pick-6-of-49-style lottery, this dataset focuses on simple 1/10 lottery data (i.e., pick 1 among 10). It also includes a version for the Euromillions (a 5/50 lottery).
In the archive, you will find features.04-2021 files containing the computed features, as well as the whole draw histories used to compute them.
One lottery is the Euromillions, and the other is TrioMagic, though similar datasets can be crafted for lotteries that share their respective formats.
Regarding the 1/10 lottery: since most 1/10 lotteries have several pools (columns) from which one has to pick, this kind of dataset with higher-level features can be created for each column individually and compared across different lotteries.
The big idea here is to preprocess historic draws as if they were a signal or a time series and to create higher-level features based on them, inspired by approaches for working with numerical time-series signals such as stock market prices.
There are several high-level concepts we may want to predict, such as the parity of the next draw, which would be a classification problem. You may also choose to tackle a regression problem, such as trying to predict the rate of even numbers over the next N draws (for instance N = 2, 3, or 5). There are also other possible targets besides parity, such as the Universe Length, which is described below.
The target you choose to predict may influence what kind of features you will try to include or craft.
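For instance, the parity classification target can be built directly from raw draws (a hypothetical sketch; the dataset itself ships precomputed features):

```python
def parity_labels(draws):
    """Binary classification target: 1 if the drawn number is even, else 0."""
    return [1 if d % 2 == 0 else 0 for d in draws]

# Hypothetical 1/10 draw history (one number per draw, 0-9)
print(parity_labels([3, 8, 0, 5]))  # → [0, 1, 1, 0]
```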
Several of the features included in this dataset are based on a concept I came up with, which I call "Universe Length".
Basically, the Universe Length (referred to as ULen in the dataset) is the number of distinct numbers that appear within a given time frame.
For instance in a 1/10 lottery, Universe Length over a time frame of 10 draws with the ...
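Based on the definition above, a minimal sketch of the ULen computation (the function name and example history are ours, not from the dataset):

```python
def universe_length(draws, window=10):
    """ULen: count of distinct numbers among the last `window` draws."""
    return len(set(draws[-window:]))

# Hypothetical 1/10 draw history; over these 10 draws only {0, 1, 3, 7, 9} appear
history = [3, 7, 3, 1, 9, 3, 7, 0, 1, 3]
print(universe_length(history))  # → 5
```

A low ULen means the lottery kept revisiting the same few numbers in that window, which is exactly the kind of higher-level signal the dataset exposes as features.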
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The "Stock Market Dataset for AI-Driven Prediction and Trading Strategy Optimization" is designed to simulate real-world stock market data for training and evaluating machine learning models. This dataset includes a combination of technical indicators, market metrics, sentiment scores, and macroeconomic factors, providing a comprehensive foundation for developing and testing AI models for stock price prediction and trading strategy optimization.
Key Features

Market Metrics:
- Open, High, Low, Close Prices: Daily stock price movement.
- Volume: Represents the trading activity during the day.

Technical Indicators:
- RSI (Relative Strength Index): A momentum oscillator measuring the speed and change of price movements.
- MACD (Moving Average Convergence Divergence): An indicator revealing changes in the strength, direction, momentum, and duration of a trend.
- Bollinger Bands: Upper and lower bands around the stock price, used to measure volatility.

Sentiment Analysis:
- Sentiment Score: Simulated sentiment derived from financial news and social media, ranging from -1 (negative) to 1 (positive).

Macroeconomic Factors:
- GDP Growth: Indicates the overall health and growth of the economy.
- Inflation Rate: Reflects changes in purchasing power and economic stability.

Target Variable:
- Buy/Sell Signal: Binary classification (1 = Buy, 0 = Sell) based on price movement thresholds, simulating actionable trading decisions.

Use Cases
- AI Model Training: Ideal for building stock prediction models using LSTM, Gradient Boosting, Random Forest, etc.
- Trading Strategy Optimization: Enables testing of trading algorithms and strategies in a simulated environment.
- Sentiment Analysis Research: Useful for understanding how sentiment influences stock movements.
- Feature Engineering and Selection: Provides a diverse set of features for experimentation with advanced techniques like PCA and LDA.

Dataset Highlights
- Synthetic Yet Realistic: Carefully designed to mimic real-world financial data trends and relationships.
- Comprehensive Coverage: Includes key indicators and metrics used by traders and analysts.
- Scalable: Suitable for both small-scale academic projects and larger AI-driven trading platforms.
- Accessible for All Levels: The intuitive structure ensures that even beginners can use this dataset for financial machine learning applications.

File Format
The dataset is provided in CSV format, where:
- Rows represent individual trading days.
- Columns represent features (technical indicators, market metrics, etc.) and the target variable.

Acknowledgments
This dataset is synthetically generated and is intended for research and educational purposes. It is not based on real market data and should not be used for actual trading.
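As an illustration of how one of the indicator columns is typically computed, here is a minimal sketch of the standard RSI formula (simple-average variant; the dataset does not document its exact parameters, so the 14-period window is an assumption):

```python
def rsi(closes, period=14):
    """Relative Strength Index (simple-average variant) from closing prices."""
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    recent = deltas[-period:]
    avg_gain = sum(d for d in recent if d > 0) / period
    avg_loss = sum(-d for d in recent if d < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: the oscillator saturates
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# A steadily rising price series pins the RSI at its maximum
print(rsi(list(range(1, 21))))  # → 100.0
```

Wilder's original RSI uses an exponentially smoothed average rather than the simple average shown here; both variants read the same way (values above ~70 suggest overbought, below ~30 oversold).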