License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The "Stock Market Dataset for AI-Driven Prediction and Trading Strategy Optimization" is designed to simulate real-world stock market data for training and evaluating machine learning models. This dataset includes a combination of technical indicators, market metrics, sentiment scores, and macroeconomic factors, providing a comprehensive foundation for developing and testing AI models for stock price prediction and trading strategy optimization.
Key Features

Market Metrics:
- Open, High, Low, Close Prices: daily stock price movement.
- Volume: trading activity during the day.

Technical Indicators:
- RSI (Relative Strength Index): a momentum oscillator measuring the speed and change of price movements.
- MACD (Moving Average Convergence Divergence): an indicator revealing changes in the strength, direction, momentum, and duration of a trend.
- Bollinger Bands: upper and lower bands around the stock price, measuring volatility.

Sentiment Analysis:
- Sentiment Score: simulated sentiment derived from financial news and social media, ranging from -1 (negative) to 1 (positive).

Macroeconomic Factors:
- GDP Growth: indicates the overall health and growth of the economy.
- Inflation Rate: reflects changes in purchasing power and economic stability.

Target Variable:
- Buy/Sell Signal: binary classification (1 = Buy, 0 = Sell) based on price-movement thresholds, simulating actionable trading decisions.

Use Cases
- AI Model Training: ideal for building stock prediction models using LSTM, Gradient Boosting, Random Forest, etc.
- Trading Strategy Optimization: enables testing of trading algorithms and strategies in a simulated environment.
- Sentiment Analysis Research: useful for understanding how sentiment influences stock movements.
- Feature Engineering and Selection: provides a diverse set of features for experimentation with advanced techniques like PCA and LDA.

Dataset Highlights
- Synthetic Yet Realistic: carefully designed to mimic real-world financial data trends and relationships.
- Comprehensive Coverage: includes key indicators and metrics used by traders and analysts.
- Scalable: suitable for both small-scale academic projects and larger AI-driven trading platforms.
- Accessible for All Levels: the intuitive structure lets even beginners use this dataset for financial machine learning applications.

File Format
The dataset is provided in CSV format, where:
- Rows represent individual trading days.
- Columns represent features (technical indicators, market metrics, etc.) and the target variable.

Acknowledgments
This dataset is synthetically generated and intended for research and educational purposes. It is not based on real market data and should not be used for actual trading.
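The technical indicators described above can all be derived from raw price columns. A minimal pandas sketch follows; the column name `Close` and the synthetic price series are assumptions for illustration, not the dataset's actual contents:

```python
import numpy as np
import pandas as pd

# Hypothetical close-price series standing in for the dataset's close column.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 250).cumsum(), name="Close")

# RSI (14-day): ratio of average gains to average losses, scaled to 0-100.
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rsi = 100 - 100 / (1 + gain / loss)

# MACD: 12-day EMA minus 26-day EMA, with a 9-day signal line.
macd = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
signal = macd.ewm(span=9, adjust=False).mean()

# Bollinger Bands: 20-day moving average +/- 2 standard deviations.
ma20 = close.rolling(20).mean()
std20 = close.rolling(20).std()
upper, lower = ma20 + 2 * std20, ma20 - 2 * std20

features = pd.DataFrame({"Close": close, "RSI": rsi, "MACD": macd,
                         "BB_Upper": upper, "BB_Lower": lower})
print(features.dropna().head())
```

The same recipes work on the dataset's own price columns once it is loaded with `pd.read_csv`.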
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The final dataset used for the publication "Investigating Reinforcement Learning Approaches In Stock Market Trading" was produced by downloading and combining data from multiple reputable sources to suit the specific needs of this project. Raw data were retrieved using a Python finance API; Python and NumPy were then used to combine and normalise the data into the final dataset.

The raw data were sourced as follows:
- Stock prices of NVIDIA & AMD, financial indexes, and commodity prices: retrieved from Yahoo Finance.
- Economic indicators: collected from the US Federal Reserve.

The dataset was normalised to minute intervals, and the stock prices were adjusted to account for stock splits.

This dataset was used to explore the application of reinforcement learning in stock market trading. After creation, it was used in a reinforcement learning environment to train several reinforcement learning algorithms, including deep Q-learning, policy networks, policy networks with baselines, actor-critic methods, and time-series incorporation. The performance of these algorithms was then compared based on profit made and other financial evaluation metrics.

The attached 'README.txt' contains methodological information and a glossary of all the variables in the .csv file.
Overview
This dataset is a collection of 10,000+ high-quality images of supermarket and store display shelves, ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia-Pacific region, offering fully managed services, high-quality content and data, and powerful tools that enable businesses and organisations to carry out their creative and machine learning projects.
Use case
The dataset can be used for various AI and computer vision models: Store Management, Stock Monitoring, Customer Experience, Sales Analysis, Cashierless Checkout, and more. Each dataset is supported by both an AI and a human review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.
About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform, providing data, content, tools, and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology to manage, curate, and process over 100M visual materials, serving global leading brands' creative and data demands. Visit us at https://www.pixta.ai/ or contact us via email at admin.bi@pixta.co.jp.
License: Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It includes key financial metrics and events that could influence a company's stock performance, such as the launch of Llama by Meta, the launch of GPT by OpenAI, and the launch of Gemini by Google. The data covers how much each company spends on R&D for its AI products and services, and how much revenue it generates. The data spans January 1, 2015 to December 31, 2024 and includes information for several companies: OpenAI, Google, and Meta.
The data is available as a CSV file, which we analyze in this project using a pandas DataFrame. The analysis will be helpful for those working in the finance or share-market domain. From this dataset, we extract various insights using Python:
1) R&D spending by each company
2) Revenue earned by each company
3) Date-wise impact on the stock
4) Events when maximum stock impact was observed
5) AI revenue growth of each company
6) Correlation between the columns
7) Expenditure vs. revenue, year by year
8) Event impact analysis
9) Change in the index with respect to year and company
These are the main features/columns available in the dataset:
1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.
2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".
3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.
4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.
5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.
6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.
7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
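A few of the listed insights can be sketched with pandas against the documented schema. Only the column names below come from the dataset description; the sample rows are invented for illustration:

```python
import pandas as pd

# Tiny in-memory sample mimicking the dataset's schema (values are made up).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-02", "2023-01-02", "2023-01-03", "2023-01-03"]),
    "Company": ["OpenAI", "Meta", "OpenAI", "Meta"],
    "R&D_Spending_USD_Mn": [120.0, 300.0, 125.0, 310.0],
    "AI_Revenue_USD_Mn": [80.0, 150.0, 90.0, 155.0],
    "Event": [None, "Llama launch", "GPT launch", None],
    "Stock_Impact_%": [0.5, 2.1, 3.4, -0.2],
})

# Insight 1: total R&D spending per company.
rnd = df.groupby("Company")["R&D_Spending_USD_Mn"].sum()

# Insight 4: events with the largest stock impact.
events = df.dropna(subset=["Event"]).nlargest(2, "Stock_Impact_%")

# Insight 6: correlation between the numeric columns.
corr = df[["R&D_Spending_USD_Mn", "AI_Revenue_USD_Mn", "Stock_Impact_%"]].corr()
print(rnd, events[["Event", "Stock_Impact_%"]], corr, sep="\n\n")
```

With the real file, replace the inline DataFrame with `pd.read_csv(...)` on the dataset's CSV.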
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides daily historical stock price data for The Coca-Cola Company (ticker: KO) from January 2, 1962 to April 6, 2025. It captures Coca-Cola’s stock performance through decades of economic cycles, technological shifts, and global events — making it a rich resource for time-series analysis, investment research, and machine learning projects.
| Column Name | Description |
|---|---|
| date | Date of trading |
| open | Opening price of the day |
| high | Highest price of the day |
| low | Lowest price of the day |
| close | Closing price of the day |
| adj_close | Adjusted closing price (accounts for splits/dividends) |
| volume | Total shares traded on the day |
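As a starting point for time-series work, daily returns and a moving average can be computed from the adjusted close. The rows below are an inline stand-in for the real CSV (same headers, illustrative values):

```python
import io
import pandas as pd

# Small inline sample in the dataset's column layout; with the real file,
# pass its path to pd.read_csv instead of this StringIO buffer.
csv = io.StringIO("""date,open,high,low,close,adj_close,volume
1962-01-02,0.26,0.27,0.26,0.26,0.05,10240
1962-01-03,0.26,0.26,0.25,0.26,0.05,8960
1962-01-04,0.26,0.27,0.26,0.27,0.05,12800
""")
df = pd.read_csv(csv, parse_dates=["date"], index_col="date")

# Daily simple returns from the adjusted close, plus a 2-day moving average.
df["return"] = df["adj_close"].pct_change()
df["ma2"] = df["adj_close"].rolling(2).mean()
print(df[["adj_close", "return", "ma2"]])
```

Using `adj_close` rather than `close` keeps returns comparable across the splits and dividends noted in the table.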
This dataset is for educational and research purposes only. For financial trading or commercial use, always consult a licensed data provider.
This dataset was compiled to support learning in data science, finance, and AI. Feel free to use it in your projects, and if you do, share your work! 📬 Contact info:
You can contact me for more datasets of any type you want.
-X
Overview
This dataset is a collection of high-view traffic images across multiple scenes, backgrounds, and lighting conditions, ready to use for optimizing the accuracy of computer vision models. All of the content is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia-Pacific region, offering fully managed services, high-quality content and data, and powerful tools that enable businesses and organisations to carry out their creative and machine learning projects.
Use case
This dataset is used for training and testing AI solutions in various cases: Traffic Monitoring, Traffic Camera Systems, Vehicle Flow Estimation, and more. Each dataset is supported by both an AI and a human review process to ensure labelling consistency and accuracy. Contact us for more custom datasets.
About PIXTA
PIXTASTOCK is the largest Asian-featured stock platform, providing data, content, tools, and services since 2005. PIXTA has 15 years of experience integrating advanced AI technology to manage, curate, and process over 100M visual materials, serving global leading brands' creative and data demands. Visit us at https://www.pixta.ai/ for more details.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
Continental AI Stock Count is a dataset for object detection tasks - it contains Tyres On Trolleys annotations for 524 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Note: This dataset has been superseded by the dataset found at "End-Use Load Profiles for the U.S. Building Stock" (submission 4520; linked in the submission resources), which is a comprehensive and validated representation of hourly load profiles in the U.S. commercial and residential building stock. The End-Use Load Profiles project website includes links to data viewers for this new dataset. For documentation of dataset validation, model calibration, and uncertainty quantification, see Wilson et al. (2022).
These data were first created around 2012 as a byproduct of various analyses of solar photovoltaics and solar water heating (see the references below for two examples). This dataset contains several errors and limitations, and users are encouraged to transition to the updated version of the dataset posted in the resources. This dataset contains weather data, commercial load profile data, and residential load profile data.
Weather
The Typical Meteorological Year 3 (TMY3) dataset provides one year of hourly data for around 1,000 locations. The TMY weather represents 30-year normals, which are typical weather conditions over a 30-year period.
Commercial
The commercial load profiles included are the 16 ASHRAE 90.1-2004 DOE Commercial Prototype Models simulated in all TMY3 locations, with building insulation levels changing based on ASHRAE 90.1-2004 requirements in each climate zone. The folder names within each resource represent the weather station location of the profiles, whereas the file names represent the building type and the representative city for the ASHRAE climate zone that was used to determine code-compliance insulation levels. As indicated by the file names, all building models represent construction that complied with the ASHRAE 90.1-2004 building energy code requirements. No older or newer vintages of buildings are represented.
Residential
The BASE residential load profiles are five EnergyPlus models (one per climate region) representing 2009 IECC construction single-family detached homes simulated in all TMY3 locations. No older or newer vintages of buildings are represented. Each of the five climate regions includes only one heating fuel type; electric heating is only found in the Hot-Humid climate. Air conditioning is not found in the Marine climate region.
One major issue with the residential profiles is that for each of the five climate zones, certain location-specific algorithms from one city were applied to entire climate zones. For example, in the Hot-Humid files, the heating season calculated for Tampa, FL (December 1 - March 31) was unknowingly applied to all other locations in the Hot-Humid zone, which restricts heating operation outside of those days (for example, heating is disabled in Dallas, TX during cold weather in November). This causes the heating energy to be artificially low in colder parts of that climate zone, and conversely the cooling season restriction leads to artificially low cooling energy use in hotter parts of each climate zone. Additionally, the ground temperatures for the representative city were used across the entire climate zone. This affects water heating energy use (because inlet cold water temperature depends on ground temperature) and heating/cooling energy use (because of ground heat transfer through foundation walls and floors). Representative cities were Tampa, FL (Hot-Humid), El Paso, TX (Mixed-Dry/Hot-Dry), Memphis, TN (Mixed-Humid), Arcata, CA (Marine), and Billings, MT (Cold/Very-Cold).
The residential dataset includes a HIGH building load profile that was intended to provide a rough approximation of older home vintages, but it combines poor thermal insulation with larger house size, tighter thermostat setpoints, and less efficient HVAC equipment. Conversely, the LOW building combines excellent thermal insulation with smaller house size, wider thermostat setpoints, and more efficient HVAC equipment. However, it is not known how well these HIGH and LOW permutations represent the range of energy use in the housing stock.
Note that on July 2nd, 2013, the Residential High and Low load files were updated from 366 days in a year for leap years to the more general 365 days in a normal year. The archived residential load data is included from prior to this date.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Activities of Daily Living Object Dataset

Overview
The ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.

Data Sources and Licensing
The dataset comprises images and annotations sourced from four publicly available datasets:

COCO Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.

Open Images Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.

LVIS Dataset
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.

Roboflow Universe
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
License Link: https://creativecommons.org/licenses/by/4.0/
Citation: The following repositories from Roboflow Universe were used in compiling this dataset:
- Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).
- Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).
- Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).
- Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).
- New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).
- Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).
- Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).
- Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).
- One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).
- jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).
- jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).
- Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).
- Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).
- Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).
- TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).
- Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024).
- Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).
- Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).
- Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).
- Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).
- MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).
- Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).
- Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).
- Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).
- Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).
- Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).
- in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).
- Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).
- Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).
- University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).
- EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).
- Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).
- workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).
- Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).
- WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).
- AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).
- Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).

All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.

Redistribution Permission
As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset. We have complied with the license terms by:
- Providing appropriate attribution to the original creators.
- Including links to the CC BY 4.0 license.
- Indicating any changes made to the original material.

Dataset Structure
The dataset includes:
- Images: high-quality images featuring ADL objects suitable for robotic manipulation.
- Annotations: bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.

Classes
The dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.

Format
- Images: JPEG format.
- Annotations: text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.

How to Use the Dataset
Download the dataset, then unpack it:
unzip ADL_Object_Dataset.zip

How to Cite This Dataset
If you use this dataset in your research, please cite our paper:
@article{shahria2024activities, title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset}, author={Shahria, Md Tanzil and Rahman, Mohammad H.}, journal={Sensors}, volume={24}, number={23}, pages={7566}, year={2024}, publisher={MDPI}}

License
This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
License Link: https://creativecommons.org/licenses/by/4.0/
By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.

Acknowledgments
We gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:
- COCO Dataset: we thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.
- Open Images Dataset: we express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images under the CC BY 4.0 license.
- LVIS Dataset: we appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.
- Roboflow Universe: the contributing repositories listed above.
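Since the annotations use the YOLO Darknet format, a small helper for converting a normalised label line back to pixel coordinates may be useful. The conversion below follows the standard YOLO convention; the example image size is an assumption:

```python
# A YOLO Darknet label file holds one line per object:
#   <class_id> <x_center> <y_center> <width> <height>
# with coordinates normalised to [0, 1] relative to the image size.
def yolo_to_pixel_bbox(line, img_w, img_h):
    """Convert one YOLO annotation line to (class_id, x_min, y_min, x_max, y_max)."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), round(xc - w / 2), round(yc - h / 2), round(xc + w / 2), round(yc + h / 2)

# Example annotation for a hypothetical 640x480 image: a centred box
# covering a quarter of the width and half of the height.
print(yolo_to_pixel_bbox("3 0.5 0.5 0.25 0.5", 640, 480))
```

The class index maps into the order of names in classes.txt.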
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This project will investigate the link between nutrients, food availability, and the survival of Atlantic bluefin tuna larvae, which can be used to improve stock assessments for this commercially and recreationally important species.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset organizes stock market data from 01-04-2023 to 05-13-2024, sourced from Yahoo Finance. The dataset is intended for machine learning tasks including, but not limited to:
1. Simple linear regression
2. Multiple linear regression
3. Multivariate regression test (implemented in this project and analyzed in the report)
4. PCA (implemented in this project and analyzed in the report)
5. Factor analysis (implemented in this project and analyzed in the report)
6. ARIMA
The key idea is to use the historical stock market data to predict the next-day adjusted close price from various variables. The predictive power and importance of each variable will be evaluated using PCA and VIF scores.
The project aims to assess the feasibility of predicting the adjusted closing price of the trendy AI stock NVIDIA and to filter out the most important indicators for stock price prediction.
Features included in this dataset:
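The PCA and VIF screening mentioned above can be sketched with NumPy alone. The feature matrix below is synthetic and stands in for the dataset's indicator columns (two columns are deliberately made collinear so the VIF has something to flag):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical feature matrix: x2 is nearly a linear function of x1.
n = 200
x1 = rng.normal(size=n)
x2 = x1 * 0.9 + rng.normal(scale=0.3, size=n)   # collinear with x1
x3 = rng.normal(size=n)                         # independent
X = np.column_stack([x1, x2, x3])

# PCA via SVD on the standardised matrix: explained-variance ratios.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
s = np.linalg.svd(Z, compute_uv=False)
explained = s**2 / (s**2).sum()

# VIF for column j: 1 / (1 - R^2) of regressing it on the other columns.
def vif(X, j):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([others, np.ones(len(y))])  # add intercept
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print("explained variance:", explained.round(3))
print("VIF per column:", np.round(vifs, 2))
```

A common rule of thumb treats VIF above roughly 5-10 as a sign of problematic multicollinearity, which is what the collinear pair triggers here.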
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The project is for automated processing of home video camera feeds. This dataset includes both daytime and nighttime (IR) images, shot from the perspective of a typical home camera.
I suggest splitting the dataset and training two models: one for daytime and one for nighttime. The nighttime pictures have a single channel while the daytime ones have three channels, which results in significantly different features being trained. I identify whether an image has one or three channels using the following shell command:
identify -colorspace HSL -verbose "$f" | egrep -q "(Channel 0: 1-bit|red: 1-bit)"
The images are full size, so models of different sizes can be created; I've been training at 608x608. The dataset includes many null images which have in the past triggered false positives.
The classes are simply the things of interest I've seen from my house. In general this is more useful than the standard YOLO classes, such as Zebra. However, you may want to add bear or some other wildlife. I've found squirrels are too small for my cameras to reliably pick up and detect. The perspective and framing of the content is quite different from typical stock photos, so I think it makes a lot of sense to train the model using only images from IP cams.
Ideally, I will make models available for the many different tools people are already using for AI, including: Deepstack / BlueIris, MotionEye, and Frigate.
1) The purpose of this project is to document juvenile salmon habitat occurrence in the Lower Columbia River and estuary, and to examine how habitat conditions influence their distribution, health, and abundance. We also want to monitor habitat conditions and indicators of salmon health in these environments. Parameters measured include habitat conditions such as vegetation, water temperature, and dissolved oxygen; salmon diet and prey availability; and weight, length, growth rate, lipid content, genetic stock, and chemical contaminant exposure.
2) Lyndal Johnson (NWFSC FTE) is the project lead; other primary staff involved are Sean Sol and Paul Olson (NWFSC FTEs) and Kate Macneale (NWFSC term employee). The project also involves other NWFSC FTEs, other term employees, contractors, and staff from other programs (Environmental Chemistry) and Divisions (FE, CB), as well as staff from collaborating agencies (i.e., the Lower Columbia River Estuary Partnership, USGS, PNNL, OHSU).
3) The project involves field surveys in which parameters measured include habitat conditions such as vegetation, water temperature, and dissolved oxygen; salmon diet and prey availability; weight, length, growth rate, lipid content, genetic stock, and chemical contaminant exposure.
4) Specific products include annual reports for the Lower Columbia Estuary Partnership, and manuscripts in peer-reviewed journals.
5) Specific audiences include (but are not limited to) the Bonneville Power Administration and other federal, state, and local agencies involved with salmon recovery and environmental management in the Columbia Basin (e.g., EPA, Washington Department of Ecology, Oregon Department of Environmental Quality, the City of Portland); the NMFS regional office, and other agency and academic scientists.
6) This is a stand-alone project, but it is also a component of a larger monitoring program overseen by the Estuary Partnership in which other tasks are conducted by collaborators in USGS, PNNL, and OHSU.
7) This is an ongoing project with a soft completion deadline; there are no final deadlines, and specific tasks are completed on a yearly basis.
Genetic stock information for chinook salmon from Lower Columbia River sites.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales each day, and sales data are available for 45 Walmart stores. The business faces a challenge: unforeseen demand sometimes causes stock-outs, partly because the existing machine learning algorithm is inadequate. An ideal ML algorithm would predict demand accurately while accounting for factors like economic conditions, including CPI, the Unemployment Index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
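The five-times holiday weighting described above corresponds to a weighted mean absolute error; a minimal sketch (the function name and example values are illustrative, not from the dataset):

```python
def wmae(y_true, y_pred, is_holiday, holiday_weight=5):
    """Weighted MAE: holiday weeks carry 5x the weight of normal weeks."""
    weights = [holiday_weight if h else 1 for h in is_holiday]
    total = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)

# Toy example: both forecasts are off by 10, but the second week is a holiday
print(wmae([100, 200], [110, 190], [False, True]))  # → 10.0
```

Because the weights cancel when all errors are equal, the toy example still yields 10; the weighting only changes the score when holiday-week errors differ from the rest.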
The dataset is taken from Kaggle.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically.
Stock Price Time Series for Eurotech S.p.A. Eurotech S.p.A. provides edge computing and industrial Internet of Things (IIoT) solutions in Italy, Europe, North America, and Asia. Its product portfolio includes integrated hardware and software, and edge hardware, software, and AI appliances. The company offers certified cybersecurity, an integrated hardware and software solution for IoT and edge AI projects; and plug-and-play edge, a ready-to-use edge solution that includes southbound protocols, a no-code/low-code programming model, and certified cloud connectors with digital twins. In addition, it provides edge software integration and customization, designed to support third-party edge platforms such as Azure IoT Edge and AWS IoT Greengrass; and edge AI enablement, an open platform for deploying any AI software at the edge, comprising the Everyware Software Framework (ESF) for building edge appliances that run containerized AI applications, and pre-trained models for edge AI inference. Further, the company offers a life-cycle management solution to support the product life cycle, and configuration management, including hardware and software order-code and SKU management, to meet the customer's application needs and security compliance. Eurotech S.p.A. serves the industrial automation, transportation and off-road, energy and utilities, and medical and pharmaceutical industries. The company was founded in 1992 and is headquartered in Amaro, Italy.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides synthetic yet realistic data for analyzing and forecasting retail store inventory demand. It contains over 73,000 rows of daily data across multiple stores and products, including attributes like sales, inventory levels, pricing, weather, promotions, and holidays.
The dataset is ideal for practicing machine learning tasks such as demand forecasting, dynamic pricing, and inventory optimization. It allows data scientists to explore time series forecasting techniques, study the impact of external factors like weather and holidays on sales, and build advanced models to optimize supply chain performance.
Challenge 1: Time Series Demand Forecasting Predict daily product demand across stores using historical sales and inventory data. Can you build an LSTM-based forecasting model that outperforms classical methods like ARIMA?
Challenge 2: Inventory Optimization Optimize inventory levels by analyzing sales trends and minimizing stockouts while reducing overstock situations.
Challenge 3: Dynamic Pricing Develop a pricing strategy based on demand, competitor pricing, and discounts to maximize revenue.
Date: Daily records from [start_date] to [end_date].
Store ID & Product ID: Unique identifiers for stores and products.
Category: Product categories like Electronics, Clothing, Groceries, etc.
Region: Geographic region of the store.
Inventory Level: Stock available at the beginning of the day.
Units Sold: Units sold during the day.
Demand Forecast: Predicted demand based on past trends.
Weather Condition: Daily weather impacting sales.
Holiday/Promotion: Indicators for holidays or promotions.
Exploratory Data Analysis (EDA): Analyze sales trends, visualize data, and identify patterns. Time Series Forecasting: Train models like ARIMA, Prophet, or LSTM to predict future demand. Pricing Analysis: Study how discounts and competitor pricing affect sales.
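Before training ARIMA, Prophet, or LSTM models, a simple walk-forward baseline is useful for comparison; a pure-Python sketch (the daily sales numbers below are invented):

```python
def moving_average_forecast(series, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    tail = series[-window:]
    return sum(tail) / len(tail)

def mae(y_true, y_pred):
    """Mean absolute error between actuals and forecasts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Walk-forward evaluation on invented daily unit sales:
# each day is forecast using only the history strictly before it.
sales = [20, 22, 19, 25, 24, 23, 26, 28, 27, 25, 24, 26]
preds = [moving_average_forecast(sales[:i]) for i in range(7, len(sales))]
print(round(mae(sales[7:], preds), 2))
```

Any LSTM or ARIMA model trained on this dataset should beat this kind of naive baseline before its forecasts are taken seriously.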
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
💬 About Dataset

This dataset contains the Analysts Q&A portion of Meta Platforms, Inc. (META) earnings call transcripts.
Content: It is a collection of question-and-answer exchanges between financial analysts and Meta executives (such as Mark Zuckerberg, Sheryl Sandberg, and David Wehner) during quarterly earnings calls.
Files: The data is separated into CSV files for specific quarters, for example: META_Q1_2021_qna.csv, META_Q3_2024_qna.csv, etc., allowing for time-series analysis of analyst interest and company responses.
Columns: The files typically include columns for the questioner_name, questioner_organization, the analyst's question, and the corresponding answers from Meta executives.
Topics: The content covers a range of topics discussed during the calls, including financial performance, advertising, social networks, the creator economy, and investment in AR/VR and the Metaverse.
Use Cases: This data is valuable for financial analysis, natural language processing (NLP) tasks, sentiment analysis, and for training Large Language Models (LLMs) to answer finance- or company-specific questions.
The current dataset version contains 22 files and covers earnings call Q&A sessions from Q3 2020 up to Q2 2025.
💡 Project Ideas (NLP & Data Science)

Executive Tone and Sentiment Analysis:
Project: Develop a model to classify the sentiment (e.g., optimistic, cautious, concerned) in the answers provided by Meta executives (Mark Zuckerberg, Sheryl Sandberg, etc.) over time.
Goal: Track how the company's tone changes across quarters and correlate it with the stock price (alpha generation) or market events.
Analyst Concern and Topic Modeling:
Project: Use Topic Modeling (like LDA or BERT-based methods) on the analysts' questions to identify the most pressing concerns each quarter (e.g., competition, Metaverse investment, advertising changes, regulatory risk).
Goal: Visualize the evolution of Wall Street's focus on Meta over the years.
Question Answering (QA) System / Financial Chatbot:
Project: Fine-tune a Large Language Model (LLM) like BERT or a transformer model (e.g., FinBERT) on the Q&A pairs to create a specialized Financial AI Agent that can answer questions about Meta's historical priorities and statements, using the dataset as its knowledge base.
Goal: Build a RAG (Retrieval-Augmented Generation) system to quickly extract specific facts and quotes from the transcripts.
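The retrieval half of such a RAG system can be sketched with plain TF-IDF scoring (pure Python; the answer snippets below are invented placeholders, not actual transcript text):

```python
import math
from collections import Counter

# Invented placeholder answers; real ones come from the META_*_qna.csv files.
docs = [
    "We continue to invest heavily in AI infrastructure and data centers.",
    "Reels monetization is improving but is not yet at Feed rates.",
    "Reality Labs operating losses will increase year over year.",
]

def tf_idf(texts):
    """Very small TF-IDF: one dict of term weights per text."""
    tokenized = [t.lower().split() for t in texts]
    df = Counter(tok for doc in tokenized for tok in set(doc))
    n = len(texts)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = "How is Reels monetization going?"
# Vectorize corpus and query together so they share document frequencies
vecs = tf_idf(docs + [query])
doc_vecs, query_vec = vecs[:-1], vecs[-1]
best = max(range(len(docs)), key=lambda i: cosine(query_vec, doc_vecs[i]))
print(docs[best])
```

In a full RAG pipeline the retrieved passage would then be fed to the LLM as context; production systems would swap this toy scorer for dense embeddings.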
Executive Response Style Analysis:
Project: Analyze linguistic features—like readability, word choice, and use of contrastive words—to measure the evasiveness or directness of executive answers when responding to tough analyst questions.
Goal: Predict which questions are most likely to receive a less direct answer.
📈 Use Cases (Finance & Research)

Investment Research and Signal Generation:
Use Case: Extract meaningful investment signals. For example, quantifying the frequency of discussion around key topics (like "Reels monetization" or "AI infrastructure") can indicate future company focus and capital allocation.
Benefit: Provides a systematic, data-driven way to supplement traditional financial models.
Competitive and Trend Analysis:
Use Case: Compare the language and themes discussed in Meta's calls with those of key competitors (e.g., Google, Amazon) to understand industry-wide priorities or areas where Meta is gaining or losing ground.
Quantifying Financial Risk:
Use Case: Use NLP to identify and track the mentions and sentiment around negative or uncertain terms (e.g., "headwinds," "supply chain," "regulatory risk") to build an early warning system for potential downside risk management.
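A minimal sketch of such a term tracker (the term list and the example answer are illustrative assumptions; a real study would use a curated finance lexicon):

```python
import re
from collections import Counter

# Illustrative risk lexicon, not a vetted finance word list
RISK_TERMS = {"headwinds", "uncertainty", "regulatory", "supply", "macroeconomic"}

def risk_term_counts(answer_text):
    """Count occurrences of risk-related terms in one executive answer."""
    tokens = re.findall(r"[a-z]+", answer_text.lower())
    return Counter(t for t in tokens if t in RISK_TERMS)

# Invented example answer, not an actual transcript quote
answer = ("We are seeing some advertising headwinds from macroeconomic "
          "uncertainty and ongoing regulatory changes.")
print(risk_term_counts(answer))
```

Aggregating these counts per quarter yields the time series that an early-warning indicator would monitor.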
Training Domain-Specific LLMs:
Use Case: The structured Q&A data is highly valuable for pre-training or fine-tuning LLMs on financial domain-specific language to improve their ability to understand and generate text in a corporate finance context.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically.
Here are a few use cases for this project:
Automated Vending Machine Inventory: The "first-data" model can be used to keep track of inventory in vending machines. For instance, an AI system could periodically scan the arrangement and quantity of the drinks to identify whether the machine is out of stock of Sprite, Cola, or Fanta.
Intelligent Recycling Bins: Computer vision can be deployed in smart recycling bins to identify the type of drink containers being discarded. By distinguishing between Sprite, Cola, and Fanta cans or bottles, the system can provide more precise data for recycling studies, or apply specific recycling processes.
Brand Market Analysis: In a retail environment, insights about the popularity of different drink brands can be drawn by using security footage to identify the purchase of Sprite, Cola, or Fanta drinks.
Advertising Efficacy Metrics: Brands can use such a model in assessing the effectiveness of their in-store advertising or placements by using security cameras to analyze the selection preference of customers for Sprite, Cola, and Fanta.
Health and Nutrition Studies: The model can be used in nutritional studies to track the consumption of different types of sodas. This data could be utilized to understand people's drinking habits and to plan better public health policies or interventions.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/); license information was derived automatically.
Winning the lottery has always been a dream for a lot of people. Because of that, a lot of work has been done in the past to try to tackle the challenge. We saw the rise of different approaches, from computer software that optimizes your lottery picks to numerical and statistical analysis, as well as esoteric approaches.
More recently, with the rise of machine learning (hereafter: ML), people have tried to tackle the problem by having an ML model predict the next combination. However, most of those who undertook this approach had a poor understanding of ML basics such as data preprocessing and signal processing.
As an example of what went wrong, some projects built an ML model that took raw draw results and output N numbers, hoping those N numbers would be the correct combination of the next draw. By analogy, it would be like feeding the raw bytes of an audio file as the sole input of a model and hoping it outputs a concept like BPM or music genre. Of course, this resulted in extremely bad results and ML models that didn't learn a single thing. Similarly, it would be like trying to predict stock prices using only the price as the sole input variable: bound to fail without crafting higher-level features.
Because machine-learning-based approaches were bound to fail before even beginning unless something was done about data and signal processing, I decided to make my contribution by crafting higher-level features (abstract concepts) from historic lottery data.
I leave aside discussions about the mathematical theory of probability (which I explain further in the repository of Lofea, the project I created to generate this dataset) and about why mathematicians say prediction would be theoretically impossible; those specifics and other questions can be discussed in the comments section. To let people still dream enough to tackle the problem, I'd like to point out that stock market prices (which are also numerical time-series data) are said to be unpredictable due to the Efficient Market Hypothesis. Regardless, firms and individuals have been trying their best to predict the evolution of stock prices. Although theory tells us something is impossible, there might be a practical implementation flaw that can be exploited if studied carefully enough. Who knows, unless they try?
Preprocessed historical results.
Tackling a big and complex task often requires problem-solving methodologies such as divide and conquer. This is why, instead of tackling a regular pick-6-of-49-style lottery, this dataset focuses on simple 1/10 lottery data (i.e., pick 1 among 10). It also includes a version for the Euromillions (a 5/50 lottery).
In the archive, you will find features.04-2021 files containing the computed features, as well as the whole draw histories used to compute them.
One lottery is the Euromillions, and the other is TrioMagic, though similar datasets can be crafted for lotteries that share their respective formats.
Regarding the 1/10 lottery: since most 1/10 lotteries have several pools (columns) from which one has to pick, this kind of dataset with higher-level features can be created for each column individually and compared across different lotteries.
The big idea here is to preprocess historic draws as if they were a signal or a time series and to create higher-level features based on them, inspired by approaches for working with numerical time-series signals such as stock market prices.
There are several high-level concepts we may want to predict, such as the parity of the next draw, which would be a classification problem. You may also choose to tackle a regression problem, such as trying to predict the rate of even numbers over the next N draws (for instance N = 2, 3, or 5). There are also other possible targets besides parity, such as the Universe Length, which is described below.
The target you choose to predict may influence what kind of features you will try to include or craft.
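For instance, the parity classification target can be built directly from raw draws (a hypothetical sketch; the dataset itself ships precomputed features):

```python
def parity_labels(draws):
    """Binary classification target: 1 if the drawn number is even, else 0."""
    return [1 if d % 2 == 0 else 0 for d in draws]

# Hypothetical 1/10 draw history (one number per draw, 0-9)
print(parity_labels([3, 8, 0, 5]))  # → [0, 1, 1, 0]
```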
Several of the features included in this dataset are based on a concept I came up with, which I call "Universe Length".
Basically, the Universe Length (referred to as ULen in the dataset) is the number of distinct numbers that appear within a given time frame.
For instance in a 1/10 lottery, Universe Length over a time frame of 10 draws with the ...
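Based on the definition above, a minimal sketch of the ULen computation (the function name and example history are ours, not from the dataset):

```python
def universe_length(draws, window=10):
    """ULen: count of distinct numbers among the last `window` draws."""
    return len(set(draws[-window:]))

# Hypothetical 1/10 draw history; over these 10 draws only {0, 1, 3, 7, 9} appear
history = [3, 7, 3, 1, 9, 3, 7, 0, 1, 3]
print(universe_length(history))  # → 5
```

A low ULen means the lottery kept revisiting the same few numbers in that window, which is exactly the kind of higher-level signal the dataset exposes as features.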
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
The "Stock Market Dataset for AI-Driven Prediction and Trading Strategy Optimization" is designed to simulate real-world stock market data for training and evaluating machine learning models. This dataset includes a combination of technical indicators, market metrics, sentiment scores, and macroeconomic factors, providing a comprehensive foundation for developing and testing AI models for stock price prediction and trading strategy optimization.
Key Features

Market Metrics:
- Open, High, Low, Close Prices: Daily stock price movement.
- Volume: Represents the trading activity during the day.

Technical Indicators:
- RSI (Relative Strength Index): A momentum oscillator measuring the speed and change of price movements.
- MACD (Moving Average Convergence Divergence): An indicator revealing changes in the strength, direction, momentum, and duration of a trend.
- Bollinger Bands: Upper and lower bands around the stock price, used to measure volatility.

Sentiment Analysis:
- Sentiment Score: Simulated sentiment derived from financial news and social media, ranging from -1 (negative) to 1 (positive).

Macroeconomic Factors:
- GDP Growth: Indicates the overall health and growth of the economy.
- Inflation Rate: Reflects changes in purchasing power and economic stability.

Target Variable:
- Buy/Sell Signal: Binary classification (1 = Buy, 0 = Sell) based on price movement thresholds, simulating actionable trading decisions.

Use Cases
- AI Model Training: Ideal for building stock prediction models using LSTM, Gradient Boosting, Random Forest, etc.
- Trading Strategy Optimization: Enables testing of trading algorithms and strategies in a simulated environment.
- Sentiment Analysis Research: Useful for understanding how sentiment influences stock movements.
- Feature Engineering and Selection: Provides a diverse set of features for experimentation with advanced techniques like PCA and LDA.

Dataset Highlights
- Synthetic Yet Realistic: Carefully designed to mimic real-world financial data trends and relationships.
- Comprehensive Coverage: Includes key indicators and metrics used by traders and analysts.
- Scalable: Suitable for both small-scale academic projects and larger AI-driven trading platforms.
- Accessible for All Levels: The intuitive structure ensures that even beginners can use this dataset for financial machine learning applications.

File Format
The dataset is provided in CSV format, where:
- Rows represent individual trading days.
- Columns represent features (technical indicators, market metrics, etc.) and the target variable.

Acknowledgments
This dataset is synthetically generated and is intended for research and educational purposes. It is not based on real market data and should not be used for actual trading.
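As an illustration of how one of the indicator columns is typically computed, here is a minimal sketch of the standard RSI formula (simple-average variant; the dataset does not document its exact parameters, so the 14-period window is an assumption):

```python
def rsi(closes, period=14):
    """Relative Strength Index (simple-average variant) from closing prices."""
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    recent = deltas[-period:]
    avg_gain = sum(d for d in recent if d > 0) / period
    avg_loss = sum(-d for d in recent if d < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: the oscillator saturates
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# A steadily rising price series pins the RSI at its maximum
print(rsi(list(range(1, 21))))  # → 100.0
```

Wilder's original RSI uses an exponentially smoothed average rather than the simple average shown here; both variants read the same way (values above ~70 suggest overbought, below ~30 oversold).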