http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a synthetic, daily record of financial market activity for companies involved in Artificial Intelligence (AI). It captures key financial metrics along with events that could influence a company's stock performance, such as Meta's launch of Llama, OpenAI's launch of GPT, and Google's launch of Gemini. It records how much each company spends on R&D for its AI products and services, and how much revenue those products generate. The data runs from January 1, 2015, to December 31, 2024, and covers three companies: OpenAI, Google, and Meta.
The data is available as a CSV file, which we analyze with a Pandas DataFrame (a starter sketch follows the column list below).
This analysis will be useful for anyone working in the finance or share-market domain.
In our project, we use Python to extract the following insights from this dataset:
1) How much the companies spent on R&D
2) Revenue earned by the companies
3) Date-wise impact on the stock
4) Events when the maximum stock impact was observed
5) AI revenue growth of the companies
6) Correlation between the columns
7) Expenditure vs. revenue, year by year
8) Event impact analysis
9) Change in the index with respect to year and company
These are the main features/columns available in the dataset:
1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.
2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".
3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.
4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.
5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.
6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.
7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
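To make these steps concrete, here is a minimal Pandas sketch covering a few of the insights listed above (total R&D spend, peak stock-impact events, correlations, and yearly expenditure vs. revenue). The CSV file name is a placeholder, not the dataset's actual file name.

import pandas as pd

# Placeholder file name -- substitute the actual CSV from this dataset.
df = pd.read_csv("ai_financial_market_daily.csv", parse_dates=["Date"])

# 1) Total R&D spending per company (USD millions).
rnd_total = df.groupby("Company")["R&D_Spending_USD_Mn"].sum()

# 4) The five events with the largest absolute stock impact.
top_events = df.loc[df["Stock_Impact_%"].abs().nlargest(5).index,
                    ["Date", "Company", "Event", "Stock_Impact_%"]]

# 6) Correlation between the numeric columns.
corr = df[["R&D_Spending_USD_Mn", "AI_Revenue_USD_Mn",
           "AI_Revenue_Growth_%", "Stock_Impact_%"]].corr()

# 7) Expenditure vs. revenue, year by year.
yearly = (df.assign(Year=df["Date"].dt.year)
            .groupby(["Year", "Company"])[["R&D_Spending_USD_Mn",
                                           "AI_Revenue_USD_Mn"]]
            .sum())

print(rnd_total, top_events, corr, yearly, sep="\n\n")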
https://choosealicense.com/licenses/openrail/
This is a dataset of paraphrases created by ChatGPT. A model trained on this dataset is available; see the link on the dataset page.
We used this prompt to generate the paraphrases:
Generate 5 similar paraphrases for this question, show it like a numbered list without commentaries: {text} This dataset is based on the Quora paraphrase questions, texts from SQuAD 2.0, and the CNN news dataset. We generated 5 paraphrases for each sample, so the dataset has about 420k rows in total. You can make 30 rows from a row from… See the full description on the dataset page: https://huggingface.co/datasets/humarin/chatgpt-paraphrases.
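As a hedged sketch of the "30 rows from a row" expansion: each row pairs one original text with its 5 paraphrases, i.e. 6 texts, and the 6 × 5 = 30 ordered pairs of distinct texts can serve as training pairs. The field names below follow the dataset card, but the exact storage format of the paraphrase list is an assumption.

import ast
from itertools import permutations
from datasets import load_dataset

ds = load_dataset("humarin/chatgpt-paraphrases", split="train")

def row_to_pairs(row):
    # Assumption: 'paraphrases' is stored as a stringified Python list.
    texts = [row["text"]] + ast.literal_eval(row["paraphrases"])
    # 6 texts -> 6 * 5 = 30 ordered (source, target) pairs per row.
    return list(permutations(texts, 2))

print(len(row_to_pairs(ds[0])))  # 30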
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
RavenStack is a fictional AI-powered collaboration platform used to simulate a real-world SaaS business. This simulated dataset was created using Python and ChatGPT specifically for people learning data analysis, business intelligence, or data science. It offers a realistic environment to practice SQL joins, cohort analysis, churn modeling, revenue tracking, and support analytics using a multi-table relational structure.
The dataset spans 5 CSV files:
accounts.csv – customer metadata
subscriptions.csv – subscription lifecycles and revenue
feature_usage.csv – daily product interaction logs
support_tickets.csv – support activity and satisfaction scores
churn_events.csv – churn dates, reasons, and refund behaviors
Users can explore trial-to-paid conversion, MRR trends, upgrade funnels, feature adoption, support patterns, churn drivers, and reactivation cycles. The dataset supports temporal and cohort analyses, and has built-in edge cases for testing real-world logic.
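As a hedged starting point for the cohort and churn use cases, here is a small Pandas sketch joining three of the five tables. The join key and column names (account_id, signup_date, churn_date) are assumptions about the schema; check the actual CSV headers.

import pandas as pd

accounts = pd.read_csv("accounts.csv")
subscriptions = pd.read_csv("subscriptions.csv")
churn_events = pd.read_csv("churn_events.csv")

# Assumed schema: account_id as the join key, signup_date in accounts,
# churn_date in churn_events.
df = (accounts
      .merge(subscriptions, on="account_id", how="left")
      .merge(churn_events, on="account_id", how="left"))

# Monthly signup cohorts and the share of each cohort that churned.
df["cohort"] = pd.to_datetime(df["signup_date"]).dt.to_period("M")
churn_rate = (df.assign(churned=df["churn_date"].notna())
                .groupby("cohort")["churned"]
                .mean())
print(churn_rate)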
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The dataset is from an Indian study that used ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt-engineering tactics were used to formulate prompts that acted as anchors in conversations with the AI agent about mental health. The intervention lasted 20 days, with sessions of 15-20 minutes on alternate days. In the main study, fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing, which were then analyzed using paired t-tests. The results suggest the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial promise of AI-driven methods in advancing mental health literacy and suggests that innovative platforms such as ChatGPT hold promise within applied positive psychology. A codebook for the dataset is provided.
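For readers reproducing the paired t-test analysis, here is a minimal SciPy sketch. The file and column names are placeholders; consult the provided codebook for the study's actual variable names.

import pandas as pd
from scipy.stats import ttest_rel

# Placeholder file/column names -- see the codebook for the real ones.
df = pd.read_csv("intervention_scores.csv")
t_stat, p_value = ttest_rel(df["mhl_pre"], df["mhl_post"])
print(f"Mental health literacy: t = {t_stat:.3f}, p = {p_value:.4f}")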
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prompts generated from ChatGPT-3.5, ChatGPT-4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations.
The dataset is useful for studying lexical aspects of LLMs under different parameter/role configurations.
@article{10.1145/3696459,
  author = {Mart\'{\i}nez, Gonzalo and Hern\'{a}ndez, Jos\'{e} Alberto and Conde, Javier and Reviriego, Pedro and Merino-G\'{o}mez, Elena},
  title = {Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study},
  year = {2024},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  issn = {2157-6904},
  url = {https://doi.org/10.1145/3696459},
  doi = {10.1145/3696459},
  note = {Just Accepted},
  journal = {ACM Trans. Intell. Syst. Technol.},
  month = sep,
  keywords = {LLM, Lexical diversity, ChatGPT, Evaluation}
}
https://choosealicense.com/licenses/cc0-1.0/
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. You can view all prompts on GitHub.
License
CC-0
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset contains 14,934 instructions, contexts, and responses across several natural-language categories such as classification, closed QA, and generation. The original English dataset was created by @databricks, who crowd-sourced the data creation via its employees. The current dataset is a translation of that dataset through ChatGPT (gpt-3.5-turbo).
Data Instances
{
"id": 14963,
"instruction": "Wat zijn de duurste steden ter wereld?",
"context": "",
"response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York, Hong Kong, Los Angeles, Zurich, Genève, San Francisco, Parijs en Sydney.",
"category": "brainstorming"
}
Data Fields
id: the unique example ID (an integer)
instruction: the instruction text
context: optional context for the task (may be empty)
response: the response to the task
category: the task category (e.g., "brainstorming")
Dataset Creation
Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.
The prompt template to translate the input is given below (where src_lang was English and tgt_lang Dutch):
CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional context to the task, and the response to the task, from {src_lang} to {tgt_lang}.
Here are the requirements that you should adhere to:
1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional context to the task (marked `context: `) and response for the task marked with `response: `;
2. do not translate the identifiers `instruction: `, `context: `, and `response: ` but instead copy them to your output;
3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
4. translate the instruction and context text using informal, but standard, language;
5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the context in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the context, nor the translation in the response (just copy them as-is);
8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
"""
The system message was:
You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
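A minimal sketch of what each translation call may have looked like with the current OpenAI Python client follows. The model, parameters, system message, and prompt template come from this card; the wrapper code itself is an assumption.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def translate(task_text: str, src_lang: str = "English", tgt_lang: str = "Dutch") -> str:
    # CONVERSATION_TRANSLATION_PROMPT is the template shown above.
    prompt = CONVERSATION_TRANSLATION_PROMPT.format(src_lang=src_lang, tgt_lang=tgt_lang)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        max_tokens=1024,
        temperature=0,
        messages=[
            {"role": "system", "content": "You are a helpful assistant that translates "
                                          "English to Dutch according to the requirements "
                                          "that are given to you."},
            {"role": "user", "content": prompt + "\n" + task_text},
        ],
    )
    return response.choices[0].message.content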
Note that 77 items (0.5%) were not successfully translated. This can mean either that the prompt was too long for the given limit (max_tokens=1024) or that the generated translation could not be parsed into instruction, context, and response fields. The missing IDs are [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966].
Initial Data Collection and Normalization
Initial data collection by databricks. See their repository for more information about this dataset.
Considerations for Using the Data
Note that the translations in this new dataset have not been verified by humans! Use at your own risk, both in terms of quality and biases.
Discussion of Biases
As with any machine-generated text, users should be aware of potential biases in this dataset. Although the prompt specifically includes "make sure to avoid biases (such as gender bias, grammatical bias, social bias)", the actual impact of such an instruction is unknown. It is likely that biases remain in the dataset, so use it with caution.
Other Known Limitations
The translation quality has not been verified. Use at your own risk!
Licensing Information
This repository follows the original databricks license, which is CC BY-SA 3.0 but see below for a specific restriction.
This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI's large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
If you use this dataset, you must also follow the Sharing and Usage policies.
As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.
This dataset is also available on the Hugging Face hub, its canonical repository.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Monarch Butterfly Detector is an advanced computer vision model that detects and localizes Monarch butterflies in images. With its cutting-edge technology and high accuracy, this model opens up exciting possibilities for biodiversity monitoring, migration studies, citizen science projects, identification guides, and environmental education.
Accurate Detection: The Monarch Butterfly Detector utilizes state-of-the-art computer vision algorithms to accurately identify and localize Monarch butterflies within images.
Versatile Use Cases: This powerful model has diverse applications, ranging from scientific research and conservation efforts to citizen science projects and environmental education programs.
Easy Integration: The Monarch Butterfly Detector can be seamlessly integrated into existing platforms, apps, or websites, making it accessible to many users and enabling them to contribute effortlessly to butterfly monitoring.
User-Friendly Interface: We provide a user-friendly interface/API that allows users to easily interact with the model, upload images, and obtain instant results.
For detailed documentation and tutorials on using Roboflow, refer to the docs at https://docs.roboflow.com/.
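For integration, here is a hedged sketch using the Roboflow Python SDK; the API key, workspace, and project identifiers are placeholders, not this project's published IDs.

from roboflow import Roboflow

# Placeholder credentials and identifiers -- substitute your own.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("monarch-butterfly-detector")
model = project.version(1).model

# Run inference on a local image and save an annotated copy.
result = model.predict("butterfly.jpg", confidence=40, overlap=30)
print(result.json())
result.save("butterfly_annotated.jpg")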
We welcome contributions from the open-source community to enhance the Monarch Butterfly Detector. If you're interested in contributing, please follow the guidelines outlined in [CONTRIBUTING.md] and submit your pull requests.
This project is licensed under the [Roboflow License]. For more information, see the [LICENSE] file provided by Roboflow.
For any questions, suggestions, or collaborations, please reach out to us at savetheworld at 150left.com
Congratulations if you have made it this far. 🥳
🎁🎁🎁I thought I would use another piece of AI to get some more content about this detector I created. Here is the dialog I had with ChatGPT.
Scott: ChatGPT, can you make my readers drool over my Monarch Butterfly Detector model?
ChatGPT: Here are 10 captivating outro or call-to-action suggestions to encourage users to try out the Monarch Butterfly Detector and contribute to its success:
"Join us in preserving the delicate beauty of Monarch butterflies. Try out the Monarch Butterfly Detector today and become a guardian of these magnificent creatures."
"Unlock the secrets of Monarch butterflies with our cutting-edge detection model. Experience the wonders of computer vision and contribute to vital research efforts by using the Monarch Butterfly Detector."
"Make a difference in conservation by being a part of the Monarch Butterfly Detector community. Help protect these iconic insects and their habitats with the power of AI and computer vision."
"Calling all nature enthusiasts and citizen scientists! Contribute to the Monarch Butterfly Detector project and help us gather valuable data on Monarch butterflies. Together, we can make a lasting impact."
"Explore the hidden world of Monarch butterflies with the Monarch Butterfly Detector. Whether you're a researcher, educator, or simply curious, join us on this journey and unravel the mysteries of these captivating creatures."
"Become a Monarch detective! Unleash the potential of computer vision and aid in the conservation of Monarch butterflies. Download the Monarch Butterfly Detector and help us protect these extraordinary pollinators."
"Attention conservationists and researchers! The Monarch Butterfly Detector needs your expertise. Contribute your insights and collaborate with us to further our understanding of Monarch butterflies and support their conservation."
"Empower yourself with AI and contribute to science. With the Monarch Butterfly Detector, you can be a part of groundbreaking research on Monarch butterflies. Start detecting, start making a difference."
"Inspire the next generation of environmentalists. Use the Monarch Butterfly Detector as an educational tool and engage students in the wonders of computer vision and the importance of protecting our natural world."
"Experience the thrill of discovery with the Monarch Butterfly Detector. Join a vibrant community of nature enthusiasts and scientists, and together, let's unlock new insights into the fascinating lives of Monarch butterflies."
As you can see, AI certainly likes my idea.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
📝 Dataset Overview: This dataset focuses on early warning detection for sepsis, a critical and potentially fatal medical condition. It includes anonymized vital signs, lab results, and clinical indicators of patients admitted to the hospital, structured for real-time monitoring and predictive modeling.
It’s ideal for clinical data analysts, healthcare data scientists, and AI practitioners aiming to develop decision support tools, early warning dashboards, or predictive health models.
🔍 Dataset Features:
Patient_ID: unique anonymized identifier
Admission_Date: patient's hospital admission date
Temperature_C: body temperature in degrees Celsius
BP_Systolic: systolic blood pressure (mmHg)
BP_Diastolic: diastolic blood pressure (mmHg)
Heart_Rate: beats per minute
WBC_Count: white blood cell count (×10⁹/L)
Lactate_mmol_L: lactate level in mmol/L
Sepsis_Flag: binary indicator (1 = suspected sepsis, 0 = normal)
Ward: hospital ward/unit
Doctor_On_Duty: attending physician name (anonymized)
🎯 Use Cases: Build Power BI dashboards for hospital early warning systems
Train ML classification models to detect early signs of sepsis (see the sketch after this list)
Create patient monitoring tools with Python or R
Explore the relationship between vitals & sepsis onset
Perform feature engineering for risk scoring systems
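As a starting point for the classification use case, here is a minimal scikit-learn sketch. The CSV file name is a placeholder; the feature and target columns come from the list above.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder file name; columns follow the dataset features above.
df = pd.read_csv("sepsis_early_warning.csv")
features = ["Temperature_C", "BP_Systolic", "BP_Diastolic",
            "Heart_Rate", "WBC_Count", "Lactate_mmol_L"]
X, y = df[features], df["Sepsis_Flag"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))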
📌 Clinical Relevance: Sepsis is one of the leading causes of in-hospital mortality worldwide. Early detection is crucial to reducing death rates and improving outcomes. This dataset empowers developers and analysts to make a meaningful impact in the healthcare sector.
👤 Created By: Fatolu Peter (Emperor Analytics) A passionate healthcare analyst leveraging data to drive innovation in public health across Nigeria. This is Project 12 in my data-for-good series.
The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. Note that this is a synthetic dataset created for beginners to learn more about data analysis and machine learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
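As a quick-start sketch for exploring the dataset, the snippet below computes average spend by gender and season. The file and column names (e.g., "Purchase Amount (USD)") are assumptions; verify them against the actual CSV header.

import pandas as pd

# Placeholder file/column names -- check the actual CSV header.
df = pd.read_csv("shopping_trends.csv")

# Average purchase amount by gender and preferred shopping season.
avg_spend = (df.groupby(["Gender", "Season"])["Purchase Amount (USD)"]
               .mean()
               .unstack())
print(avg_spend)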
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset was created using ChatGPT-4 and consists of statements categorized into three groups: libertarian statements, statements that do not align with libertarian ideology and vision, and mixed statements. Each entry includes a statement alongside an evaluation generated by ChatGPT-4. The primary focus of the evaluation is to assess how closely the statement aligns with libertarian principles and to provide a detailed explanation of the reasoning behind the assessment. The dataset can… See the full description on the dataset page: https://huggingface.co/datasets/CineAI/libertarian.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Explain-tuned WizardLM dataset (~55K) created using approaches from the Orca Research Paper. We leverage all 15 system instructions provided in the Orca Research Paper to generate custom datasets, in contrast to the vanilla instruction-tuning approaches used by the original datasets. This helps student models like orca_mini_13b learn the thought process from the teacher model, which is ChatGPT (gpt-3.5-turbo-0301 version). Please see how the system prompt is added before each instruction.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Context-Based Question Generation Dataset
This dataset is designed for context-based question generation, where questions of different types (true/false, multiple-choice, open-ended) are generated based on a given context. The dataset is synthetically created using ChatGPT, providing a diverse set of questions to test comprehension and reasoning skills.
Dataset Structure
The dataset is structured with the following fields for each example:
context: The context provided… See the full description on the dataset page: https://huggingface.co/datasets/mito0o852/ContextToQuestions.
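A hedged sketch for loading and inspecting the dataset with the Hugging Face datasets library; the split name is an assumption, and only the context field is documented above.

from datasets import load_dataset

ds = load_dataset("mito0o852/ContextToQuestions", split="train")
example = ds[0]
print(example["context"])  # the provided context
print(example.keys())      # inspect the remaining fields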
Dataset Card for ChatGPT Jailbreak Prompts
Name
ChatGPT Jailbreak Prompts
Dataset Summary
ChatGPT Jailbreak Prompts is a complete collection of jailbreak related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking in ChatGPT.
Languages
English
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was made for code vulnerability detection using ChatGPT (GPT-4o-mini) via OpenAI's API. Its main purpose is SFT (supervised fine-tuning), with further uses possible.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
AnnoLLM
This repo hosts the data for our NAACL 2024 paper "AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators".
ConIR Dataset
The conversation-based information retrieval (ConIR) dataset is created by ChatGPT based on the MS-MARCO passage ranking dataset. The ConIR dataset is available at https://huggingface.co/datasets/He-Xingwei/ConIR. The sizes of the training and test sets for ConIR are 71,557 and 3,000 respectively. When using it, please… See the full description on the dataset page: https://huggingface.co/datasets/He-Xingwei/ConIR.
This dataset was created by prompting ChatGPT to give us context from the EPFL question set.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This is a Chinese dataset of paraphrases created by ChatGPT. For English paraphrase dataset, you can refer to humarin/chatgpt-paraphrases.
We used this prompt to generate the paraphrases:
给下面这个问题生成5条相似的改写: {text} (in English: "Generate 5 similar rewrites of the question below: {text}") This dataset is based on queries from Baidu and Zhihu. We generated 5 paraphrases for each sample, so the dataset has about 238k rows in total. You can make 30 rows from a row from each sample. In this way you can make 7.1 million train pairs (238k rows with 5 paraphrases… See the full description on the dataset page: https://huggingface.co/datasets/pangda/chatgpt-paraphrases-zh.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Explain-tuned Dolly-V2 dataset (~15K) created using approaches from the Orca Research Paper. We leverage all 15 system instructions provided in the Orca Research Paper to generate explain-tuned datasets, in contrast to the vanilla instruction-tuning approaches used by the original datasets. This helps student models like orca_mini_13b, orca_mini_7b, or orca_mini_3b learn the thought process from the teacher model, which is ChatGPT (gpt-3.5-turbo-0301 version). Please see how the system prompt is added before… See the full description on the dataset page: https://huggingface.co/datasets/pankajmathur/dolly-v2_orca.