23 datasets found
  1. AI Financial Market Data

    • kaggle.com
    Updated Aug 6, 2025
    Cite
    Data Science Lovers (2025). AI Financial Market Data [Dataset]. https://www.kaggle.com/datasets/rohitgrewal/ai-financial-and-market-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data Science Lovers
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    📹Project Video available on YouTube - https://youtu.be/WmJYHz_qn5s

    Realistic Synthetic AI Financial & Market Data for Gemini (Google), ChatGPT (OpenAI), and Llama (Meta)

    This dataset provides a synthetic daily record of financial market activity for companies involved in Artificial Intelligence (AI). It captures key financial metrics and events that could influence a company's stock performance, such as Meta's launch of Llama, OpenAI's launch of GPT, and Google's launch of Gemini. It records how much each company spends on R&D for its AI products and services, and how much revenue those products generate. The data covers January 1, 2015 to December 31, 2024 and includes information for three companies: OpenAI, Google, and Meta.

    The data is available as a CSV file, which we analyze using pandas DataFrames.

    This analysis will be useful for anyone working in the finance or stock market domain.

    In our project, we extract the following insights from this dataset using Python:

    1) How much did each company spend on R&D?

    2) Revenue Earned by the companies

    3) Date-wise Impact on the Stock

    4) Events when Maximum Stock Impact was observed

    5) AI Revenue Growth of the companies

    6) Correlation between the columns

    7) Expenditure vs Revenue year-by-year

    8) Event Impact Analysis

    9) Change in the index with respect to year and company

    These are the main features/columns available in the dataset (a short pandas sketch follows the list):

    1) Date: This column indicates the specific calendar day for which the financial and AI-related data is recorded. It allows for time-series analysis of the trends and impacts.

    2) Company: This column specifies the name of the company to which the data in that particular row belongs. Examples include "OpenAI" and "Meta".

    3) R&D_Spending_USD_Mn: This column represents the Research and Development (R&D) spending of the company, measured in Millions of USD. It serves as an indicator of a company's investment in innovation and future growth, particularly in the AI sector.

    4) AI_Revenue_USD_Mn: This column denotes the revenue generated specifically from AI-related products or services, also measured in Millions of USD. This metric highlights the direct financial success derived from AI initiatives.

    5) AI_Revenue_Growth_%: This column shows the percentage growth of AI-related revenue for the company on a daily basis. It indicates the pace at which a company's AI business is expanding or contracting.

    6) Event: This column captures any significant events or announcements made by the company that could potentially influence its financial performance or market perception. Examples include "Cloud AI launch," "AI partnership deal," "AI ethics policy update," and "AI speech recognition release." These events are crucial for understanding sudden shifts in stock impact.

    7) Stock_Impact_%: This column quantifies the percentage change in the company's stock price on a given day, likely in response to the recorded financial metrics or events. It serves as a direct measure of market reaction.
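
    A minimal pandas sketch of a few of the insights listed above (the CSV file name is an assumption; the column names follow the feature glossary):

     import pandas as pd

     # File name is assumed; use the actual name of the downloaded CSV.
     df = pd.read_csv("ai_financial_market_data.csv", parse_dates=["Date"])

     # 1) Total R&D spending per company (USD millions).
     print(df.groupby("Company")["R&D_Spending_USD_Mn"].sum())

     # 7) Expenditure vs revenue, year by year.
     yearly = (
         df.assign(Year=df["Date"].dt.year)
           .groupby(["Year", "Company"])[["R&D_Spending_USD_Mn", "AI_Revenue_USD_Mn"]]
           .sum()
     )
     print(yearly)

     # 4) Events with the largest absolute stock impact.
     events = df[df["Event"].notna()].assign(AbsImpact=lambda d: d["Stock_Impact_%"].abs())
     print(events.nlargest(10, "AbsImpact")[["Date", "Company", "Event", "Stock_Impact_%"]])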

  2. chatgpt-paraphrases

    • huggingface.co
    Updated Mar 17, 2023
    Cite
    Humarin (2023). chatgpt-paraphrases [Dataset]. https://huggingface.co/datasets/humarin/chatgpt-paraphrases
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2023
    Dataset authored and provided by
    Humarin
    License

    OpenRAIL: https://choosealicense.com/licenses/openrail/

    Description

    This is a dataset of paraphrases created by ChatGPT. A model based on this dataset is available: model

      We used this prompt to generate paraphrases
    

    Generate 5 similar paraphrases for this question, show it like a numbered list without commentaries: {text}

    This dataset is based on the Quora paraphrase questions, texts from SQuAD 2.0, and the CNN news dataset. We generated 5 paraphrases for each sample, so in total the dataset has about 420k data rows. You can make 30 rows from a row from… See the full description on the dataset page: https://huggingface.co/datasets/humarin/chatgpt-paraphrases.
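
    The "30 rows from a row" arithmetic follows from treating the original text and its 5 paraphrases as 6 interchangeable texts: every ordered pair of distinct texts is one training pair, giving 6 × 5 = 30. A small illustrative sketch (the example texts are invented):

     from itertools import permutations

     # One dataset row: a source question plus its 5 ChatGPT paraphrases (invented values).
     row = {
         "text": "How do I learn Python quickly?",
         "paraphrases": [
             "What is the fastest way to learn Python?",
             "How can I pick up Python fast?",
             "What's a quick way to master Python?",
             "How do I get good at Python in a short time?",
             "How can I learn Python in the shortest time?",
         ],
     }

     # Original + 5 paraphrases = 6 texts; each ordered pair of distinct texts is a pair.
     texts = [row["text"]] + row["paraphrases"]
     pairs = list(permutations(texts, 2))
     print(len(pairs))  # 30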

  3. SaaS Subscription & Churn Analytics Dataset

    • kaggle.com
    Updated Jul 21, 2025
    Cite
    Rivalytics (2025). SaaS Subscription & Churn Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/rivalytics/saas-subscription-and-churn-analytics-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rivalytics
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    RavenStack is a fictional AI-powered collaboration platform used to simulate a real-world SaaS business. This simulated dataset was created using Python and ChatGPT specifically for people learning data analysis, business intelligence, or data science. It offers a realistic environment to practice SQL joins, cohort analysis, churn modeling, revenue tracking, and support analytics using a multi-table relational structure.

    The dataset spans 5 CSV files:

    • accounts.csv – customer metadata

    • subscriptions.csv – subscription lifecycles and revenue

    • feature_usage.csv – daily product interaction logs

    • support_tickets.csv – support activity and satisfaction scores

    • churn_events.csv – churn dates, reasons, and refund behaviors

    Users can explore trial-to-paid conversion, MRR trends, upgrade funnels, feature adoption, support patterns, churn drivers, and reactivation cycles. The dataset supports temporal and cohort analyses, and has built-in edge cases for testing real-world logic.
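
    As one example of the cohort analyses the dataset supports, here is a minimal pandas sketch of monthly signup cohorts and their churn rate; the join key and column names (account_id, signup_date, churn_date) are assumptions, not a documented schema:

     import pandas as pd

     accounts = pd.read_csv("accounts.csv", parse_dates=["signup_date"])
     churn = pd.read_csv("churn_events.csv", parse_dates=["churn_date"])

     # Left join keeps accounts that never churned (churn_date stays NaT).
     df = accounts.merge(churn[["account_id", "churn_date"]], on="account_id", how="left")

     # Monthly signup cohorts and the share of each cohort that churned.
     df["cohort"] = df["signup_date"].dt.to_period("M")
     cohort_churn = df.groupby("cohort").agg(
         accounts=("account_id", "size"),
         churned=("churn_date", lambda s: s.notna().sum()),
     )
     cohort_churn["churn_rate"] = cohort_churn["churned"] / cohort_churn["accounts"]
     print(cohort_churn)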

  4. AI-Driven Mental Health Literacy - An Interventional Study from India...

    • psycharchives.org
    Updated Oct 2, 2023
    + more versions
    Cite
    (2023). AI-Driven Mental Health Literacy - An Interventional Study from India (Codebook for the data).csv [Dataset]. https://psycharchives.org/handle/20.500.12034/8771
    Explore at:
    Dataset updated
    Oct 2, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    The dataset is from an Indian study that used ChatGPT, a natural language processing model by OpenAI, to design a mental health literacy intervention for college students. Prompt engineering tactics were used to formulate prompts that acted as anchors in the conversations with the AI agent regarding mental health. The intervention lasted 20 days, with sessions of 15-20 minutes on alternate days. Fifty-one students completed pre-test and post-test measures of mental health literacy, mental help-seeking attitude, stigma, mental health self-efficacy, positive and negative experiences, and flourishing in the main study, which were then analyzed using paired t-tests. The results suggest that the intervention is effective among college students, as statistically significant changes were noted in mental health literacy and mental health self-efficacy scores. The study affirms the practicality, acceptance, and initial indications of AI-driven methods in advancing mental health literacy and suggests the promising prospects of innovative platforms such as ChatGPT within the field of applied positive psychology. A codebook for the dataset is provided.
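
    The pre-test/post-test comparison described above corresponds to a paired t-test. A minimal sketch, assuming hypothetical column names (consult the provided codebook for the real ones):

     import pandas as pd
     from scipy.stats import ttest_rel

     # File and column names are hypothetical placeholders.
     df = pd.read_csv("mhl_intervention.csv")

     t_stat, p_value = ttest_rel(df["mhl_pretest"], df["mhl_posttest"])
     print(f"Mental health literacy: t = {t_stat:.2f}, p = {p_value:.4f}")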

  5. Prompts generated from ChatGPT3.5, ChatGPT4, LLama3-8B, and Mistral-7B with...

    • zenodo.org
    • portaldelaciencia.uva.es
    • +1 more
    bin
    Updated Nov 16, 2024
    Cite
    Martínez Gonzalo; Hernández José Alberto; Conde Javier; Reviriego Pedro; Merino Elena (2024). Prompts generated from ChatGPT3.5, ChatGPT4, LLama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameters configurations [Dataset]. http://doi.org/10.5281/zenodo.11121394
    Explore at:
    bin (available download formats)
    Dataset updated
    Nov 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Martínez Gonzalo; Hernández José Alberto; Conde Javier; Reviriego Pedro; Merino Elena
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description

    Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations.

    The dataset is useful to study lexical aspects of LLMs with different parameters/roles configurations.

    • The 0_Base_Topics.xlsx file lists the topics used for the dataset generation
    • The rest of the files collect the answers of ChatGPT to these topics with different configurations of parameters/context (see the API sketch after this list):
      • Temperature (parameter): Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
      • Frequency penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      • Top probability (parameter): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
      • Presence penalty (parameter): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      • Roles (context)
        • Default: No role is assigned to the LLM, the default role is used.
        • Child: The LLM is requested to answer as a five-year-old child.
        • Young adult male: The LLM is requested to answer as a young male adult.
        • Young adult female: The LLM is requested to answer as a young female adult.
        • Elderly adult male: The LLM is requested to answer as an elderly male adult.
        • Elderly adult female: The LLM is requested to answer as an elderly female adult.
        • Affluent adult male: The LLM is requested to answer as an affluent male adult.
        • Affluent adult female: The LLM is requested to answer as an affluent female adult.
        • Lower-class adult male: The LLM is requested to answer as a lower-class male adult.
        • Lower-class adult female: The LLM is requested to answer as a lower-class female adult.
        • Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.
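
    A minimal sketch of how one of these configurations might be queried through the OpenAI chat API (the model name, topic wording, and role phrasing are illustrative, and the current Python client is shown rather than the one available when the dataset was built):

     from openai import OpenAI

     client = OpenAI()  # reads OPENAI_API_KEY from the environment

     # One illustrative configuration: "child" role with a higher temperature.
     response = client.chat.completions.create(
         model="gpt-3.5-turbo",
         messages=[
             {"role": "system", "content": "Answer as a five-year-old child."},
             {"role": "user", "content": "Tell me about the New York City subway."},
         ],
         temperature=0.8,        # higher values -> more random output
         top_p=1.0,              # nucleus sampling probability mass
         frequency_penalty=0.0,  # -2.0 to 2.0; penalizes frequent tokens
         presence_penalty=0.0,   # -2.0 to 2.0; penalizes already-present tokens
     )
     print(response.choices[0].message.content)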

    Paper

    @article{10.1145/3696459,
    author = {Mart\'{\i}nez, Gonzalo and Hern\'{a}ndez, Jos\'{e} Alberto and Conde, Javier and Reviriego, Pedro and Merino-G\'{o}mez, Elena},
    title = {Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study},
    year = {2024},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    issn = {2157-6904},
    url = {https://doi.org/10.1145/3696459},
    doi = {10.1145/3696459},
    note = {Just Accepted},
    journal = {ACM Trans. Intell. Syst. Technol.},
    month = sep,
    keywords = {LLM, Lexical diversity, ChatGPT, Evaluation}
    }

  6. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    + more versions
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. View All Prompts on GitHub.

      License
    

    CC-0

  7. Dolly 15k Dutch

    • zenodo.org
    • huggingface.co
    • +1 more
    bin
    Updated Jun 20, 2023
    Cite
    Bram Vanroy (2023). Dolly 15k Dutch [Dataset]. http://doi.org/10.57967/hf/0785
    Explore at:
    bin (available download formats)
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bram Vanroy
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset contains 14,934 instructions, contexts and responses, in several natural language categories such as classification, closed QA, generation, etc. The English original dataset was created by @databricks, who crowd-sourced the data creation via its employees. The current dataset is a translation of that dataset through ChatGPT (gpt-3.5-turbo).

    Data Instances

    {
     "id": 14963,
     "instruction": "Wat zijn de duurste steden ter wereld?",
     "context": "",
     "response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York, Hong Kong, Los Angeles, Zurich, Genève, San Francisco, Parijs en Sydney.",
     "category": "brainstorming"
    }
    

    Data Fields

    • id: the ID of the item. The following 77 IDs are not included because they could not be translated (or were too long): [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966]
    • instruction: the instruction (question)
    • context: additional context that the AI can use to answer the question
    • response: the AI's expected response
    • category: the category of this type of question (see Dolly for more info)

    Dataset Creation

    Both the dataset items and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional context to the task, and the response to the task, from {src_lang} to {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional context to the task (marked `context: `) and response for the task marked with `response: `;
    2. do not translate the identifiers `instruction: `, `context: `, and `response: ` but instead copy them to your output;
    3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    4. translate the instruction and context text using informal, but standard, language;
    5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the context in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
    7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the context, nor the translation in the response (just copy them as-is);
    8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """
    

    The system message was:

    You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
    

    Note that 77 items (0.5%) were not successfully translated. This can either mean that the prompt was too long for the given limit (max_tokens=1024) or that the generated translation could not be parsed into instruction, context and response fields. The missing IDs are [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966].
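
    A minimal sketch of what one translation call might have looked like, using the stated parameters, the system message, and an abbreviated version of the prompt template above (the modern OpenAI Python client is shown; the original work used the 2023-era API):

     from openai import OpenAI

     client = OpenAI()

     # Abbreviated stand-in for the full template quoted above.
     CONVERSATION_TRANSLATION_PROMPT = (
         "You are asked to translate a task's instruction, optional context to the task, "
         "and the response to the task, from {src_lang} to {tgt_lang}.\n"
         "(... requirements 1-8 as listed above ...)\n"
         "Now translate the following task with the requirements set out above. "
         "Do not provide an explanation and do not add anything else.\n\n"
     )

     task = "instruction: What are the most expensive cities?\ncontext: \nresponse: ..."

     response = client.chat.completions.create(
         model="gpt-3.5-turbo",
         messages=[
             {"role": "system", "content": "You are a helpful assistant that translates "
              "English to Dutch according to the requirements that are given to you."},
             {"role": "user", "content": CONVERSATION_TRANSLATION_PROMPT.format(
                 src_lang="English", tgt_lang="Dutch") + task},
         ],
         max_tokens=1024,
         temperature=0,
     )
     print(response.choices[0].message.content)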

    Initial Data Collection and Normalization

    Initial data collection by databricks. See their repository for more information about this dataset.

    Considerations for Using the Data

    Note that the translations in this new dataset have not been verified by humans! Use at your own risk, both in terms of quality and biases.

    Discussion of Biases

    As with any machine-generated text, users should be aware of potential biases in this dataset. Although the prompt specifically includes "make sure to avoid biases (such as gender bias, grammatical bias, social bias)", the impact of such a command is unknown. It is likely that biases remain in the dataset, so use it with caution.

    Other Known Limitations

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    This repository follows the original databricks license, which is CC BY-SA 3.0, but see below for a specific restriction.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub, its canonical repository.

  8. Monarch Butterfly Detector Dataset

    • universe.roboflow.com
    zip
    Updated Jun 11, 2023
    Cite
    Scott Cole (2023). Monarch Butterfly Detector Dataset [Dataset]. https://universe.roboflow.com/scott-cole-a3ty4/monarch-butterfly-detector/model/1
    Explore at:
    zip (available download formats)
    Dataset updated
    Jun 11, 2023
    Dataset authored and provided by
    Scott Cole
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Monarch Butterfly Bounding Boxes
    Description

    Monarch Butterfly Detector

    The Monarch Butterfly Detector is an advanced computer vision model that detects and localizes Monarch butterflies in images. With its cutting-edge technology and high accuracy, this model opens up exciting possibilities for biodiversity monitoring, migration studies, citizen science projects, identification guides, and environmental education.

    Key Features

    • Accurate Detection: The Monarch Butterfly Detector utilizes state-of-the-art computer vision algorithms to accurately identify and localize Monarch butterflies within images.

    • Versatile Use Cases: This powerful model has diverse applications, ranging from scientific research and conservation efforts to citizen science projects and environmental education programs.

    • Easy Integration: The Monarch Butterfly Detector can be seamlessly integrated into existing platforms, apps, or websites, making it accessible to many users and enabling them to contribute effortlessly to butterfly monitoring.

    • User-Friendly Interface: We provide a user-friendly interface/API that allows users to easily interact with the model, upload images, and obtain instant results.

    For detailed documentation and tutorials on using Roboflow, refer to https://docs.roboflow.com/

    Contribution Guidelines

    We welcome contributions from the open-source community to enhance the Monarch Butterfly Detector. If you're interested in contributing, please follow the guidelines outlined in [CONTRIBUTING.md] and submit your pull requests.

    License

    This project is licensed under the [Roboflow License]. For more information, see the [LICENSE] file provided by Roboflow.

    Contact Information

    For any questions, suggestions, or collaborations, please reach out to us at savetheworld at 150left.com

    Congratulations if you have made it this far. 🥳

    🎁🎁🎁I thought I would use another piece of AI to get some more content about this detector I created. Here is the dialog I had with ChatGPT.

    Scott: ChatGPT, can you make my readers drool over my Monarch Butterfly Detector model?

    ChatGPT: Here are 10 captivating outro or call-to-action suggestions to encourage users to try out the Monarch Butterfly Detector and contribute to its success:

    "Join us in preserving the delicate beauty of Monarch butterflies. Try out the Monarch Butterfly Detector today and become a guardian of these magnificent creatures."

    "Unlock the secrets of Monarch butterflies with our cutting-edge detection model. Experience the wonders of computer vision and contribute to vital research efforts by using the Monarch Butterfly Detector."

    "Make a difference in conservation by being a part of the Monarch Butterfly Detector community. Help protect these iconic insects and their habitats with the power of AI and computer vision."

    "Calling all nature enthusiasts and citizen scientists! Contribute to the Monarch Butterfly Detector project and help us gather valuable data on Monarch butterflies. Together, we can make a lasting impact."

    "Explore the hidden world of Monarch butterflies with the Monarch Butterfly Detector. Whether you're a researcher, educator, or simply curious, join us on this journey and unravel the mysteries of these captivating creatures."

    "Become a Monarch detective! Unleash the potential of computer vision and aid in the conservation of Monarch butterflies. Download the Monarch Butterfly Detector and help us protect these extraordinary pollinators."

    "Attention conservationists and researchers! The Monarch Butterfly Detector needs your expertise. Contribute your insights and collaborate with us to further our understanding of Monarch butterflies and support their conservation."

    "Empower yourself with AI and contribute to science. With the Monarch Butterfly Detector, you can be a part of groundbreaking research on Monarch butterflies. Start detecting, start making a difference."

    "Inspire the next generation of environmentalists. Use the Monarch Butterfly Detector as an educational tool and engage students in the wonders of computer vision and the importance of protecting our natural world."

    "Experience the thrill of discovery with the Monarch Butterfly Detector. Join a vibrant community of nature enthusiasts and scientists, and together, let's unlock new insights into the fascinating lives of Monarch butterflies."

    As you can see, AI certainly likes my idea.

  9. Datasets and scripts related to the paper: "*Can Generative AI Help us in...

    • zenodo.org
    zip
    Updated Jul 30, 2024
    Cite
    Anonymous Anonymous (2024). Datasets and scripts related to the paper: "*Can Generative AI Help us in Qualitative Software Engineering?*" [Dataset]. http://doi.org/10.5281/zenodo.13134104
    Explore at:
    zip (available download formats)
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This replication package contains datasets and scripts related to the paper: "*Can Generative AI Help us in Qualitative Software Engineering?*"
    The replication package is organized into two directories:
    - `manual_analysis`: This directory contains all sheets used to perform the manual analysis for RQ1, RQ2, and RQ3.
    - `stats`: This directory contains all datasets, scripts, and results metrics used for the quantitative analyses of RQ1 and RQ2.
    In the following, we describe the content of each directory:
    ## manual_analysis
    - `manual_analysis_rq1`: This directory contains all sheets used to perform manual analysis for RQ1 (independent and incremental coding).
    - The sub-directory `incremental_coding` contains .csv files for all datasets (`DL_Faults_COMMIT_incremental.csv`, `DL_Faults_ISSUE_incremental.csv`, `DL_Fault_SO_incremental.csv`, `DRL_Challenges_incremental.csv` and `Functional_incremental.csv`). All these .csv files contain the following columns:
    - *Link*: The link to the instances
    - *Prompt*: Prompt used as input to GPT-4-Turbo
    - *ID*: Instance ID
    - *FinalTag*: Tag assigned by the human in the original paper
    - *Chatgpt_output_memory*: Output of GPT-4-Turbo with incremental coding
    - *Chatgpt_output_memory_clean*: (only for the DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
    - *Author1*: Label assigned by the first author
    - *Author2*: Label assigned by the second author
    - *FinalOutput*: Label assigned after the resolution of the conflicts
    - The sub-directory `independent_coding` contains .csv files for all datasets (`DL_Faults_COMMIT_independent.csv`, `DL_Faults_ISSUE_independent.csv`, `DL_Fault_SO_independent.csv`, `DRL_Challenges_independent.csv` and `Functional_independent.csv`), containing the following columns:
    - *Link*: The link to the instances
    - *Prompt*: Prompt used as input to GPT-4-Turbo
    - *ID*: Specific ID for the instance
    - *FinalTag*: Tag assigned by the human in the original paper
    - *Chatgpt_output*: Output of GPT-4-Turbo with independent coding
    - *Chatgpt_output_clean*: (only for DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
    - *Author1*: Label assigned by the first author
    - *Author2*: Label assigned by the second author
    - *FinalOutput*: Label assigned after the resolution of the conflicts.
    - Also, the sub-directory contains sheets with inconsistencies after resolving conflicts. The directory `inconsistency_incremental_coding` contains .csv files with the following columns:
    - *Dataset*: The dataset considered
    - *Human*: The label assigned by the human in the original paper
    - *Machine*: The label assigned by GPT-4-Turbo
    - *Classification*: The final label assigned by the authors after resolving the conflicts. Multiple classifications for a single instance are separated by a comma “,”
    - *Final*: final label assigned after the resolution of the incompatibilities
    - Similarly, the sub-directory `inconsistency_independent_coding` contains a .csv file with the same columns as before, but this is for the case of independent coding.
    - `manual_analysis_rq2`: This directory contains .csv files for all datasets (`DL_Faults_redundant_tag.csv`, `DRL_Challenges_redundant_tag.csv`, `Functional_redundant_tag.csv`) to perform manual analysis for RQ2.
    - The `DL_Faults_redundant_tag.csv` file contains the following columns:
    - *Tags Redundant*: tags identified as redundant by GPT-4-Turbo
    - *Matched*: inspection by the authors to check whether the redundant tags match
    - *FinalTag*: final tag assigned by the authors after the resolution of the conflict
    - The `Functional_redundant_tag.csv` file contains the same columns as before
    - The `DRL_Challenges_redundant_tag.csv` file is organized as follows:
    - *Tags Suggested*: The final tag suggested by GPT-4-Turbo
    - *Tags Redundant*: tags identified as redundant by GPT-4-Turbo
    - *Matched*: inspection by the authors to check whether the redundant tags match the suggested tags
    - *FinalTag*: final tag assigned by the authors after the resolution of the conflict
    - The sub-directory `code_consolidation_mapping_overview` contains .csv files (`DL_Faults_rq2_overview.csv`, `DRL_Challenges_rq2_overview.csv`, `Functional_rq2_overview.csv`) organized as follows:
    - *Initial_Tags*: list of the unique initial tags assigned by GPT-4-Turbo for each dataset
    - *Mapped_tags*: list of tags mapped by GPT-4-Turbo
    - *Unmatched_tags*: list of unmatched tags by GPT-4-Turbo
    - *Aggregating_tags*: list of consolidated tags
    - *Final_tags*: list of final tags after the consolidation task
    ## stats
    - `RQ1`: contains script and datasets used to perform metrics for RQ1. The analysis calculates all possible combinations between Matched, More Abstract, More Specific, and Unmatched.
    - `RQ1_Stats.ipynb` is a Python Jupyter notebook to compute the RQ1 metrics. To use it, as explained in the notebook, it is necessary to change the values of variables contained in the first code block.
    - `independent-prompting`: Contains the datasets related to the independent prompting. Each line contains the following fields:
    - *Link*: Link to the artifact being tagged
    - *Prompt*: Prompt sent to GPT-4-Turbo
    - *FinalTag*: Artifact coding from the replicated study
    - *chatgpt_output_text*: GPT-4-Turbo output
    - *chatgpt_output*: Codes parsed from the GPT-4-Turbo output
    - *Author1*: Annotator 1 evaluation of the coding
    - *Author2*: Annotator 2 evaluation of the coding
    - *FinalOutput*: Consolidated evaluation
    - `incremental-prompting`: Contains the datasets related to the incremental prompting (same format as independent prompting)
    - `results`: contains files for the RQ1 quantitative results. The files are named `RQ1\_<
    - `RQ2`: contains the script used to perform metrics for RQ2, the datasets it uses, and its output.
    - `RQ2_SetStats.ipynb` is the Python Jupyter notebook to perform the analyses. The script takes as input the following types of files:
    - RQ1 Data Files (`RQ1_DLFaults_Issues.csv`, `RQ1_DLFaults_Commits.csv`, and `RQ1_DLFaults_SO.csv`, joined in a single .csv `RQ1_DLFaults.csv`). These are the same files used in RQ1.
    - Mapping Files (`RQ2_Mappings_DRL.csv`, `RQ2_Mappings_Functional.csv`, `RQ2_Mappings_DLFaults.csv`). These contain the mappings between human tags (*HumanTags*), GPT-4-Turbo tags (*Final Tags*), with indicated the type of matching (*MatchType*).
    - Additional codes created during the consolidation (`RQ2_newCodes_DRL.csv`, `RQ2_newCodes_Functional.csv`, `RQ2_newCodes_DLFaults.csv`), annotated with the matching: *new code*, *old code*, *human code*, *match type*
    - Set files (`RQ2_Sets_DRL.csv`, `RQ2_Sets_Functional.csv`, `RQ2_Sets_DLFaults.csv`). Each file contains the following columns:
    - *HumanTags*: List of tags from the original dataset
    - *InitialTags*: Set of tags from RQ1,
    - *ConsolidatedTags*: Tags that have been consolidated,
    - *FinalTags*: Final set of tags (results of RQ2, used in RQ3)
    - *NewTags*: New tags created during consolidation
    - `RQ2_Set_Metrics.csv`: Reports the RQ2 output metrics (Precision, Recall, F1-Score, Jaccard).
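
    As a toy illustration of the RQ2 set metrics, here is a sketch of a per-row Jaccard score between human and GPT-4-Turbo tag sets on a Sets file; it assumes tags are stored as comma-separated strings, which may not match the actual cell format:

     import pandas as pd

     sets = pd.read_csv("RQ2_Sets_DLFaults.csv")

     def jaccard(a: set, b: set) -> float:
         """Jaccard similarity between two tag sets."""
         return len(a & b) / len(a | b) if (a | b) else 1.0

     # Assumption: tags are comma-separated strings within each cell.
     human = sets["HumanTags"].str.split(",").apply(lambda t: {x.strip() for x in t})
     final = sets["FinalTags"].str.split(",").apply(lambda t: {x.strip() for x in t})

     print(pd.Series(map(jaccard, human, final)).mean())
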
  10. Sepsis Dataset –

    • kaggle.com
    Updated May 31, 2025
    Cite
    Fatolu Peter (2025). Sepsis Dataset – [Dataset]. https://www.kaggle.com/datasets/olagokeblissman/sepsis-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Fatolu Peter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    📝 Dataset Overview: This dataset focuses on early warning detection for sepsis, a critical and potentially fatal medical condition. It includes anonymized vital signs, lab results, and clinical indicators of patients admitted to the hospital, structured for real-time monitoring and predictive modeling.

    It’s ideal for clinical data analysts, healthcare data scientists, and AI practitioners aiming to develop decision support tools, early warning dashboards, or predictive health models.

    🔍 Dataset Features:

    • Patient_ID: Unique anonymized identifier
    • Admission_Date: Patient’s hospital admission date
    • Temperature_C: Body temperature in degrees Celsius
    • BP_Systolic: Systolic blood pressure (mmHg)
    • BP_Diastolic: Diastolic blood pressure (mmHg)
    • Heart_Rate: Beats per minute
    • WBC_Count: White blood cell count (x10⁹/L)
    • Lactate_mmol_L: Lactate level in mmol/L
    • Sepsis_Flag: Binary indicator (1 = Suspected Sepsis, 0 = Normal)
    • Ward: Hospital ward/unit
    • Doctor_On_Duty: Attending physician name (anonymized)

    🎯 Use Cases:

    Build Power BI dashboards for hospital early warning systems

    Train ML classification models to detect early signs of sepsis

    Create patient monitoring tools with Python or R

    Explore the relationship between vitals & sepsis onset

    Perform feature engineering for risk scoring systems
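
    A minimal sketch of the ML classification use case, training a baseline classifier on the vitals and labs to predict Sepsis_Flag (the CSV file name is assumed; the feature columns follow the list above):

     import pandas as pd
     from sklearn.ensemble import RandomForestClassifier
     from sklearn.metrics import classification_report
     from sklearn.model_selection import train_test_split

     df = pd.read_csv("sepsis_dataset.csv")  # file name assumed

     features = ["Temperature_C", "BP_Systolic", "BP_Diastolic",
                 "Heart_Rate", "WBC_Count", "Lactate_mmol_L"]
     X, y = df[features], df["Sepsis_Flag"]

     X_train, X_test, y_train, y_test = train_test_split(
         X, y, test_size=0.2, stratify=y, random_state=42
     )

     clf = RandomForestClassifier(n_estimators=200, random_state=42)
     clf.fit(X_train, y_train)
     print(classification_report(y_test, clf.predict(X_test)))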

    📌 Clinical Relevance: Sepsis is one of the leading causes of in-hospital mortality worldwide. Early detection is crucial to reducing death rates and improving outcomes. This dataset empowers developers and analysts to make a meaningful impact in the healthcare sector.

    👤 Created By: Fatolu Peter (Emperor Analytics) A passionate healthcare analyst leveraging data to drive innovation in public health across Nigeria. This is Project 12 in my data-for-good series.

    ✅ LinkedIn Post: 🚨 New Dataset: Sepsis Early Warning System Data – Now on Kaggle 📊 Clinical vital signs + lab markers + sepsis risk flags 🔗 Explore the dataset here

    This dataset enables healthcare data scientists to: ✅ Build real-time hospital dashboards ✅ Predict sepsis risk with machine learning ✅ Explore vitals like BP, lactate, WBC, and temperature ✅ Support early intervention using data insights

    Whether you're into: 🧠 Predictive modeling 📈 Power BI clinical dashboards 📉 Risk analytics in healthcare This is for you.

    Join me in using data to save lives — one insight at a time. If you build something, tag me. I’ll gladly share it! 💡

    HealthcareAnalytics #SepsisAwareness #EarlyWarningSystems #KaggleDataset #PowerBI #DataForGood #FatoluPeter #EmperorAnalytics #PublicHealth #Project12 #RealWorldData


  11. Customer Shopping Trends Dataset

    • kaggle.com
    Updated Oct 5, 2023
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset/
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
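
    A short pandas sketch of the kind of analysis the glossary above supports (the CSV file name is an assumption; column names are taken from the glossary):

     import pandas as pd

     df = pd.read_csv("shopping_trends.csv")  # file name assumed

     # Average purchase amount by category and gender.
     spend = df.groupby(["Category", "Gender"])["Purchase Amount (USD)"].mean().unstack()

     # Do subscribers rate their purchases differently from non-subscribers?
     ratings = df.groupby("Subscription Status")["Review Rating"].agg(["mean", "count"])

     print(spend.round(2))
     print(ratings)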

    Structure of the Dataset

    https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  12. Data from: libertarian

    • huggingface.co
    Updated Jan 13, 2025
    Cite
    Infinity (2025). libertarian [Dataset]. https://huggingface.co/datasets/CineAI/libertarian
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 13, 2025
    Authors
    Infinity
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset was created using ChatGPT-4 and consists of statements categorized into three groups: libertarian statements, statements that do not align with libertarian ideology and vision, and mixed statements. Each entry includes a statement alongside an evaluation generated by ChatGPT-4. The primary focus of the evaluation is to assess how closely the statement aligns with libertarian principles and to provide a detailed explanation of the reasoning behind the assessment. The dataset can… See the full description on the dataset page: https://huggingface.co/datasets/CineAI/libertarian.

  13. WizardLM_Orca

    • aifasthub.com
    • huggingface.co
    Updated Sep 4, 2025
    + more versions
    Cite
    Pankaj Mathur (2025). WizardLM_Orca [Dataset]. https://aifasthub.com/datasets/pankajmathur/WizardLM_Orca
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 4, 2025
    Authors
    Pankaj Mathur
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    An explain-tuned WizardLM dataset (~55K examples) created using approaches from the Orca research paper. We leverage all 15 system instructions provided in the Orca research paper to generate custom datasets, in contrast to the vanilla instruction-tuning approaches used by the original datasets. This helps student models like orca_mini_13b learn the thought process from the teacher model, ChatGPT (gpt-3.5-turbo-0301 version). Please see how the System prompt is added before each instruction.
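
    A sketch of how such a system prompt might be prepended when querying the teacher model; the system text is one of the 15 Orca instructions, while the message layout and helper function are assumptions for illustration:

     # One of the 15 Orca system instructions.
     system_instruction = (
         "You are an AI assistant. You will be given a task. "
         "You must generate a detailed and long answer."
     )

     def build_messages(instruction: str) -> list[dict]:
         """Prepend the system instruction to a WizardLM instruction (assumed layout)."""
         return [
             {"role": "system", "content": system_instruction},
             {"role": "user", "content": instruction},
         ]

     print(build_messages("Explain why the sky is blue."))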

  14. ContextToQuestions

    • huggingface.co
    Updated Mar 16, 2024
    Cite
    Moustapha (2024). ContextToQuestions [Dataset]. https://huggingface.co/datasets/mito0o852/ContextToQuestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 16, 2024
    Authors
    Moustapha
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context-Based Question Generation Dataset

    This dataset is designed for context-based question generation, where questions of different types (true/false, multiple-choice, open-ended) are generated based on a given context. The dataset is synthetically created using ChatGPT, providing a diverse set of questions to test comprehension and reasoning skills.

      Dataset Structure
    

    The dataset is structured with the following fields for each example:

    context: The context provided… See the full description on the dataset page: https://huggingface.co/datasets/mito0o852/ContextToQuestions.

  15. ChatGPT-Jailbreak-Prompts

    • huggingface.co
    Updated Jun 19, 2023
    Cite
    Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 19, 2023
    Authors
    Rubén Darío Jaramillo Romero
    Description

    Dataset Card for Dataset Name

      Name
    

    ChatGPT Jailbreak Prompts

      Dataset Summary
    

    ChatGPT Jailbreak Prompts is a complete collection of jailbreak-related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking ChatGPT.

      Languages
    

    [English]

  16. CodeVul-4omini-SFT

    • huggingface.co
    Updated Apr 3, 2025
    Cite
    Vu Hoang Anh (2025). CodeVul-4omini-SFT [Dataset]. https://huggingface.co/datasets/vamcrizer/CodeVul-4omini-SFT
    Explore at:
    Dataset updated
    Apr 3, 2025
    Authors
    Vu Hoang Anh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset was made for code vulnerability detection using ChatGPT 4o-mini via OpenAI's API. Its main purpose is supervised fine-tuning (SFT), with further purposes possible.

  17. ConIR

    • huggingface.co
    Updated Apr 15, 2024
    Cite
    He Xingwei (2024). ConIR [Dataset]. https://huggingface.co/datasets/He-Xingwei/ConIR
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2024
    Authors
    He Xingwei
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    AnnoLLM

    This repo hosts the data for our NAACL 2024 paper "AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators".

      ConIR Dataset
    

    The conversation-based information retrieval (ConIR) dataset is created by ChatGPT based on the MS-MARCO passage ranking dataset. The ConIR dataset is available at https://huggingface.co/datasets/He-Xingwei/ConIR. The sizes of the training and test sets for ConIR are 71,557 and 3,000 respectively. When using it, please… See the full description on the dataset page: https://huggingface.co/datasets/He-Xingwei/ConIR.

  18. MNLP_M2_rag_documents_chatGPT

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Arnault Stähli (2025). MNLP_M2_rag_documents_chatGPT [Dataset]. https://huggingface.co/datasets/arnaultsta/MNLP_M2_rag_documents_chatGPT
    Explore at:
    Dataset updated
    Jun 1, 2025
    Authors
    Arnault Stähli
    Description

    This dataset was created by prompting ChatGPT to give us context for the EPFL question set.

  19. chatgpt-paraphrases-zh

    • huggingface.co
    Updated Jul 23, 2024
    Cite
    Shen Huang (2024). chatgpt-paraphrases-zh [Dataset]. https://huggingface.co/datasets/pangda/chatgpt-paraphrases-zh
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 23, 2024
    Authors
    Shen Huang
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a Chinese dataset of paraphrases created by ChatGPT. For English paraphrase dataset, you can refer to humarin/chatgpt-paraphrases.

      We used this prompt to generate paraphrases
    

    给下面这个问题生成5条相似的改写: {text} (English: "Generate 5 similar rewrites for the question below: {text}")

    This dataset is based on queries from Baidu and Zhihu. We generated 5 paraphrases for each sample, so in total the dataset has about 238k data rows. You can make 30 rows from a row from each sample. In this way you can make 7.1 million train pairs (238k rows with 5 paraphrases… See the full description on the dataset page: https://huggingface.co/datasets/pangda/chatgpt-paraphrases-zh.

  20. dolly-v2_orca

    • huggingface.co
    Updated Jul 15, 2018
    Cite
    Pankaj Mathur (2018). dolly-v2_orca [Dataset]. https://huggingface.co/datasets/pankajmathur/dolly-v2_orca
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 15, 2018
    Authors
    Pankaj Mathur
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    An explain-tuned Dolly-V2 dataset (~15K examples) created using approaches from the Orca research paper. We leverage all 15 system instructions provided in the Orca research paper to generate explain-tuned datasets, in contrast to the vanilla instruction-tuning approaches used by the original datasets. This helps student models like orca_mini_13b, orca_mini_7b, or orca_mini_3b learn the thought process from the teacher model, ChatGPT (gpt-3.5-turbo-0301 version). Please see how the System prompt is added before… See the full description on the dataset page: https://huggingface.co/datasets/pankajmathur/dolly-v2_orca.
