9 datasets found

prompts
kaggle.com
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
@Ravi (2023). prompts [Dataset]. http://doi.org/10.34740/kaggle/dsv/5987205
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5987205
Dataset updated
Jun 21, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
@Ravi
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Prompts play a crucial role in guiding language models like ChatGPT to generate relevant and coherent responses. They serve as instructions or cues that provide context and steer the model's understanding and output. Effective prompts can shape the conversation, elicit specific information, or encourage creative responses. Prompt engineering, on the other hand, refers to the process of designing and refining prompts to achieve desired outcomes. Both prompts and prompt engineering are important for several reasons

prompts and prompt engineering are essential for guiding language models, enabling control over outputs, generating desired content, fostering creativity, and enhancing the overall user experience. They form a critical component in the interaction between users and AI systems, ensuring meaningful and contextually appropriate conversations. This is one of the inspiration behind this dataset.

In this dataset we generated this prompts samples by various chatbots and few from Bard and from ChatGpt. the main intention and idea behind that is 1) Prompt Engineering 2) Rich data . This type of few samples of prompt which for helpful for training various generative ai applications.but in this dataset the prompts samples are low amount .but you generate synthetic data from that .
One Million Random Midjourney Prompts
kaggle.com
Updated Jun 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
NikBearBrown (2023). One Million Random Midjourney Prompts [Dataset]. https://www.kaggle.com/datasets/nikbearbrown/one-million-random-midjourney-prompts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 17, 2023
Dataset provided by
Kaggle
Authors
NikBearBrown
Description
One Million Random Midjourney Prompts

One Million random prompts that were posted to the public Midjourney channels.

The CSV file contains a second column that includes important keywords found in the prompt text. These keywords can be used as prompts, as Midjourney does not rely on grammar. This allows for flexibility in generating diverse and creative outputs based on the provided keywords. By leveraging the keywords in the second column, you can explore various prompt combinations and unleash the full potential of the prompt engineering process with Midjourney.
Sentiment Analysis Dataset
kaggle.com
Updated May 27, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samarth Kuchya (2024). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/samarthkumarkuchya/sentiment-analysis-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 27, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Samarth Kuchya
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This data has been created using prompt engineering over chatGPT which has following labels - 0 - negative 1 - neutral 2 - positive
Standalone ShareGPT Prompts English 1k
kaggle.com
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Austin Fairbanks (2025). Standalone ShareGPT Prompts English 1k [Dataset]. https://www.kaggle.com/datasets/austinfairbanks/sharegpt-prompts-1k/versions/4
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 17, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Austin Fairbanks
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
ShareGPT English Standalone Prompts Dataset

Description

This curated dataset features 1,000 high-quality, standalone prompts extracted from the ShareGPT corpus. Unlike raw conversation data, these prompts are carefully filtered to ensure they are context-independent, making them ideal for prompt engineering research, LLM training, and chatbot development.

Key Features

Context-Independent: Each prompt stands alone without requiring previous conversation history

Diverse Topics: Spans domains including programming, science, creative writing, business, and everyday queries

Clean & Preprocessed: Removed conversational artifacts, references to prior messages, and ambiguous pronouns

Quality-Filtered: Includes only substantive prompts with clear intent and sufficient detail

Metadata Enriched: Includes prompt length, complexity estimate, and topic classification

Applications

Training stronger zero-shot LLM capabilities

Developing prompt classification systems

Researching prompt optimization techniques

Benchmarking LLM performance across diverse query types

Building more robust chatbot interactions

Methodology

This dataset was created using a multi-stage filtering pipeline: 1. Extracting initial messages from conversations 2. Applying pattern-based filters to remove context-dependent phrases 3. Using NLP techniques to detect and exclude prompts with ambiguous references 4. Validating context independence with LLM verification 5. Manual quality review of edge cases
Chain of Thought AI Chatbot
kaggle.com
Updated Jul 6, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Blue strike AI (2025). Chain of Thought AI Chatbot [Dataset]. https://www.kaggle.com/datasets/bluestrikeai/cot-conversation-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 6, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Blue strike AI
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
🧠 Chain-of-Thought Conversation Dataset

This dataset is designed for training and fine-tuning small and large language models to respond more naturally and intelligently in chatbot applications using Chain-of-Thought (CoT) style reasoning.

📦 Dataset Highlights:

5+ Dialogue samples

Each entry contains:

User prompt (e.g., "hi", "how are you?")

Assistant’s internal reasoning

Final assistant response (friendly, emoji-rich, human-like)

🧠 Why Chain of Thought?

This dataset uses a “thought + response” format :

Thinks aloud about user intent, tone, and context

Crafts a reply that’s human-friendly, tone-aware, and helpful

This helps boost instruction-following, emotional tone matching, and makes small models feel smarter — especially for use in:

AI customer support bots

Personal assistants

Roleplay or gaming characters

Emotional tone recognition agents

🔧 Use Cases:

Fine-tuning small LLMs with LoRA, QLoRA, or full SFT

Chatbot intent analysis and tone modeling

Educational use in building interpretable LLMs

Prompt engineering or few-shot examples

📁 Example Format:

{ "messages": [ { "role": "user", "content": "hi" }, { "role": "assistant", "content": "[Thought: The user greeted casually... ]

Hi there! 👋😊 How can I assist you today?" } ] }

✅ Who Should Use This?

ML engineers working on chatbot LLMs

Researchers studying reasoning in dialogue agents

Anyone wanting to improve model interpretability
📚 Top 319 Ultimate AI Cheat Sheets
kaggle.com
Updated May 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
AI Fire (2025). 📚 Top 319 Ultimate AI Cheat Sheets [Dataset]. https://www.kaggle.com/datasets/aifire/top-319-ultimate-ai-cheat-sheets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 15, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
AI Fire
Description
Welcome to our AI Cheat Sheets

Dataset page! Here, you can explore a comprehensive collection of resources, featuring everything from ChatGPT to Midjourney, Gemini, and other AI tools. Access cheat sheets, AI courses, prompt engineering tutorials, and more to upgrade your understanding and skills in AI.

Stable Diffusion generated images - AIS-4SD dataset

zenodo.org

zip

Updated Apr 9, 2025

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Zenodo (2025). Stable Diffusion generated images - AIS-4SD dataset [Dataset]. http://doi.org/10.5281/zenodo.15131117

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.15131117

Dataset updated

Apr 9, 2025

Dataset provided by

Zenodohttp://zenodo.org/

License

Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically

Time period covered

Feb 3, 2025

Description

AIS-4SD

AIS-4SD (AI Summit - 4 Stable Diffusion models) is a collection of 4.000 images, generated using a set of Stability AI text-to-image diffusion models

Context

This dataset was developed during the development of a collaborative project between PEReN and VIGINUM for the AI Summit held in Paris in February 2025. This open-source project aims at assessing generated images detectors performances and their robustness to different models and transformations. The code is free and open source, and contributions to connect additional detectors are also welcome.

Official repository: https://code.peren.gouv.fr/open-source/ai-action-summit/generated-image-detection.

Dataset summary

This dataset can be used to assess detection models performances, and in particular their robustness to successive updates of the generation model.

Dataset description

1.000 generated images with four different versions of stability AI text-to-image diffusion model.

For each models, we generated:

500 portraits (👨) using SFHQ-T2I "random" prompts for faces (see Github repo, and dataset on Kaggle),
500 more general content images (🖼️) using captions of Google's Conceptual Captions dataset.

Model	Number of images
stabilityai/stable-diffusion-xl-base-1.0	500 👨 + 500 🖼️
stabilityai/stable-diffusion-2-1	500 👨 + 500 🖼️
stabilityai/stable-diffusion-3-medium-diffusers	500 👨 + 500 🖼️
stabilityai/stable-diffusion-3.5-large	500 👨 + 500 🖼️

Reproducibility

The scripts used to generated these images can be found on our open-source repository (see this specific file). After setting-up our project, you can run:

$ poetry run python scripts/generate_images.py

With minor updates to these scripts you can enrich this dataset with your specific needs.

Dataset structure

One zip file with the following structure, each directory containing the associated 500 images:

AIS-4SD/
├── generation_metadata.csv
├── StableDiffusion-2.1-faces-20250203-1448
├── StableDiffusion-2.1-other-20250203-1548
├── StableDiffusion-3.5-faces-20250203-1012
├── StableDiffusion-3.5-other-20250203-1603
├── StableDiffusion-3-faces-20250203-1545
├── StableDiffusion-3-other-20250203-1433
├── StableDiffusion-XL-faces-20250203-0924
└── StableDiffusion-XL-other-20250203-1727

The metadata for generated images (see generation_metadata.csv) are:

model: model used for generation,
prompt: prompt used for generation (ie Conceptual Captions caption or sfhqt2i prompt, with some minor prompt engineering),
guidance_scale: guidance scale of diffusion process,
num_inference_steps: number of inference steps of diffusion process,
generated_img_relative_path: relative path to image in zip structure.

Project status

Project is under ongoing development. A preliminary blog post can be found here: https://www.peren.gouv.fr/en/perenlab/2025-02-11_ai_summit/.

🤖 ChatGPT App Google Store Reviews
kaggle.com
Updated Nov 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BwandoWando (2023). 🤖 ChatGPT App Google Store Reviews [Dataset]. http://doi.org/10.34740/kaggle/ds/4017553
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/ds/4017553
Dataset updated
Nov 17, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2Fd7e02bf38f4b08df2508d6b6e42f3066%2Fchatgpt2.png?generation=1700233710310045&alt=media" alt="">

Based on their wikipedia page

ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.

These reviews were extracted from Google Store App

Usage

This dataset should paint a good picture on what is the public's perception of the app over the years. Using this dataset, we can do the following

Extract sentiments and trends

Identify which version of the app had the most positive feedback, the worst.

Use topic modeling to identify the pain points of the application.

(AND MANY MORE!)

Note

Images generated using Bing Image Generator
Harry Potter (NER + RE)
kaggle.com
Updated May 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mkdsps (2025). Harry Potter (NER + RE) [Dataset]. https://www.kaggle.com/datasets/mkdsps/harry-potter-ner-re/suggestions
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 13, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
mkdsps
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset is designed for Named Entity Recognition (NER) and Relation Extraction (RE) tasks within the fictional Harry Potter universe. It consists of annotated paragraphs where characters, houses, magical items, spells, and locations are identified and linked through predefined relation types such as friend-of, uses, or member-of-house.

📚 Source: The original text is taken from freely available .txt versions of the Harry Potter books (publicly shared online for educational purposes), cleaned and split into context-rich paragraphs.

✨ Annotations:

Entities: CHARACTER, HOUSE, MAGIC_ITEM, SPELL, LOCATION

Relations: 17+ relation types across entity pairs

Format: Tokenized text with JSON-style annotations including token indices for precise entity/relation mapping

🛠️ Annotation Process:

LLM-assisted annotation (GPT-4) with custom prompt engineering

Manual verification of a gold subset using a lightweight checker app

Custom-built token position checkers to ensure annotation accuracy

🔍 Use Cases:

Training and evaluating RE/NER models

Exploring NLP pipelines in a controlled fictional domain

Visualizing relationship graphs using tools like networkx and matplotlib

📦 Files included:

Annotated dataset in JSONL or CSV format

Tokenized paragraph texts (with bert-base)

Golden set with verified labels

🚀 Ideal for anyone learning or experimenting with RE models in NLP!

🧩 Entities and relations

Entities: - CHARACTER – likovi (Harry, Hermione, Dumbledore...) - HOUSE – Hogwarts kuće (Gryffindor, Slytherin...) - MAGIC_ITEM – magični predmeti (wand, broomstick...) - SPELL – čarolije (Expelliarmus, Expecto Patronum...) - LOCATION – lokacije (Hogwarts, Forbidden Forest...)

relations: CHARACTER – CHARACTER: - friend-of, enemy-of, mentor-of, student-of, parent-of, sibling-of, rival-of, ally-of

CHARACTER – HOUSE: - member-of-house, founder-of-house

CHARACTER – MAGIC_ITEM: - uses, owns, acquires, gives

CHARACTER – SPELL: - casts, knows, teaches
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

@Ravi (2023). prompts [Dataset]. http://doi.org/10.34740/kaggle/dsv/5987205

prompts

Unleash Your Imagination: Dive into the World of Prompts!

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Unique identifier

https://doi.org/10.34740/kaggle/dsv/5987205

Dataset updated

Jun 21, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

@Ravi

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Prompts play a crucial role in guiding language models like ChatGPT to generate relevant and coherent responses. They serve as instructions or cues that provide context and steer the model's understanding and output. Effective prompts can shape the conversation, elicit specific information, or encourage creative responses. Prompt engineering, on the other hand, refers to the process of designing and refining prompts to achieve desired outcomes. Both prompts and prompt engineering are important for several reasons

prompts and prompt engineering are essential for guiding language models, enabling control over outputs, generating desired content, fostering creativity, and enhancing the overall user experience. They form a critical component in the interaction between users and AI systems, ensuring meaningful and contextually appropriate conversations. This is one of the inspiration behind this dataset.

In this dataset we generated this prompts samples by various chatbots and few from Bard and from ChatGpt. the main intention and idea behind that is 1) Prompt Engineering 2) Rich data . This type of few samples of prompt which for helpful for training various generative ai applications.but in this dataset the prompts samples are low amount .but you generate synthetic data from that .

Clear search

Close search

Google apps

Main menu

prompts

One Million Random Midjourney Prompts

Sentiment Analysis Dataset

Standalone ShareGPT Prompts English 1k

ShareGPT English Standalone Prompts Dataset

Description

Key Features

Applications

Methodology

Chain of Thought AI Chatbot

📚 Top 319 Ultimate AI Cheat Sheets

Stable Diffusion generated images - AIS-4SD dataset

AIS-4SD

Context

Dataset summary

Dataset description

Reproducibility

Dataset structure

Project status

🤖 ChatGPT App Google Store Reviews

Context

Usage

Note

Harry Potter (NER + RE)

🧩 Entities and relations

prompts

Unleash Your Imagination: Dive into the World of Prompts!