https://creativecommons.org/publicdomain/zero/1.0/
Prompts play a crucial role in guiding language models like ChatGPT to generate relevant and coherent responses. They serve as instructions or cues that provide context and steer the model's understanding and output. Effective prompts can shape the conversation, elicit specific information, or encourage creative responses. Prompt engineering, in turn, refers to the process of designing and refining prompts to achieve desired outcomes. Both prompts and prompt engineering are important for several reasons:
Prompts and prompt engineering are essential for guiding language models, enabling control over outputs, generating desired content, fostering creativity, and enhancing the overall user experience. They form a critical component in the interaction between users and AI systems, ensuring meaningful and contextually appropriate conversations. This is one of the inspirations behind this dataset.
The prompt samples in this dataset were generated with various chatbots, with a few coming from Bard and ChatGPT. The main ideas behind it are 1) prompt engineering and 2) rich data. These samples can be helpful for training various generative AI applications; the dataset contains only a small number of prompts, but you can generate synthetic data from it.
One Million Random Midjourney Prompts
One Million random prompts that were posted to the public Midjourney channels.
The CSV file contains a second column that includes important keywords found in the prompt text. These keywords can be used as prompts, as Midjourney does not rely on grammar. This allows for flexibility in generating diverse and creative outputs based on the provided keywords. By leveraging the keywords in the second column, you can explore various prompt combinations and unleash the full potential of the prompt engineering process with Midjourney.
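As a rough illustration of that workflow, here is a minimal pandas sketch that pulls the keyword column and recombines keywords into new prompts; the file name and column layout are assumptions, since they are not spelled out above.

```python
import random
import pandas as pd

# File and column names are assumptions; adjust to the actual CSV layout.
df = pd.read_csv("midjourney_prompts.csv", names=["prompt", "keywords"], header=None)

# Each keywords cell is assumed to hold a comma-separated string of terms.
keyword_pool = [kw.strip() for cell in df["keywords"].dropna() for kw in str(cell).split(",")]

# Sample a handful of keywords and join them into a new Midjourney-style prompt.
new_prompt = ", ".join(random.sample(keyword_pool, k=5))
print(new_prompt)
```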
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data has been created using prompt engineering over ChatGPT and uses the following labels: 0 = negative, 1 = neutral, 2 = positive.
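For convenience, a minimal sketch of mapping those integer labels to names when loading the data; the file and column names used here are assumptions.

```python
import pandas as pd

LABELS = {0: "negative", 1: "neutral", 2: "positive"}

# "sentiment.csv" and the "label" column name are assumed, not documented above.
df = pd.read_csv("sentiment.csv")
df["label_name"] = df["label"].map(LABELS)
print(df["label_name"].value_counts())
```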
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This curated dataset features 1,000 high-quality, standalone prompts extracted from the ShareGPT corpus. Unlike raw conversation data, these prompts are carefully filtered to ensure they are context-independent, making them ideal for prompt engineering research, LLM training, and chatbot development.
This dataset was created using a multi-stage filtering pipeline:
1. Extracting initial messages from conversations
2. Applying pattern-based filters to remove context-dependent phrases
3. Using NLP techniques to detect and exclude prompts with ambiguous references
4. Validating context independence with LLM verification
5. Manual quality review of edge cases
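The filtering code itself is not reproduced on this card, so the following is only a hedged sketch of what the pattern-based step (stage 2) could look like; the phrase patterns are illustrative assumptions, not the authors' actual list.

```python
import re

# Illustrative patterns for context-dependent openings (assumed, not the authors' actual list).
CONTEXT_DEPENDENT = [
    r"^\s*(as|like) (i|we) (said|mentioned)",
    r"^\s*(continue|go on|and then\??)\s*$",
    r"\b(the above|previous (answer|message)|your last (reply|response))\b",
    r"^\s*(yes|no|ok|okay|sure)\s*[.!]?\s*$",
]

def is_standalone(prompt: str) -> bool:
    """Return True if the prompt matches none of the context-dependent patterns."""
    text = prompt.lower()
    return not any(re.search(pattern, text) for pattern in CONTEXT_DEPENDENT)

prompts = ["Explain how TCP handshakes work.", "continue", "As I said before, fix it."]
print([p for p in prompts if is_standalone(p)])  # keeps only the first prompt
```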
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Chain-of-Thought Conversation Dataset
This dataset is designed for training and fine-tuning small and large language models to respond more naturally and intelligently in chatbot applications using Chain-of-Thought (CoT) style reasoning.
Dataset Highlights:
5+ Dialogue samples
Each entry contains:
User prompt (e.g., "hi", "how are you?")
Assistant's internal reasoning
Final assistant response (friendly, emoji-rich, human-like)
Why Chain of Thought?
This dataset uses a "thought + response" format in which the assistant:
Thinks aloud about user intent, tone, and context
Crafts a reply that's human-friendly, tone-aware, and helpful
This helps boost instruction following and emotional tone matching, and it makes small models feel smarter, especially for use in:
AI customer support bots
Personal assistants
Roleplay or gaming characters
Emotional tone recognition agents
Use Cases:
Fine-tuning small LLMs with LoRA, QLoRA, or full SFT (a configuration sketch follows this list)
Chatbot intent analysis and tone modeling
Educational use in building interpretable LLMs
Prompt engineering or few-shot examples
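A minimal sketch of the LoRA fine-tuning use case referenced above, using the PEFT library; the base model and hyperparameters are illustrative assumptions rather than recommendations from the dataset authors.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model choice and hyperparameters are illustrative assumptions.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (model-dependent)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```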
Example Format:
{ "messages": [ { "role": "user", "content": "hi" }, { "role": "assistant", "content": "[Thought: The user greeted casually... ]
Hi there! ππ How can I assist you today?" } ] }
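Given that format, here is a small sketch for splitting the assistant message into its internal reasoning and the visible reply; it assumes the bracketed [Thought: ...] prefix shown in the example, which may vary across entries.

```python
import re

def split_thought(assistant_content: str):
    """Separate the '[Thought: ...]' prefix from the visible reply, if present."""
    match = re.match(r"\[Thought:(.*?)\]\s*(.*)", assistant_content, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, assistant_content.strip()

sample_content = "[Thought: The user greeted casually... ] Hi there! How can I assist you today?"
thought, reply = split_thought(sample_content)
print(thought)  # "The user greeted casually..."
print(reply)    # "Hi there! How can I assist you today?"
```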
Who Should Use This?
ML engineers working on chatbot LLMs
Researchers studying reasoning in dialogue agents
Anyone wanting to improve model interpretability
Welcome to our AI Cheat Sheets Dataset page! Here, you can explore a comprehensive collection of resources, featuring everything from ChatGPT to Midjourney, Gemini, and other AI tools. Access cheat sheets, AI courses, prompt engineering tutorials, and more to upgrade your understanding and skills in AI.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
AIS-4SD (AI Summit - 4 Stable Diffusion models) is a collection of 4,000 images generated using a set of Stability AI text-to-image diffusion models.
This dataset was created as part of a collaborative project between PEReN and VIGINUM for the AI Summit held in Paris in February 2025. This open-source project aims at assessing the performance of generated-image detectors and their robustness to different models and transformations. The code is free and open source, and contributions to connect additional detectors are welcome.
Official repository: https://code.peren.gouv.fr/open-source/ai-action-summit/generated-image-detection.
This dataset can be used to assess detection models' performance, and in particular their robustness to successive updates of the generation model.
1,000 generated images for each of four versions of the Stability AI text-to-image diffusion model.
For each model, we generated:
Model | Number of images |
---|---|
stabilityai/stable-diffusion-xl-base-1.0 | 500 faces + 500 other |
stabilityai/stable-diffusion-2-1 | 500 faces + 500 other |
stabilityai/stable-diffusion-3-medium-diffusers | 500 faces + 500 other |
stabilityai/stable-diffusion-3.5-large | 500 faces + 500 other |
The scripts used to generate these images can be found in our open-source repository (see this specific file). After setting up the project, you can run:
$ poetry run python scripts/generate_images.py
With minor updates to these scripts, you can extend this dataset to fit your specific needs.
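The repository's script is the authoritative reference; purely as a hedged sketch, generating images with one of the listed checkpoints via the diffusers library could look like this (the prompt and settings are illustrative assumptions):

```python
import torch
from diffusers import DiffusionPipeline

# One of the four checkpoints listed above; prompt and settings are illustrative only.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("cuda")

prompt = "portrait photo of a person, natural lighting"  # assumed example, not from the dataset
image = pipe(prompt, num_inference_steps=30).images[0]
image.save("sdxl_sample.png")
```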
One zip file with the following structure, each directory containing the associated 500 images:
AIS-4SD/
├── generation_metadata.csv
├── StableDiffusion-2.1-faces-20250203-1448
├── StableDiffusion-2.1-other-20250203-1548
├── StableDiffusion-3.5-faces-20250203-1012
├── StableDiffusion-3.5-other-20250203-1603
├── StableDiffusion-3-faces-20250203-1545
├── StableDiffusion-3-other-20250203-1433
├── StableDiffusion-XL-faces-20250203-0924
└── StableDiffusion-XL-other-20250203-1727
The metadata for the generated images are listed in generation_metadata.csv.
The project is under ongoing development. A preliminary blog post can be found here: https://www.peren.gouv.fr/en/perenlab/2025-02-11_ai_summit/.
https://creativecommons.org/publicdomain/zero/1.0/
Based on its Wikipedia page:
ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.
These reviews were extracted from the app's Google Play Store page.
This dataset should paint a good picture of the public's perception of the app over the years. Using this dataset, we can do the following:
(AND MANY MORE!)
Images generated using Bing Image Generator
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is designed for Named Entity Recognition (NER) and Relation Extraction (RE) tasks within the fictional Harry Potter universe. It consists of annotated paragraphs where characters, houses, magical items, spells, and locations are identified and linked through predefined relation types such as friend-of, uses, or member-of-house.
Source: The original text is taken from freely available .txt versions of the Harry Potter books (publicly shared online for educational purposes), cleaned and split into context-rich paragraphs.
Annotations:
Entities: CHARACTER, HOUSE, MAGIC_ITEM, SPELL, LOCATION
Relations: 17+ relation types across entity pairs
Format: Tokenized text with JSON-style annotations including token indices for precise entity/relation mapping
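The card does not reproduce a record verbatim, so the following is only an assumed illustration of what a JSON-style annotation with token indices could look like; the field names are hypothetical and may differ from the released files.

```python
# Hypothetical record layout; actual field names in the JSONL/CSV may differ.
example_record = {
    "tokens": ["Harry", "casts", "Expelliarmus", "in", "the", "Forbidden", "Forest"],
    "entities": [
        {"type": "CHARACTER", "start": 0, "end": 1},  # token indices (end exclusive)
        {"type": "SPELL", "start": 2, "end": 3},
        {"type": "LOCATION", "start": 5, "end": 7},
    ],
    "relations": [
        {"type": "casts", "head": 0, "tail": 1},  # indices into the entities list
    ],
}
```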
Annotation Process:
LLM-assisted annotation (GPT-4) with custom prompt engineering
Manual verification of a gold subset using a lightweight checker app
Custom-built token position checkers to ensure annotation accuracy
Use Cases:
Training and evaluating RE/NER models
Exploring NLP pipelines in a controlled fictional domain
Visualizing relationship graphs using tools like networkx and matplotlib
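As a minimal sketch of the visualization use case just above (the relation names follow the label set described on this card; the triples and plotting details are illustrative assumptions):

```python
import matplotlib.pyplot as plt
import networkx as nx

# Build a small directed relation graph from (head, relation, tail) triples.
triples = [
    ("Harry", "friend-of", "Hermione"),
    ("Harry", "member-of-house", "Gryffindor"),
    ("Harry", "casts", "Expelliarmus"),
]

G = nx.DiGraph()
for head, relation, tail in triples:
    G.add_edge(head, tail, label=relation)

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", node_size=1500)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.show()
```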
Files included:
Annotated dataset in JSONL or CSV format
Tokenized paragraph texts (with bert-base)
Golden set with verified labels
Ideal for anyone learning or experimenting with RE models in NLP!
Entities:
- CHARACTER: characters (Harry, Hermione, Dumbledore...)
- HOUSE: Hogwarts houses (Gryffindor, Slytherin...)
- MAGIC_ITEM: magical items (wand, broomstick...)
- SPELL: spells (Expelliarmus, Expecto Patronum...)
- LOCATION: locations (Hogwarts, Forbidden Forest...)
Relations:
CHARACTER → CHARACTER: friend-of, enemy-of, mentor-of, student-of, parent-of, sibling-of, rival-of, ally-of
CHARACTER → HOUSE: member-of-house, founder-of-house
CHARACTER → MAGIC_ITEM: uses, owns, acquires, gives
CHARACTER → SPELL: casts, knows, teaches