9 datasets found
  1. prompts

    • kaggle.com
    Updated Jun 21, 2023
    Cite
    @Ravi (2023). prompts [Dataset]. http://doi.org/10.34740/kaggle/dsv/5987205
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    @Ravi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Prompts play a crucial role in guiding language models like ChatGPT to generate relevant and coherent responses. They serve as instructions or cues that provide context and steer the model's understanding and output. Effective prompts can shape the conversation, elicit specific information, or encourage creative responses. Prompt engineering, on the other hand, refers to the process of designing and refining prompts to achieve desired outcomes. Both prompts and prompt engineering are important for several reasons.

    Prompts and prompt engineering are essential for guiding language models, enabling control over outputs, generating desired content, fostering creativity, and enhancing the overall user experience. They form a critical component in the interaction between users and AI systems, ensuring meaningful and contextually appropriate conversations. This is one of the inspirations behind this dataset.

    The prompt samples in this dataset were generated with various chatbots, including a few from Bard and ChatGPT. The two main goals are (1) prompt engineering and (2) rich data. Samples like these can help in training generative AI applications. The dataset contains only a small number of prompts, but you can generate synthetic data from them.

  2. One Million Random Midjourney Prompts

    • kaggle.com
    Updated Jun 17, 2023
    Cite
    NikBearBrown (2023). One Million Random Midjourney Prompts [Dataset]. https://www.kaggle.com/datasets/nikbearbrown/one-million-random-midjourney-prompts
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    Kaggle
    Authors
    NikBearBrown
    Description

    One Million Random Midjourney Prompts

    • One Million random prompts that were posted to the public Midjourney channels.

    • The CSV file contains a second column that includes important keywords found in the prompt text. These keywords can be used as prompts, as Midjourney does not rely on grammar. This allows for flexibility in generating diverse and creative outputs based on the provided keywords. By leveraging the keywords in the second column, you can explore various prompt combinations and unleash the full potential of the prompt engineering process with Midjourney.
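    As a minimal sketch of using the keyword column, the snippet below reads rows with Python's csv module and builds a keyword-only prompt from the second column. The header names, the comma-separated keyword format, and the sample rows are assumptions for illustration; check the actual file before relying on them.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV; the actual header names
# and keyword delimiter may differ.
SAMPLE_CSV = '''prompt,keywords
"a castle floating above a misty lake at dawn","castle, misty lake, dawn"
"portrait of an old sailor, dramatic lighting","old sailor, portrait, dramatic lighting"
'''

def keyword_prompts(csv_file):
    """Yield a keyword-only prompt built from the second column of each row."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # skip rows without a keyword column
        keywords = [k.strip() for k in row[1].split(",") if k.strip()]
        yield ", ".join(keywords)

prompts = list(keyword_prompts(io.StringIO(SAMPLE_CSV)))
print(prompts)
```

    To run this against the downloaded file, pass an open file handle instead of the in-memory sample.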

  3. Sentiment Analysis Dataset

    • kaggle.com
    Updated May 27, 2024
    Cite
    Samarth Kuchya (2024). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/samarthkumarkuchya/sentiment-analysis-dataset/code
    Dataset updated
    May 27, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samarth Kuchya
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This data was created using prompt engineering with ChatGPT. It uses the following labels: 0 = negative, 1 = neutral, 2 = positive.
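    The label scheme can be captured in a small lookup table. This is only a sketch: the mapping comes from the description, while the helper name and its error handling are illustrative assumptions.

```python
# Label scheme as stated in the dataset description.
ID_TO_LABEL = {0: "negative", 1: "neutral", 2: "positive"}
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}

def decode_labels(label_ids):
    """Convert integer labels to names, rejecting unknown ids (hypothetical helper)."""
    names = []
    for label_id in label_ids:
        if label_id not in ID_TO_LABEL:
            raise ValueError(f"unknown sentiment label: {label_id!r}")
        names.append(ID_TO_LABEL[label_id])
    return names

print(decode_labels([2, 0, 1]))
```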

  4. Standalone ShareGPT Prompts English 1k

    • kaggle.com
    Updated Apr 17, 2025
    Cite
    Austin Fairbanks (2025). Standalone ShareGPT Prompts English 1k [Dataset]. https://www.kaggle.com/datasets/austinfairbanks/sharegpt-prompts-1k/versions/4
    Dataset updated
    Apr 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Austin Fairbanks
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    ShareGPT English Standalone Prompts Dataset

    Description

    This curated dataset features 1,000 high-quality, standalone prompts extracted from the ShareGPT corpus. Unlike raw conversation data, these prompts are carefully filtered to ensure they are context-independent, making them ideal for prompt engineering research, LLM training, and chatbot development.

    Key Features

    • Context-Independent: Each prompt stands alone without requiring previous conversation history
    • Diverse Topics: Spans domains including programming, science, creative writing, business, and everyday queries
    • Clean & Preprocessed: Removed conversational artifacts, references to prior messages, and ambiguous pronouns
    • Quality-Filtered: Includes only substantive prompts with clear intent and sufficient detail
    • Metadata Enriched: Includes prompt length, complexity estimate, and topic classification

    Applications

    • Training stronger zero-shot LLM capabilities
    • Developing prompt classification systems
    • Researching prompt optimization techniques
    • Benchmarking LLM performance across diverse query types
    • Building more robust chatbot interactions

    Methodology

    This dataset was created using a multi-stage filtering pipeline:

    1. Extracting initial messages from conversations
    2. Applying pattern-based filters to remove context-dependent phrases
    3. Using NLP techniques to detect and exclude prompts with ambiguous references
    4. Validating context independence with LLM verification
    5. Manual quality review of edge cases
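    The first two stages might look roughly like the sketch below. It is not the author's pipeline: the regular expressions, the minimum word count, and the sample conversations are all hypothetical, and stages 3 to 5 (NLP reference detection, LLM verification, manual review) are not covered.

```python
import re

# Hypothetical patterns for phrases that usually point back to earlier turns.
CONTEXT_PATTERNS = [
    r"\bas (?:i|we) (?:said|mentioned|discussed)\b",
    r"\b(?:the )?(?:above|previous|earlier) (?:code|answer|message|response|example)\b",
    r"^(?:continue|go on|keep going|more)\b",
    r"\byour (?:last|previous) (?:answer|message|response)\b",
]
CONTEXT_RE = re.compile("|".join(CONTEXT_PATTERNS), re.IGNORECASE)

def first_user_messages(conversations):
    """Stage 1: take the first message of each non-empty conversation."""
    return [conv[0] for conv in conversations if conv]

def is_standalone(prompt, min_words=4):
    """Stage 2: reject very short prompts and prompts matching a context pattern.

    The min_words threshold is an assumed stand-in for a quality filter.
    """
    text = prompt.strip()
    if len(text.split()) < min_words:
        return False
    return CONTEXT_RE.search(text) is None

conversations = [
    ["Explain how a binary search tree handles deletions.", "Thanks!"],
    ["Continue from your previous answer."],
    ["Rewrite the above code in Rust please."],
    [],
]
standalone = [p for p in first_user_messages(conversations) if is_standalone(p)]
print(standalone)
```

    Pattern lists like this tend to miss implicit references such as a bare "it" or "that", which is presumably why the later stages exist.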

  5. Chain of Thought AI Chatbot

    • kaggle.com
    Updated Jul 6, 2025
    Cite
    Blue strike AI (2025). Chain of Thought AI Chatbot [Dataset]. https://www.kaggle.com/datasets/bluestrikeai/cot-conversation-dataset/code
    Dataset updated
    Jul 6, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Blue strike AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🧠 Chain-of-Thought Conversation Dataset

    This dataset is designed for training and fine-tuning small and large language models to respond more naturally and intelligently in chatbot applications using Chain-of-Thought (CoT) style reasoning.

    📦 Dataset Highlights:

    5+ Dialogue samples

    Each entry contains:

    User prompt (e.g., "hi", "how are you?")

    Assistant's internal reasoning

    Final assistant response (friendly, emoji-rich, human-like)

    🧠 Why Chain of Thought?

    This dataset uses a "thought + response" format, in which the assistant:

    1. Thinks aloud about user intent, tone, and context

    2. Crafts a reply that's human-friendly, tone-aware, and helpful

    This helps boost instruction-following and emotional tone matching, and makes small models feel smarter, especially for use in:

    AI customer support bots

    Personal assistants

    Roleplay or gaming characters

    Emotional tone recognition agents

    🔧 Use Cases:

    Fine-tuning small LLMs with LoRA, QLoRA, or full SFT

    Chatbot intent analysis and tone modeling

    Educational use in building interpretable LLMs

    Prompt engineering or few-shot examples

    Example Format:

    {
      "messages": [
        { "role": "user", "content": "hi" },
        { "role": "assistant", "content": "[Thought: The user greeted casually... ]\n\nHi there! 👋😊 How can I assist you today?" }
      ]
    }
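    To use the reasoning and the reply separately, the assistant message can be split on the thought marker. The sketch below assumes every assistant message starts with a single "[Thought: ...]" block followed by the reply, as in the example; the sample record is illustrative, and a thought that itself contains "]" would be cut short by this pattern.

```python
import json
import re

# Assumed layout: "[Thought: ...]" followed by whitespace and then the reply.
THOUGHT_RE = re.compile(
    r"^\[Thought:\s*(?P<thought>.*?)\]\s*(?P<reply>.*)$", re.DOTALL
)

def split_thought(content):
    """Return (thought, reply); thought is None if no marker is found."""
    match = THOUGHT_RE.match(content.strip())
    if match is None:
        return None, content.strip()
    return match.group("thought").strip(), match.group("reply").strip()

# Illustrative record in the same shape as the example format.
record = json.loads(
    '{"messages": ['
    '{"role": "user", "content": "hi"},'
    '{"role": "assistant", "content": '
    '"[Thought: The user greeted casually.]\\n\\nHi there! How can I assist you today?"}'
    ']}'
)
assistant = next(m for m in record["messages"] if m["role"] == "assistant")
thought, reply = split_thought(assistant["content"])
print(thought)
print(reply)
```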

    ✅ Who Should Use This?

    ML engineers working on chatbot LLMs

    Researchers studying reasoning in dialogue agents

    Anyone wanting to improve model interpretability

  6. 📚 Top 319 Ultimate AI Cheat Sheets

    • kaggle.com
    Updated May 15, 2025
    Cite
    AI Fire (2025). 📚 Top 319 Ultimate AI Cheat Sheets [Dataset]. https://www.kaggle.com/datasets/aifire/top-319-ultimate-ai-cheat-sheets
    Dataset updated
    May 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    AI Fire
    Description

    Welcome to our AI Cheat Sheets Dataset page! Here, you can explore a comprehensive collection of resources, featuring everything from ChatGPT to Midjourney, Gemini, and other AI tools. Access cheat sheets, AI courses, prompt engineering tutorials, and more to upgrade your understanding and skills in AI.

  7. Stable Diffusion generated images - AIS-4SD dataset

    • zenodo.org
    zip
    Updated Apr 9, 2025
    Cite
    Zenodo (2025). Stable Diffusion generated images - AIS-4SD dataset [Dataset]. http://doi.org/10.5281/zenodo.15131117
    Available download formats: zip
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Time period covered
    Feb 3, 2025
    Description

    AIS-4SD

    AIS-4SD (AI Summit - 4 Stable Diffusion models) is a collection of 4,000 images generated using a set of Stability AI text-to-image diffusion models.

    Context

    This dataset was created as part of a collaborative project between PEReN and VIGINUM for the AI Summit held in Paris in February 2025. This open-source project aims to assess the performance of generated-image detectors and their robustness to different models and transformations. The code is free and open source, and contributions that connect additional detectors are welcome.

    Official repository: https://code.peren.gouv.fr/open-source/ai-action-summit/generated-image-detection.

    Dataset summary

    This dataset can be used to assess the performance of detection models, and in particular their robustness to successive updates of the generation model.

    Dataset description

    1,000 generated images for each of four different versions of the Stability AI text-to-image diffusion model.

    For each model, we generated:

    Model                                                Number of images
    stabilityai/stable-diffusion-xl-base-1.0             500 👨 + 500 🖼️
    stabilityai/stable-diffusion-2-1                     500 👨 + 500 🖼️
    stabilityai/stable-diffusion-3-medium-diffusers      500 👨 + 500 🖼️
    stabilityai/stable-diffusion-3.5-large               500 👨 + 500 🖼️

    Reproducibility

    The scripts used to generate these images can be found in our open-source repository (see this specific file). After setting up our project, you can run:

    $ poetry run python scripts/generate_images.py

    With minor updates to these scripts, you can extend this dataset to fit your specific needs.

    Dataset structure

    One zip file with the following structure, each directory containing the associated 500 images:

    AIS-4SD/
    ├── generation_metadata.csv
    ├── StableDiffusion-2.1-faces-20250203-1448
    ├── StableDiffusion-2.1-other-20250203-1548
    ├── StableDiffusion-3.5-faces-20250203-1012
    ├── StableDiffusion-3.5-other-20250203-1603
    ├── StableDiffusion-3-faces-20250203-1545
    ├── StableDiffusion-3-other-20250203-1433
    ├── StableDiffusion-XL-faces-20250203-0924
    └── StableDiffusion-XL-other-20250203-1727

    The metadata for generated images (see generation_metadata.csv) are:

    • model: model used for generation,
    • prompt: prompt used for generation (i.e., a Conceptual Captions caption or an sfhqt2i prompt, with some minor prompt engineering),
    • guidance_scale: guidance scale of diffusion process,
    • num_inference_steps: number of inference steps of diffusion process,
    • generated_img_relative_path: relative path to image in zip structure.
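    A minimal sketch of reading this metadata with Python's standard library follows. The column names come from the list above; the sample rows, numeric values, and image file names are made up for illustration, and the real CSV may use a different delimiter or value format.

```python
import csv
import io
from collections import Counter

# Hypothetical rows using the column names listed above; values are made up.
SAMPLE_METADATA = """model,prompt,guidance_scale,num_inference_steps,generated_img_relative_path
stabilityai/stable-diffusion-2-1,a red bicycle,7.5,50,StableDiffusion-2.1-other-20250203-1548/img_0.png
stabilityai/stable-diffusion-2-1,portrait of a woman,7.5,50,StableDiffusion-2.1-faces-20250203-1448/img_0.png
stabilityai/stable-diffusion-3.5-large,a mountain lake,4.5,28,StableDiffusion-3.5-other-20250203-1603/img_0.png
"""

def load_metadata(csv_file):
    """Parse metadata rows, converting the two numeric columns."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["guidance_scale"] = float(row["guidance_scale"])
        row["num_inference_steps"] = int(row["num_inference_steps"])
        rows.append(row)
    return rows

rows = load_metadata(io.StringIO(SAMPLE_METADATA))
images_per_model = Counter(row["model"] for row in rows)
print(images_per_model)
```

    Counting rows per model like this is a quick way to confirm that an extracted archive matches the 1,000-images-per-model layout described above.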

    Project status

    Project is under ongoing development. A preliminary blog post can be found here: https://www.peren.gouv.fr/en/perenlab/2025-02-11_ai_summit/.

  8. 🤖 ChatGPT App Google Store Reviews

    • kaggle.com
    Updated Nov 17, 2023
    Cite
    BwandoWando (2023). 🤖 ChatGPT App Google Store Reviews [Dataset]. http://doi.org/10.34740/kaggle/ds/4017553
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context


    According to its Wikipedia page:

    ChatGPT (Chat Generative Pre-trained Transformer) is a large language model-based chatbot developed by OpenAI and launched on November 30, 2022, that enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.

    These reviews were extracted from the app's Google Store page.

    Usage

    This dataset should give a good picture of how the public has perceived the app over time. Using this dataset, we can do the following:

    1. Extract sentiments and trends.
    2. Identify which versions of the app received the most positive and the most negative feedback.
    3. Use topic modeling to identify the pain points of the application.

    (AND MANY MORE!)

    Note

    Images generated using Bing Image Generator

  9. Harry Potter (NER + RE)

    • kaggle.com
    Updated May 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mkdsps (2025). Harry Potter (NER + RE) [Dataset]. https://www.kaggle.com/datasets/mkdsps/harry-potter-ner-re/suggestions
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    mkdsps
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset is designed for Named Entity Recognition (NER) and Relation Extraction (RE) tasks within the fictional Harry Potter universe. It consists of annotated paragraphs where characters, houses, magical items, spells, and locations are identified and linked through predefined relation types such as friend-of, uses, or member-of-house.

    📚 Source: The original text is taken from freely available .txt versions of the Harry Potter books (publicly shared online for educational purposes), cleaned and split into context-rich paragraphs.

    ✨ Annotations:

    Entities: CHARACTER, HOUSE, MAGIC_ITEM, SPELL, LOCATION

    Relations: 17+ relation types across entity pairs

    Format: Tokenized text with JSON-style annotations including token indices for precise entity/relation mapping

    🛠️ Annotation Process:

    LLM-assisted annotation (GPT-4) with custom prompt engineering

    Manual verification of a gold subset using a lightweight checker app

    Custom-built token position checkers to ensure annotation accuracy

    🔍 Use Cases:

    Training and evaluating RE/NER models

    Exploring NLP pipelines in a controlled fictional domain

    Visualizing relationship graphs using tools like networkx and matplotlib

    📦 Files included:

    Annotated dataset in JSONL or CSV format

    Tokenized paragraph texts (with bert-base)

    Golden set with verified labels

    🚀 Ideal for anyone learning or experimenting with RE models in NLP!

    🧩 Entities and relations

    Entities:

    • CHARACTER: characters (Harry, Hermione, Dumbledore...)
    • HOUSE: Hogwarts houses (Gryffindor, Slytherin...)
    • MAGIC_ITEM: magical items (wand, broomstick...)
    • SPELL: spells (Expelliarmus, Expecto Patronum...)
    • LOCATION: locations (Hogwarts, Forbidden Forest...)

    Relations:

    • CHARACTER – CHARACTER: friend-of, enemy-of, mentor-of, student-of, parent-of, sibling-of, rival-of, ally-of
    • CHARACTER – HOUSE: member-of-house, founder-of-house
    • CHARACTER – MAGIC_ITEM: uses, owns, acquires, gives
    • CHARACTER – SPELL: casts, knows, teaches
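    The relation schema above can be encoded as a lookup table for sanity-checking annotations. This is a sketch only: the relation names are copied from the lists above, while the function name and the (head type, relation, tail type) triple layout are assumptions, since the dataset's actual annotation format is not shown here.

```python
# Relation types per entity pair, copied from the lists above.
ALLOWED_RELATIONS = {
    ("CHARACTER", "CHARACTER"): {
        "friend-of", "enemy-of", "mentor-of", "student-of",
        "parent-of", "sibling-of", "rival-of", "ally-of",
    },
    ("CHARACTER", "HOUSE"): {"member-of-house", "founder-of-house"},
    ("CHARACTER", "MAGIC_ITEM"): {"uses", "owns", "acquires", "gives"},
    ("CHARACTER", "SPELL"): {"casts", "knows", "teaches"},
}

def is_valid_relation(head_type, relation, tail_type):
    """Check a (head type, relation, tail type) triple against the schema."""
    return relation in ALLOWED_RELATIONS.get((head_type, tail_type), set())

total_relation_types = sum(len(names) for names in ALLOWED_RELATIONS.values())
print(total_relation_types)  # 17
print(is_valid_relation("CHARACTER", "casts", "SPELL"))
print(is_valid_relation("CHARACTER", "casts", "HOUSE"))
```

    The listed relation types add up to 17, which matches the "17+ relation types" figure given in the annotations summary.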

