Facebook
TwitterThis dataset is designed to explore the fascinating area of prompt engineering, specifically how different types of prompts can influence the generated text responses. Whether you're interested in natural language processing, conversational agents, or textual analysis, this dataset offers a rich resource for your investigations.
Features:
Size and Format:
Potential Applications:
Prompt Effectiveness: Study how different types of prompts yield different kinds of responses.
Conversational Agents: Train and evaluate dialogue systems to better understand user intents.
Text Generation Models: Analyze how various prompts affect the performance of text generation models like GPT-4.
Sentiment Analysis: Explore how the tone or sentiment of a prompt influences the tone or sentiment of the response.
Academic Research: Use the dataset for various NLP or social science research topics related to human-computer interaction, dialogue systems, or machine learning.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This comprehensive dataset contains 1,000 examples of prompt engineering transformations, showing how to turn basic, ineffective prompts into powerful, high-quality prompts using established techniques. Each example includes:
This dataset was created using a Gemini 2.0 Flash-powered pipeline that generated diverse task descriptions across complexity levels and prompt types, then applied appropriate prompting techniques to create powerful, effective versions of originally weak prompts.
If you use this dataset in your research or applications, please cite: @dataset{oneprompted_prompt_engineering_2024, author = {OneProm.pt}, title = {Prompt Engineering Transformation Dataset}, year = {2024}, publisher = {Kaggle}, url = {https://www.kaggle.com/datasets/oneprompted/prompt-engineering-transformation} }
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Sarfaraz Ahmed
Released under CC0: Public Domain
Facebook
TwitterOpen Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Generated Apple Tree Dataset (Prompt Engineering) Datasets for the paper: Using Diffusion Models for Dataset Generation: Prompt Engineering vs. Fine-tuning
Annotation format For each image, there is a txt file with the same name where each row indicates a distinct bounding box.
A box coordinates are formatted as X1 Y1 X2 Y2 with absolute coordinates.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains information about engineers within a company who have been promoted over a certain time period. The dataset includes features such as engineer ID, years of experience, education level, performance ratings, specialized skills, projects completed, and the promotion outcome (whether promoted or not). It's a valuable resource for analyzing factors contributing to engineer promotions and building predictive models to forecast future promotions within similar contexts.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Minh Nguyen Dich Nhat
Released under MIT
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by shushanth mittpally
Released under MIT
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The use of prompts for diverse tasks becomes more prevalent, concerns arise regarding the security of information shared between models and users, as LLMs face vulnerability in receiving potentially harmful prompts with malicious intent from users.
Vulnerabilities associated with prompt engineering can range from bias and inappropriate responses to cybersecurity issues, raising fundamental questions about the ethics, transparency, and accountability that surround the use of these advanced technologies.
As the number one of the main current vulnerability of LLMs, prompt injection is the insertion of instructions to alter the expected behavior of the output of a Large Language Model and is usually embedded in the prompt. It can range from simple changes in configured behavior to malicious code snippets that compromise the models integrity and information.
We introduce a dataset, named Malignant, specifically curated for jailbreak prompt injection instances. A jailbreak attack is based on adversarial inputs, where their purpose is to break the safe model behavior as the model’s output produces harmful content.
This dataset serves as a valuable resource for future research endeavors aimed at addressing prompt injection vulnerabilities.
The methodology paper and models already trained scripts can be found here: - https://github.com/llm-security-research/malicious-prompts - https://vinbinary.xyz/malignant_and_promptsentinel.pdf
category: Three categories can be found: - jailbreak: We gathered 70 prompts from the jailbreak portal (it is not available since 2024), focusing on the theme of jailbreak attacks and curating with established patterns in such scenarios. Through data augmentation, we produced 129 paraphrased jailbreak prompts. In total, the malignant dataset consists of 199 jailbreak prompts. - act_as: We augmented the robustness of model detection for jailbreak prompt injection by introducing hard prompts. A distinct category for hard prompts is integrated into the malignant dataset, sourced from the AweosomeChatGPT portal. Also referred to as manual prompts, these inputs serve as role prompts to condition the context, influencing the behavior of the language model. With 24 initially collected prompts, we applied the rephrase method for dataset augmentation, yielding a total of 69 hard prompts after a results review. - conversation: In order to evaluate a model to detect jailbreak prompts, conversation prompts for model training were extracted solely from the Persona-Chat dataset, with a total of 1312 prompts included.
base_class: Six categories can be found:
- paraphrase: Data augmentation was performed on jailbreak prompts to achieve better results in model training.
- conversation: Phrase collected from Persona-Chat dataset.
- role_play:
- output_constraint:
- privilege_escalation:
text: The string phrase collected from the datasources listed below.
embedding: Text embeddings generated using the model paraphrase-multilingual-MiniLM-L12-v2 from SentenceTransformers to generate 384 dimensional embeddings.
As the only public dataset available to our knowledge at this time, we hope it can be useful for researchers and people who are concerned about AI ethics and want to make a difference!
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This resource serves as a comprehensive guide to understanding the Foundations of Large Language Models (LLMs) and the principles behind Prompt Engineering. It provides essential information on how LLMs like GPT-3, BERT, and T5 work, along with practical examples of how to optimize prompts for specific tasks, improving model performance and output quality.
Facebook
TwitterOpen Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Generated Apple Tree Dataset (Fine Tuning) Datasets for the paper: Using Diffusion Models for Dataset Generation: Prompt Engineering vs. Fine-tuning
Annotation format For each image, there is a txt file with the same name where each row indicates a distinct bounding box.
These annotations conform to the YOLO Ultralytics annotation format as described: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/#12-create-labels_1 which specifies the "class x_center y_center width height" format with relative coordinates.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
A curated dataset featuring system prompts sourced from major Large Language Model (LLM) providers such as OpenAI, Google, Anthropic, and more. This collection is designed to support research, benchmark, and innovation in prompt engineering by offering a diverse range of real-world prompts used to guide and control state-of-the-art language models.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Github URL: https://github.com/Strider221B/llm-magic-kit
This library initializes different LLMs with the correct configuration and prompt settings. Initially the prompts are targeted towards https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2 but will be expanded to other use cases.
The code is primarily refactored version of: 1. https://www.kaggle.com/code/sathyanarayanrao89/ai-mathematical-olympiad-phi2-basic/notebook 1. https://www.kaggle.com/code/mbmmurad/lb-20-qwq-32b-preview-optimized-inference 1. https://www.kaggle.com/code/haoruili/lb24-deepseek-r1-with-more-prompt-engineering
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by aestera
Released under MIT
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: TechPrompt-QA
Overview: TechPrompt-QA is a curated dataset containing high-quality questions related to technology and computer science. Filtered from the Open-Orca Augmented Flan Dataset and other prompt-based datasets, this collection is ideal for training LLMs (Large Language Models), chatbots, and NLP models in technical domains.
Features:
Technology & Computer Science Focused: Covers a wide range of topics, including programming, AI, cybersecurity, cloud computing, databases, networking, and software engineering.
Diverse Question Types: Includes multiple-choice, open-ended, coding-related, and theoretical questions.
Prompt-Based Structure: Well-formatted and structured for easy integration into prompt engineering workflows.
AI Model Training Ready: Useful for training models in Q&A generation, retrieval-augmented generation (RAG), and knowledge-based AI applications.
Use Cases:
Fine-tuning LLMs for tech-focused Q&A systems.
Improving chatbots and virtual assistants in technology-related domains.
Enhancing question-answering datasets for education and research.
Evaluating AI models on technical reasoning and problem-solving skills.
Dataset Format:
Columns: Question, Answer (if available), Topic, Difficulty Level (if applicable).
Available Formats: JSON, CSV, Parquet.
This dataset is designed to support developers, researchers, and AI enthusiasts in building smarter and more accurate technical AI models. 🚀
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This data has been created using prompt engineering over chatGPT which has following labels - 0 - negative 1 - neutral 2 - positive
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Author: [Your Name]
License: CC BY 4.0
Tags: openai, gpt-oss-20b, red-teaming, ai-safety, prompt-engineering
This dataset contains a curated set of adversarial prompts and their associated metadata from red-teaming runs against OpenAI’s GPT-OSS-20B model.
All unsafe outputs are redacted or hashed to ensure compliance with Kaggle policy.
| Column | Type | Description |
|---|---|---|
prompt_redacted | string | The adversarial prompt text with unsafe content replaced by [REDACTED]. |
category | string | Safety category (e.g., misinformation, self-harm, disallowed content). |
pattern | string | Prompt pattern/technique used (e.g., CoU, instruction-hierarchy, obfuscation). |
steps | string | Minimal reproducible steps for this prompt. |
reproduction_notes | string | Additional notes on reproducing the failure. |
outcome_label | string | Outcome classification (e.g., refusal, partial compliance, unsafe). |
If you use this dataset, please cite this dataset page and the competition link.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Prompts play a crucial role in guiding language models like ChatGPT to generate relevant and coherent responses. They serve as instructions or cues that provide context and steer the model's understanding and output. Effective prompts can shape the conversation, elicit specific information, or encourage creative responses. Prompt engineering, on the other hand, refers to the process of designing and refining prompts to achieve desired outcomes. Both prompts and prompt engineering are important for several reasons
prompts and prompt engineering are essential for guiding language models, enabling control over outputs, generating desired content, fostering creativity, and enhancing the overall user experience. They form a critical component in the interaction between users and AI systems, ensuring meaningful and contextually appropriate conversations. This is one of the inspiration behind this dataset.
In this dataset we generated this prompts samples by various chatbots and few from Bard and from ChatGpt. the main intention and idea behind that is 1) Prompt Engineering 2) Rich data . This type of few samples of prompt which for helpful for training various generative ai applications.but in this dataset the prompts samples are low amount .but you generate synthetic data from that .
Facebook
TwitterThis dataset is a ground of whitepapers shared by Google in its AI workshop. It is a knowledgebase on various GenAI topics including prompt engineering, vector databases, embeddings, RAG, Agents, Agent companions, fine tuning and use of MLops in GenAI planning.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Overview An aggregated, cleaned, and unified dataset assembled from the AI Verify Foundation’s Project Moonshot resources on Kaggle. It combines: (a) prompt templates and prompt-engineering cookbooks, (b) pre-built recipes used to configure benchmark runs (input/target pairs, evaluation metric, grading scales), and (c) metric definitions/outputs for automated evaluation. The material is intended to support reproducible LLM benchmarking, bias/fairness analysis, and prompt-engineering experiments.
Project Moonshot Project Moonshot is an open-source LLM evaluation toolkit produced by the AI Verify Foundation; it brings benchmarking and red-teaming workflows together and publishes prompt templates, recipes and metrics on GitHub and the Moonshot docs site. Link - https://aiverifyfoundation.sg/project-moonshot/
Recipe Recipes (in Moonshot) are pre-built benchmark configurations: JSON files that define the dataset (input / target pairs), the prompt template to use, the evaluation metric, and any grading thresholds — enabling reproducible, repeatable test runs. The Moonshot project publishes many such recipes for different evaluation categories (e.g., prompt injection, cybersecurity).
Cookbook Cookbook (in ML/prompting context) is a curated collection of patterns, examples and “how-to” snippets for solving common tasks with LLMs (templates, best practices, and worked examples). Think of a cookbook as a higher-level collection that organizes recipes and templates for reuse
Intended uses - Reproducible LLM benchmarking and regression testing. - Bias and fairness audits (compare performance across social attribute groups). - Prompt engineering research (compare prompt templates / recipe variants). - Building evaluation pipelines that combine semantic and factual checks.
Credits: This dataset aggregates content published by the AI Verify Foundation / Project Moonshot. Please follow the original project’s license and attribution requirements when redistributing. See the Moonshot repository for license details. URL: https://aiverifyfoundation.sg/project-moonshot/ GitHub: https://github.com/aiverify-foundation/moonshot
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
From March 31 through April 4, 2025, Google experts went through foundational gen AI topics like prompt engineering, evaluations, and embeddings. Coursework included these whitepapers by Google experts, AI-generated podcast outputs (NotebookLM), and practical code labs for hands-on experience with Gemini and other services. These are posted for people to use in their capstone projects at the end of the course.
If you need to see how the text CSV was generated, the code is here https://www.kaggle.com/code/toddgardiner/google-5-day-genai-intensive-whitepapers-to-text/.
Facebook
TwitterThis dataset is designed to explore the fascinating area of prompt engineering, specifically how different types of prompts can influence the generated text responses. Whether you're interested in natural language processing, conversational agents, or textual analysis, this dataset offers a rich resource for your investigations.
Features:
Size and Format:
Potential Applications:
Prompt Effectiveness: Study how different types of prompts yield different kinds of responses.
Conversational Agents: Train and evaluate dialogue systems to better understand user intents.
Text Generation Models: Analyze how various prompts affect the performance of text generation models like GPT-4.
Sentiment Analysis: Explore how the tone or sentiment of a prompt influences the tone or sentiment of the response.
Academic Research: Use the dataset for various NLP or social science research topics related to human-computer interaction, dialogue systems, or machine learning.