Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was obtained automatically.
Dataset Card for Dataset Name
This dataset is a collection of 100 system prompts for large language models.
Dataset Details
Dataset Description
These 100 system prompts test a model's ability to follow grammatical patterns; answer basic multiple choice questions; act according to a particular persona; memorize information; and speak in French. Files:
hundred_system_prompts.py: refer to this to see the (prompt, probe, function) triplets, as well as the… See the full description on the dataset page: https://huggingface.co/datasets/Naomibas/llm-system-prompts-benchmark.
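A hypothetical triplet in the (prompt, probe, function) pattern described above might look like the following sketch. The strings and the check are illustrative only, not taken from hundred_system_prompts.py:

```python
# Illustrative (prompt, probe, function) triplet: the system prompt constrains
# the model, the probe is a user message, and the function checks whether the
# model's reply honored the system prompt.
prompt = "You are a helpful assistant that always answers in French."
probe = "What color is the sky?"
function = lambda response: "bleu" in response.lower()

triplet = (prompt, probe, function)
```

A benchmark harness would send `prompt` and `probe` to the model and score the reply with `function`.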
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
📝🌐 Synthetic Multilingual LLM Prompts
Welcome to the "Synthetic Multilingual LLM Prompts" dataset! This comprehensive collection features 1,250 synthetic LLM prompts generated using Gretel Navigator, available in seven different languages. To ensure accuracy and diversity in prompts, and translation quality and consistency across the different languages, we employed Gretel Navigator both as a generation tool and as an… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_multilingual_llm_prompts.
Dataset Card for "malicious-llm-prompts-v4"
More Information needed
MIT License (https://opensource.org/licenses/MIT)
beephids/paper-llm-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
🧠 Awesome ChatGPT Prompts [CSV dataset]
This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.
License
CC-0
kaiquedu/speedrender-llm-prompt dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
Official LLM System Prompts
This short dataset contains a few system prompts leaked from proprietary models. It includes date-stamped prompts from OpenAI, Anthropic, MS Copilot, GitHub Copilot, Grok, and Perplexity.
MIT License (https://opensource.org/licenses/MIT)
In-The-Wild Jailbreak Prompts on LLMs
This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with 15,140 prompts collected from December 2022 to December 2023 (including 1,405… See the full description on the dataset page: https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts.
MIT License (https://opensource.org/licenses/MIT)
System Prompts Dataset - August 2025
A point-in-time export from Daniel Rosehill's system prompt library, as of August 3rd, 2025.
Overview
This repository contains a comprehensive collection of 944 system prompts designed for various AI applications, agent workflows, and conversational AI systems. While many of these prompts now serve as the foundation for more complex agent-based workflows, they continue to provide essential building blocks for AI system design and… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/System-Prompt-Library-030825.
MIT License (https://opensource.org/licenses/MIT)
hse-llm/prompt-injections dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
Independent Jailbreak Datasets for LLM Guardrail Evaluation
Constructed for the thesis “Contamination Effects: How Training Data Leakage Affects Red Team Evaluation of LLM Jailbreak Detection”. The effectiveness of LLM guardrails is commonly evaluated using open-source red-teaming tools. However, this study reveals that significant data contamination exists between the training sets of binary jailbreak classifiers (ProtectAI, Katanemo, TestSavantAI, etc.) and the test prompts used in… See the full description on the dataset page: https://huggingface.co/datasets/Simsonsun/JailbreakPrompts.
mii-llm/self-align-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Prompt Difficulty Meta-Analysis
Introduction
Large language model (LLM) prompts vary widely in complexity, required knowledge, and reasoning demands. Some prompts are straightforward, while others require advanced understanding and multi-step reasoning. This study analyzes the difficulty of English ChatGPT prompts using classifiers trained on multiple difficulty-labelled datasets.
The goal is to produce a consistent, data-driven difficulty score that can be used to… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/prompt-difficulty.
Vigil: LLM Instruction Bypass all-MiniLM-L6-v2
Repo: github.com/deadbits/vigil-llm
Vigil is a Python framework and REST API for assessing Large Language Model (LLM) prompts against a set of scanners to detect prompt injections, jailbreaks, and other potentially risky inputs. This repository contains all-MiniLM-L6-v2 embeddings for all Instruction Bypass style prompts ("Ignore instructions ...") used by Vigil. You can use the parquet2vdb.py utility to load the embeddings in the… See the full description on the dataset page: https://huggingface.co/datasets/deadbits/vigil-instruction-bypass-all-MiniLM-L6-v2.
Dataset Card for OpenEndedLLMPrompts
A cleaned and consolidated set of questions (without context) and answers for LLM hallucination detection. Each question-answer pair is not the work of the author but was selected from OpenAssistant/oasst2. If you use any of the data provided, please cite this source in addition to the following paper: Shreyan Mitra and Leilani Gilpin, "Detecting LLM Hallucinations Pre-generation" (paper pending). The original dataset was provided in a tree… See the full description on the dataset page: https://huggingface.co/datasets/shreyanmitra/OpenEndedLLMPrompts.
I created a reasoning prompt dataset from the deepseek-r1 model, with the purpose of fine-tuning small language models so they can generate better reasoning prompts for use with bigger LLMs.
Metadata
The metadata is made available through a series of parquet files with the following schema:
id: a unique identifier for the QA pair.
question: the question text.
answer: the answer from the deepseek-r1 think model.
reasoning: the reasoning trace from the deepseek-r1 model.
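As a rough sketch, one row of those parquet files could be modeled as follows. The field names follow the schema above; the sample values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class QARecord:
    id: str         # unique identifier for the QA pair
    question: str   # the original question posed to the model
    answer: str     # answer from the deepseek-r1 think model
    reasoning: str  # reasoning trace from the deepseek-r1 model

# Hypothetical row as it might appear in one parquet shard.
row = {
    "id": "qa-0001",
    "question": "Why is the sky blue?",
    "answer": "Because of Rayleigh scattering.",
    "reasoning": "Shorter wavelengths scatter more strongly in the atmosphere...",
}
record = QARecord(**row)
```

In practice one would load the shards with a parquet reader (e.g. pandas or pyarrow) rather than constructing rows by hand.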
HALOGEN🔦: Fantastic LLM Hallucinations and Where to Find Them
This repository contains the prompts of HALOGEN🔦: Fantastic LLM Hallucinations and Where to Find Them by *Abhilasha Ravichander, *Shrusti Ghela, David Wadden, and Yejin Choi. Website | Paper | HALoGEN prompts | LLM Hallucinations | Decomposers and Verifiers | Scoring Functions
Overview
Despite their impressive ability to generate high-quality and fluent text, generative large language models (LLMs) also… See the full description on the dataset page: https://huggingface.co/datasets/lasha-nlp/HALoGEN-prompts.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Dataset Card for Real Toxicity Prompts
Dataset Summary
RealToxicityPrompts is a dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
Languages
English
Dataset Structure
Data Instances
Each instance represents a prompt and its metadata: {"filename": "0766186-bc7f2a64cb271f5f56cf6f25570cd9ed.txt", "begin": 340, "end": 564, "challenging": false, …} See the full description on the dataset page: https://huggingface.co/datasets/allenai/real-toxicity-prompts.
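Assuming the begin/end fields are character offsets into the source file, as the field names suggest, the metadata of the instance shown above can be read like this (values copied from the example, trailing fields omitted):

```python
import json

# Metadata for one RealToxicityPrompts instance, as shown above
# (only the fields visible in the example; the full record has more).
instance = json.loads(
    '{"filename": "0766186-bc7f2a64cb271f5f56cf6f25570cd9ed.txt",'
    ' "begin": 340, "end": 564, "challenging": false}'
)

# Length of the snippet's span in the source file: 564 - 340 = 224 characters.
span_length = instance["end"] - instance["begin"]
```

The "challenging" flag marks prompts the dataset authors found especially likely to elicit toxic continuations.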
Other license (https://choosealicense.com/licenses/other/)
tuhinatripathi/llm-prompt-recovery dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
Dataset Summary
This dataset contains a large number of attack prompts collected as part of the now-closed LLMail-Inject: Adaptive Prompt Injection Challenge. We first describe the details of the challenge, and then we provide documentation of the dataset. For the accompanying code, check out: https://github.com/microsoft/llmail-inject-challenge.
Citation
@article{abdelnabi2025, title = {LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/llmail-inject-challenge.