Dataset Card for "malicious-llm-prompts"
More Information needed
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Dataset Card for llm-system-prompts-benchmark
This dataset is a collection of 100 system prompts for large language models.
Dataset Details
Dataset Description
These 100 system prompts test a model's ability to follow grammatical patterns; answer basic multiple choice questions; act according to a particular persona; memorize information; and speak in French. Files:
hundred_system_prompts.py: refer to this to see the (prompt, probe, function) triplets, as well as the… See the full description on the dataset page: https://huggingface.co/datasets/Naomibas/llm-system-prompts-benchmark.
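For illustration, here is a hedged sketch of what one (prompt, probe, function) triplet might look like; the prompt text, probe, and checker below are hypothetical examples, not taken from hundred_system_prompts.py:

triplet = (
    "You are a helpful assistant. Respond only in French.",  # hypothetical system prompt
    "What color is the sky?",                                # probe sent to the model
    lambda response: "bleu" in response.lower(),             # function that scores the reply
)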
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
📝🌐 Synthetic Multilingual LLM Prompts
Welcome to the "Synthetic Multilingual LLM Prompts" dataset! This comprehensive collection features 1,250 synthetic LLM prompts generated using Gretel Navigator, available in seven different languages. To ensure accuracy and diversity in prompts, and translation quality and consistency across the different languages, we employed Gretel Navigator both as a generation tool and as an… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_multilingual_llm_prompts.
License: MIT License (https://opensource.org/licenses/MIT)
beephids/paper-llm-prompts dataset hosted on Hugging Face and contributed by the HF Datasets community
License: MIT License (https://opensource.org/licenses/MIT)
System Prompts Dataset - August 2025
Point-in-time export from Daniel Rosehill's system prompt library as of August 3rd, 2025
Overview
This repository contains a comprehensive collection of 944 system prompts designed for various AI applications, agent workflows, and conversational AI systems. While many of these prompts now serve as the foundation for more complex agent-based workflows, they continue to provide essential building blocks for AI system design and… See the full description on the dataset page: https://huggingface.co/datasets/danielrosehill/System-Prompt-Library-030825.
Description of LLM Math
A prompt designed to optionally output IPython code that can be run in order to better answer math questions.
Inputs
The prompt expects the following input:
question: User question to be answered.
Usage
Below is a code snippet showing how to use the prompt (the load path is truncated in the source):

from langchain.prompts import load_prompt
from langchain.chains import LLMMathChain

llm = ...  # your LLM instance
prompt = load_prompt('lc://prompts/llm_math/…')
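For a runnable version, one can assume the legacy LangChainHub convention in which prompt paths end in prompt.json; the model choice and question below are likewise illustrative, not part of the original snippet:

from langchain.llms import OpenAI
from langchain.prompts import load_prompt
from langchain.chains import LLMMathChain

# 'prompt.json' is an assumed completion of the truncated lc:// path
prompt = load_prompt('lc://prompts/llm_math/prompt.json')
llm = OpenAI(temperature=0)  # illustrative model choice
chain = LLMMathChain(llm=llm, prompt=prompt)
print(chain.run("What is 13 raised to the 0.3432 power?"))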
Description of LLM Bash
A prompt designed to convert natural language to a bash command.
Inputs
The prompt expects the following input:
question: User question to be answered by writing a bash command.
Usage
Below is a code snippet showing how to use the prompt (the load path is truncated in the source):

from langchain.prompts import load_prompt
from langchain.chains import LLMBashChain

llm = ...  # your LLM instance
prompt = load_prompt('lc://prompts/llm_bash/…')
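As with the LLM Math snippet above, a runnable sketch assuming the prompt.json path convention and an illustrative model and question:

from langchain.llms import OpenAI
from langchain.prompts import load_prompt
from langchain.chains import LLMBashChain

# 'prompt.json' is an assumed completion of the truncated lc:// path
prompt = load_prompt('lc://prompts/llm_bash/prompt.json')
llm = OpenAI(temperature=0)  # illustrative model choice
chain = LLMBashChain(llm=llm, prompt=prompt)
print(chain.run("List all the files in the current directory."))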
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Prompt Injection Mechanisms Dataset
Overview
A 55,000-sample multi-label dataset for prompt injection detection in large language models.
Labels
DATA_EXFILTRATION
Format
The dataset is provided in Apache Parquet format with train/validation splits.
Construction
The dataset was created by merging multiple public prompt-injection datasets and re-annotating them using a… See the full description on the dataset page: https://huggingface.co/datasets/Smooth-3/llm-prompt-injection-attacks.
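A minimal loading sketch using the Hugging Face datasets library, assuming only the repository ID from the card URL and the train/validation splits named above:

from datasets import load_dataset

# Reads the Parquet files behind the train/validation splits
ds = load_dataset("Smooth-3/llm-prompt-injection-attacks")
print(ds["train"].column_names)  # inspect the multi-label schema
print(ds["validation"][0])       # one labeled example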
License: MIT License (https://opensource.org/licenses/MIT)
LLM Jailbreak & Prompt-Injection Dataset
A unified safety dataset combining 30+ public sources for training LLM guardrails, content moderation classifiers, and response-safety filters.
Schema (orthogonal multi-label, WildGuard-style)
Instead of a single binary is_dangerous, every example carries four orthogonal labels matching the structure used by AI2 WildGuard, IBM Granite Guardian, and Azure Prompt Shields:
Column | Type | Description
prompt | str | The user/attack… See the full description on the dataset page: https://huggingface.co/datasets/Necent/llm-jailbreak-prompt-injection-dataset.
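A sketch of how the orthogonal labels could be consumed; only the repository ID from the card URL and the prompt column come from the card, and a train split is assumed. The label columns are discovered at runtime because the excerpt does not list their names:

from datasets import load_dataset

ds = load_dataset("Necent/llm-jailbreak-prompt-injection-dataset", split="train")  # split name assumed
label_cols = [c for c in ds.column_names if c != "prompt"]  # the four orthogonal labels
print(label_cols)
print(ds[0]["prompt"])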
Dark Triad LLM Prompts Dataset
Version: 1.0.0
License: CC BY 4.0
Dataset Description
This dataset contains 192 user prompts designed to systematically evaluate how Large Language Models respond to descriptions of problematic behaviors reflecting Dark Triad personality traits. Unlike traditional safety benchmarks that focus on harmful requests, this dataset evaluates interactional safety—how models respond when users describe rather than request negative behaviors.… See the full description on the dataset page: https://huggingface.co/datasets/lucerne04/dark-triad-llm-prompts.
License: MIT License (https://opensource.org/licenses/MIT)
Official LLM System Prompts
This short dataset contains a few system prompts leaked from proprietary models, with date-stamped prompts from OpenAI, Anthropic, Microsoft Copilot, GitHub Copilot, Grok, and Perplexity.
License: MIT License (https://opensource.org/licenses/MIT)
Dataset Summary
This dataset contains a large number of attack prompts collected as part of the now-closed LLMail-Inject: Adaptive Prompt Injection Challenge. We first describe the details of the challenge, and then we provide documentation of the dataset. For the accompanying code, check out: https://github.com/microsoft/llmail-inject-challenge.
Citation
@article{abdelnabi2025,
  title = {LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection… See the full description on the dataset page: https://huggingface.co/datasets/microsoft/llmail-inject-challenge.
License: MIT License (https://opensource.org/licenses/MIT)
Independent Jailbreak Datasets for LLM Guardrail Evaluation
Constructed for the thesis: "Contamination Effects: How Training Data Leakage Affects Red Team Evaluation of LLM Jailbreak Detection". The effectiveness of LLM guardrails is commonly evaluated using open-source red-teaming tools. However, this study reveals that significant data contamination exists between the training sets of binary jailbreak classifiers (ProtectAI, Katanemo, TestSavantAI, etc.) and the test prompts used in… See the full description on the dataset page: https://huggingface.co/datasets/Simsonsun/JailbreakPrompts.
LLM-LAT/harmful-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Dataset Card for Real Toxicity Prompts
Dataset Summary
RealToxicityPrompts is a dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
Languages
English
Dataset Structure
Data Instances
Each instance represents a prompt and its metadata:
{
  "filename": "0766186-bc7f2a64cb271f5f56cf6f25570cd9ed.txt",
  "begin": 340,
  "end": 564,
  "challenging": false,
  …
See the full description on the dataset page: https://huggingface.co/datasets/allenai/real-toxicity-prompts.
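A loading sketch, assuming the dataset exposes a single train split; the challenging flag shown in the instance above can then be used to select the harder prompts:

from datasets import load_dataset

ds = load_dataset("allenai/real-toxicity-prompts", split="train")
challenging = ds.filter(lambda ex: ex["challenging"])  # keep only flagged prompts
print(len(challenging))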
License: MIT License (https://opensource.org/licenses/MIT)
hse-llm/prompt-injections dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for OpenEndedLLMPrompts
A cleaned and consolidated set of questions (without context) and answers for LLM hallucination detection. Each question-answer pair is not the work of the author, but was selected from OpenAssistant/oasst2. If you use any of the data provided, please cite this source in addition to the following paper: Shreyan Mitra and Leilani Gilpin, "Detecting LLM Hallucinations Pre-generation" (paper pending). The original dataset was provided in a tree… See the full description on the dataset page: https://huggingface.co/datasets/shreyanmitra/OpenEndedLLMPrompts.
grimjim/llm-aes-writing-prompts-deduplicated-0.9-similarity dataset hosted on Hugging Face and contributed by the HF Datasets community
Sambhavnoobcoder/gemma-llm-prompt-recovery dataset hosted on Hugging Face and contributed by the HF Datasets community
License: MIT License (https://opensource.org/licenses/MIT)
EU Cyber Threat Landscape LLM Benchmark — Prompts
A research-grade evaluation benchmark for measuring geopolitical bias in LLM-generated cyber threat landscape assessments.
What this is
A set of structured prompts designed to test whether language models exhibit actor-asymmetric framing when generating strategic cyber threat assessments in EU contexts. Each prompt describes a cyber incident in a specific critical infrastructure sector, paired with an attribution condition… See the full description on the dataset page: https://huggingface.co/datasets/eromang/eu-cyber-llm-benchmark-prompts.