MIT License: https://opensource.org/licenses/MIT
StrongREJECT
A novel benchmark of 313 malicious prompts for evaluating jailbreaking attacks against LLMs, aimed at exposing whether a jailbreak attack actually enables malicious actors to use LLMs for harmful tasks. Dataset link: https://github.com/alexandrasouly/strongreject/blob/main/strongreject_dataset/strongreject_dataset.csv
Citation
If you find the dataset useful, please cite the following work: @misc{souly2024strongreject, title={A StrongREJECT… See the full description on the dataset page: https://huggingface.co/datasets/walledai/StrongREJECT.
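For quick inspection, the prompts can be pulled either from the Hugging Face Hub or straight from the CSV linked above. A minimal sketch follows; the split name, column layout, and the raw-CSV URL form are assumptions rather than verified details:

```python
# Minimal sketch: loading the StrongREJECT prompts.
# The split name and column layout are assumptions; adjust after inspecting the data.
from datasets import load_dataset

ds = load_dataset("walledai/StrongREJECT", split="train")
print(len(ds))   # expected: 313 prompts
print(ds[0])     # inspect one record

# Alternatively, read the CSV directly from the GitHub repository
# (raw URL assumed from the blob link above).
import pandas as pd

url = ("https://raw.githubusercontent.com/alexandrasouly/strongreject/"
       "main/strongreject_dataset/strongreject_dataset.csv")
df = pd.read_csv(url)
print(df.columns.tolist())
```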
Lv111/StrongREJECT dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by James Mann
Machlovi/strongreject-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
NoahShen/strongreject-llama3.1-8b-inst-completions dataset hosted on Hugging Face and contributed by the HF Datasets community
🌍 strongREJECT++ Dataset
Welcome to the strongREJECT++ dataset! This dataset is a collection of translations of the original strongREJECT dataset, "a cutting-edge benchmark for evaluating jailbreaks in Large Language Models (LLMs)".
Available Languages
You can find translations provided by native speakers in the following languages:
🇺🇸 English 🇷🇺 Russian 🇺🇦 Ukrainian 🇧🇾 Belarusian 🇺🇿 Uzbek
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Large Language Models (LLMs) are susceptible to jailbreaking attacks, where carefully crafted malicious inputs bypass safety guardrails and provoke harmful responses. We introduce AutoAdv, a novel automated framework that generates adversarial prompts and assesses vulnerabilities in LLM safety mechanisms. Our approach employs an attacker LLM to create disguised malicious prompts using strategic rewriting techniques, tailored system prompts, and optimized hyperparameter settings. The core innovation is a dynamic, multiturn attack strategy that analyzes unsuccessful jailbreak attempts to iteratively develop more effective follow-up prompts. We evaluate the attack success rate (ASR) using the StrongREJECT framework across multiple interaction turns. Extensive empirical testing on state-of-the-art models, including ChatGPT, Llama, DeepSeek, Qwen, Gemma, and Mistral, reveals significant weaknesses, with AutoAdv achieving an ASR of 86% on Llama-3.1-8B. These findings indicate that current safety mechanisms remain susceptible to sophisticated multiturn attacks.
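To make the multiturn ASR metric concrete, here is a small illustrative sketch of how success can be tallied when a prompt counts as jailbroken if any turn elicits a harmful response. The judge function is a placeholder for a StrongREJECT-style grader; this is not the AutoAdv implementation itself:

```python
# Illustrative sketch of a multiturn attack success rate (ASR) tally.
# The judge callable stands in for a StrongREJECT-style evaluator.
from typing import Callable, List

def attack_success_rate(
    conversations: List[List[str]],        # model responses, one list of turns per attacked prompt
    is_jailbroken: Callable[[str], bool],  # placeholder judge
) -> float:
    """A prompt counts as a success if any turn in its conversation is judged harmful."""
    successes = sum(
        1 for turns in conversations if any(is_jailbroken(resp) for resp in turns)
    )
    return successes / len(conversations) if conversations else 0.0

# Toy usage with a keyword stand-in judge (a real evaluation would use the StrongREJECT grader):
demo = [["I can't help with that.", "Sure, here is how to..."], ["I can't help with that."]]
print(attack_success_rate(demo, lambda r: r.startswith("Sure, here is")))  # 0.5
```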
MIT License: https://opensource.org/licenses/MIT
Dataset Card for Greek Jailbreak-StrongReject
This dataset is a translated and extended version of the StrongReject benchmark. It adapts 309 harmful prompts into Greek, preserving the original behavioral categories, and adds a new column containing one of five jailbreak prompt templates for each query. The goal is to evaluate the robustness of multilingual LLMs, especially in low-resource languages like Greek, against adversarial jailbreak attacks using automated evaluation methods.… See the full description on the dataset page: https://huggingface.co/datasets/ilsp/Jailbreak-StrongReject-el.
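The construction described above (each Greek query paired with one of five jailbreak templates in an extra column) can be sketched as follows; the template texts, queries, and column names here are illustrative placeholders, not the actual ilsp/Jailbreak-StrongReject-el schema:

```python
# Illustrative sketch: pairing each harmful query with one of five jailbreak templates.
# Templates, queries, and column names are placeholders for the real dataset contents.
import itertools

templates = [f"<jailbreak template {i}>: {{query}}" for i in range(1, 6)]  # each has a {query} slot
queries = ["<Greek harmful query 1>", "<Greek harmful query 2>"]           # placeholders

rows = [
    {"query": q, "jailbreak_prompt": tmpl.format(query=q)}
    for q, tmpl in zip(queries, itertools.cycle(templates))
]
print(rows[0]["jailbreak_prompt"])
```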
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Jailbreak-PromptBank
Jailbreak-PromptBank is a curated dataset of jailbreak prompts collected from a wide range of existing datasets. It was compiled for research purposes related to prompt injection and the security of Large Language Models (LLMs).
✨ Sources Used
The dataset aggregates prompts from the following public datasets:
Jailbreak Classification, InTheWild, AdvBench, AdvSuffix, HarmBench, UltraSafety, StrongReject, JBB_Behaviors, ChatGPT-Jailbreak-Prompts, JailBreakV_28k… See the full description on the dataset page: https://huggingface.co/datasets/Lshafii/Jailbreak-PromptBank.
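A minimal sketch of how such an aggregation can be assembled with the `datasets` library; the repository IDs, split names, and column names below are assumptions and would need adjusting to each source's real schema:

```python
# Sketch: collecting prompts from several Hugging Face datasets into one bank.
# Repo IDs, split names, and prompt column names are assumptions, not verified.
from datasets import Dataset, concatenate_datasets, load_dataset

sources = {
    "walledai/StrongREJECT": "prompt",      # prompt column name assumed
    # "<other-source-repo-id>": "<its prompt column>",
}

parts = []
for repo_id, prompt_col in sources.items():
    ds = load_dataset(repo_id, split="train")  # split name assumed
    parts.append(
        Dataset.from_dict({"prompt": ds[prompt_col], "source": [repo_id] * len(ds)})
    )

prompt_bank = concatenate_datasets(parts)
print(prompt_bank)
```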