9 datasets found
  1. StrongREJECT

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Walled AI (2024). StrongREJECT [Dataset]. https://huggingface.co/datasets/walledai/StrongREJECT
    Dataset updated
    Jul 28, 2024
    Dataset authored and provided by
    Walled AI
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    StrongREJECT

    A novel benchmark of 313 malicious prompts for use in evaluating jailbreaking attacks against LLMs, aimed to expose whether a jailbreak attack actually enables malicious actors to utilize LLMs for harmful tasks. Dataset link: https://github.com/alexandrasouly/strongreject/blob/main/strongreject_dataset/strongreject_dataset.csv

      Citation
    

    If you find the dataset useful, please cite the following work: @misc{souly2024strongreject, title={A StrongREJECT… See the full description on the dataset page: https://huggingface.co/datasets/walledai/StrongREJECT.

  2. StrongREJECT

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Lvv (2024). StrongREJECT [Dataset]. https://huggingface.co/datasets/Lv111/StrongREJECT
    Dataset updated
    Jul 28, 2024
    Authors
    Lvv
    Description

    Lv111/StrongREJECT dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. strongreject-full

    • kaggle.com
    Updated Mar 21, 2025
    Cite
    James Mann (2025). strongreject-full [Dataset]. https://www.kaggle.com/datasets/jame5mann/strongreject-full/versions/1
    Dataset updated
    Mar 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    James Mann
    Description

    Dataset

    This dataset was created by James Mann

    Contents

  4. strongreject-dataset

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Naseem Machlovi (2024). strongreject-dataset [Dataset]. https://huggingface.co/datasets/Machlovi/strongreject-dataset
    Dataset updated
    Jul 28, 2024
    Authors
    Naseem Machlovi
    Description

    Machlovi/strongreject-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. strongreject-llama3.1-8b-inst-completions

    • huggingface.co
    Cite
    Noah Shen, strongreject-llama3.1-8b-inst-completions [Dataset]. https://huggingface.co/datasets/NoahShen/strongreject-llama3.1-8b-inst-completions
    Authors
    Noah Shen
    Description

    NoahShen/strongreject-llama3.1-8b-inst-completions dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. strongrejectPlusPlus

    • huggingface.co
    Updated Jan 11, 2025
    Cite
    Raft Security Lab (2025). strongrejectPlusPlus [Dataset]. https://huggingface.co/datasets/raft-security-lab/strongrejectPlusPlus
    Dataset updated
    Jan 11, 2025
    Dataset authored and provided by
    Raft Security Lab
    Description

    🌍 strongREJECT++ Dataset

    Welcome to the strongREJECT++ dataset! This dataset is a collection of translations of the original strongREJECT dataset, a "cutting-edge benchmark for evaluating jailbreaks in Large Language Models (LLMs)".

      Available Languages
    

    You can find translations provided by native speakers in the following languages:

    🇺🇸 English 🇷🇺 Russian 🇺🇦 Ukrainian 🇧🇾 Belarusian 🇺🇿 Uzbek

  7. AutoAdv Results

    • figshare.com
    zip
    Updated May 30, 2025
    Cite
    Aashray Reddy; Andrew Zagula; Nicholas Saban (2025). AutoAdv Results [Dataset]. http://doi.org/10.6084/m9.figshare.29194673.v1
    Available download formats: zip
    Dataset updated
    May 30, 2025
    Dataset provided by
    figshare
    Authors
    Aashray Reddy; Andrew Zagula; Nicholas Saban
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Large Language Models (LLMs) are susceptible to jailbreaking attacks, where carefully crafted malicious inputs bypass safety guardrails and provoke harmful responses. We introduce AutoAdv, a novel automated framework that generates adversarial prompts and assesses vulnerabilities in LLM safety mechanisms. Our approach employs an attacker LLM to create disguised malicious prompts using strategic rewriting techniques, tailored system prompts, and optimized hyperparameter settings. The core innovation is a dynamic, multiturn attack strategy that analyzes unsuccessful jailbreak attempts to iteratively develop more effective follow-up prompts. We evaluate the attack success rate (ASR) using the StrongREJECT framework across multiple interaction turns. Extensive empirical testing on state-of-the-art models, including ChatGPT, Llama, DeepSeek, Qwen, Gemma, and Mistral, reveals significant weaknesses, with AutoAdv achieving an ASR of 86% on Llama-3.1-8B. These findings indicate that current safety mechanisms remain susceptible to sophisticated multiturn attacks.

  8. Jailbreak-StrongReject-el

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    Institute for Language and Speech Processing (2025). Jailbreak-StrongReject-el [Dataset]. https://huggingface.co/datasets/ilsp/Jailbreak-StrongReject-el
    Dataset updated
    Jul 29, 2025
    Authors
    Institute for Language and Speech Processing
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for Greek Jailbreak-StrongReject

    This dataset is a translated and extended version of the StrongReject benchmark. It adapts 309 harmful prompts into Greek, preserving the original behavioral categories, and adds a new column containing one of five jailbreak prompt templates for each query. The goal is to evaluate the robustness of multilingual LLMs, especially in low-resource languages like Greek, against adversarial jailbreak attacks using automated evaluation methods.… See the full description on the dataset page: https://huggingface.co/datasets/ilsp/Jailbreak-StrongReject-el.

  9. Jailbreak-PromptBank

    • huggingface.co
    Updated May 1, 2025
    Cite
    Abdallah Alshafii (2025). Jailbreak-PromptBank [Dataset]. https://huggingface.co/datasets/Lshafii/Jailbreak-PromptBank
    Dataset updated
    May 1, 2025
    Authors
    Abdallah Alshafii
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Jailbreak-PromptBank

    Jailbreak-PromptBank is a curated dataset of jailbreak prompts collected from a wide range of existing datasets. It was compiled for research purposes related to prompt injection and the security of Large Language Models (LLMs).

      ✨ Sources Used
    

    The dataset aggregates prompts from the following public datasets:

    Jailbreak Classification InTheWild AdvBench AdvSuffix HarmBench UltraSafety StrongReject JBB_Behaviors ChatGPT-Jailbreak-Prompts JailBreakV_28k… See the full description on the dataset page: https://huggingface.co/datasets/Lshafii/Jailbreak-PromptBank.


183 scholarly articles cite the walledai/StrongREJECT dataset (per Google Scholar).