4 datasets found
  1. in-the-wild-jailbreak-prompts

    • huggingface.co
    Cite
    TrustAIRLab, in-the-wild-jailbreak-prompts [Dataset]. https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts
    Dataset authored and provided by
    TrustAIRLab
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    In-The-Wild Jailbreak Prompts on LLMs

    This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with 15,140 prompts collected from December 2022 to December 2023 (including 1,405… See the full description on the dataset page: https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts.

  2. jailbreak-classification

    • huggingface.co
    Updated Dec 7, 2023
    Cite
    Jack Hao (2023). jailbreak-classification [Dataset]. https://huggingface.co/datasets/jackhhao/jailbreak-classification
    Authors
    Jack Hao
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Jailbreak Classification

      Dataset Summary

    Dataset used to classify prompts as jailbreak vs. benign.

      Dataset Structure

      Data Fields

    prompt: an LLM prompt
    type: classification label, either jailbreak or benign
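
    The two fields listed above could be checked and tallied along the lines of the sketch below. Only the field names (prompt, type) and the two label values come from the dataset card; the helper name count_labels and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Label values named in the dataset card.
VALID_LABELS = {"jailbreak", "benign"}


def count_labels(rows):
    """Validate rows shaped like the card's schema and tally their labels.

    Each row is assumed to be a dict with a string `prompt` and a `type`
    that is either "jailbreak" or "benign".
    """
    counts = Counter()
    for i, row in enumerate(rows):
        if not isinstance(row.get("prompt"), str):
            raise ValueError(f"row {i}: 'prompt' must be a string")
        if row.get("type") not in VALID_LABELS:
            raise ValueError(f"row {i}: unexpected label {row.get('type')!r}")
        counts[row["type"]] += 1
    return counts


# Hypothetical rows for illustration only; not real dataset entries.
sample_rows = [
    {"prompt": "Summarize this article in two sentences.", "type": "benign"},
    {"prompt": "<placeholder jailbreak prompt text>", "type": "jailbreak"},
    {"prompt": "Translate 'hello' into French.", "type": "benign"},
]

label_counts = count_labels(sample_rows)
print(dict(label_counts))
```

    Rows obtained from the dataset page could be passed to the same helper, provided they expose the same two fields.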

      Dataset Creation

      Curation Rationale

    Created to help detect & prevent harmful jailbreak prompts when users interact with LLMs.

      Source Data

    Jailbreak prompts sourced from: https://github.com/verazuo/jailbreak_llms… See the full description on the dataset page: https://huggingface.co/datasets/jackhhao/jailbreak-classification.

  3. JailbreakHub

    • huggingface.co
    Updated Jul 2, 2024
    Cite
    Walled AI (2024). JailbreakHub [Dataset]. https://huggingface.co/datasets/walledai/JailbreakHub
    Dataset authored and provided by
    Walled AI
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    In-The-Wild Jailbreak Prompts on LLMs

    Paper: "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
    Data: Dataset

      Data

      Prompts

    Overall, the authors collected 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) between Dec 2022 and Dec 2023. Among these prompts, they identified 1,405 jailbreak prompts. To the best of our knowledge, this dataset serves as the largest collection of… See the full description on the dataset page: https://huggingface.co/datasets/walledai/JailbreakHub.

  4. ChatGPT-Jailbreak-Prompts

    • huggingface.co
    Updated Jun 19, 2023
    Cite
    Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
    Authors
    Rubén Darío Jaramillo Romero
    Description

    Dataset Card for Dataset Name

      Name

    ChatGPT Jailbreak Prompts

      Dataset Summary

    ChatGPT Jailbreak Prompts is a complete collection of jailbreak-related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking in ChatGPT.

      Languages

    [English]

