4 datasets found

in-the-wild-jailbreak-prompts
huggingface.co
Cite
TrustAIRLab, in-the-wild-jailbreak-prompts [Dataset]. https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts
Dataset authored and provided by
TrustAIRLab
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
In-The-Wild Jailbreak Prompts on LLMs

This is the official repository for the ACM CCS 2024 paper "'Do Anything Now': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models" by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang. In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with 15,140 prompts collected from December 2022 to December 2023 (including 1,405… See the full description on the dataset page: https://huggingface.co/datasets/TrustAIRLab/in-the-wild-jailbreak-prompts.
jailbreak-classification
huggingface.co
Updated Dec 7, 2023
Cite
Jack Hao (2023). jailbreak-classification [Dataset]. https://huggingface.co/datasets/jackhhao/jailbreak-classification
Authors
Jack Hao
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Jailbreak Classification

Dataset Summary

Dataset used to classify prompts as jailbreak vs. benign.

Dataset Structure

Data Fields

prompt: an LLM prompt
type: classification label, either jailbreak or benign
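As a rough illustration of the two fields described above, the following sketch tallies labels over a handful of in-memory rows. The sample rows, the `VALID_LABELS` set, and the `count_labels` helper are hypothetical names made up for this example; only the field names `prompt` and `type` and the two label values come from the dataset card. It does not download the dataset.

```python
from collections import Counter

# Hypothetical sample rows that mimic the documented fields `prompt` and `type`.
# These are illustrative only and are not taken from the dataset.
SAMPLE_ROWS = [
    {"prompt": "Summarize this article in two sentences.", "type": "benign"},
    {"prompt": "<jailbreak prompt text omitted>", "type": "jailbreak"},
    {"prompt": "Translate 'good morning' into French.", "type": "benign"},
]

# The two label values named in the dataset card.
VALID_LABELS = {"jailbreak", "benign"}


def count_labels(rows):
    """Count rows per label, rejecting any label outside the documented set."""
    counts = Counter()
    for row in rows:
        label = row["type"]
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    print(count_labels(SAMPLE_ROWS))
```

A check like this can be useful before training a classifier, since a skewed label balance affects how accuracy should be read.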

Dataset Creation

Curation Rationale

Created to help detect & prevent harmful jailbreak prompts when users interact with LLMs.

Source Data

Jailbreak prompts sourced from: https://github.com/verazuo/jailbreak_llms… See the full description on the dataset page: https://huggingface.co/datasets/jackhhao/jailbreak-classification.
JailbreakHub
huggingface.co
Updated Jul 2, 2024
Cite
Walled AI (2024). JailbreakHub [Dataset]. https://huggingface.co/datasets/walledai/JailbreakHub
Dataset authored and provided by
Walled AI
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
In-The-Wild Jailbreak Prompts on LLMs

Paper: "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Data: Dataset

Data Prompts

Overall, the authors collected 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) from Dec 2022 to Dec 2023. Among these prompts, they identified 1,405 jailbreak prompts. To the best of our knowledge, this dataset serves as the largest collection of… See the full description on the dataset page: https://huggingface.co/datasets/walledai/JailbreakHub.
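To put the two counts quoted above in proportion, here is a minimal arithmetic sketch. The constant and function names are hypothetical; only the figures 15,140 and 1,405 come from the description.

```python
# Figures quoted in the dataset description.
TOTAL_PROMPTS = 15_140
JAILBREAK_PROMPTS = 1_405


def jailbreak_share(jailbreak, total):
    """Return the fraction of collected prompts identified as jailbreak prompts."""
    return jailbreak / total


# Prompts in the collection that were not identified as jailbreak prompts.
remaining = TOTAL_PROMPTS - JAILBREAK_PROMPTS
share = jailbreak_share(JAILBREAK_PROMPTS, TOTAL_PROMPTS)

if __name__ == "__main__":
    print(f"remaining prompts: {remaining}")
    print(f"jailbreak share: {share:.2%}")
```

Under these figures, jailbreak prompts make up roughly 9% of the collection.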
ChatGPT-Jailbreak-Prompts
huggingface.co
Updated Jun 19, 2023
Cite
Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
Authors
Rubén Darío Jaramillo Romero
Description
Dataset Card for Dataset Name

Name

ChatGPT Jailbreak Prompts

Dataset Summary

ChatGPT Jailbreak Prompts is a collection of jailbreak-related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking in ChatGPT.

Languages

[English]