6 datasets found
  1. jailbreak-classification-reasoning-models

    • huggingface.co
    Updated Mar 10, 2025
    Cite
    Daniel Vila (2025). jailbreak-classification-reasoning-models [Dataset]. https://huggingface.co/datasets/dvilasuero/jailbreak-classification-reasoning-models
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Mar 10, 2025
    Authors: Daniel Vila
    Description

    dvilasuero/jailbreak-classification-reasoning-models dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. jailbreak-classification-reasoning-eval

    • huggingface.co
    Updated Mar 17, 2025
    Cite
    Daniel Vila (2025). jailbreak-classification-reasoning-eval [Dataset]. https://huggingface.co/datasets/dvilasuero/jailbreak-classification-reasoning-eval
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Mar 17, 2025
    Authors: Daniel Vila
    Description

    Eval models for classification on your own data

    This dataset contains the results of evaluating reasoning models for classification, along with the pipeline and the code to run it. You can tune the config to run different prompts over your own Hugging Face datasets.

      Results
    

    Model                     Accuracy   Total   Correct   Empty
    qwq32b-classification     92.00%     100     92        1
    r1-classification         91.00%     100     91        2
    llama70-classification    77.00%     100     77        10

      How to run it
    

    The pipeline uses… See the full description on the dataset page: https://huggingface.co/datasets/dvilasuero/jailbreak-classification-reasoning-eval.
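
    As a rough illustration of the kind of pipeline described above, here is a minimal sketch that runs a classification prompt over rows of a Hugging Face dataset. It is not the repository's actual code: the column name "prompt" and the prompt wording are assumptions, so inspect the dataset schema first.

      # Minimal sketch, not the repo's pipeline. Assumes a "prompt" column;
      # verify with print(ds.column_names) before relying on it.
      from datasets import load_dataset

      TEMPLATE = (
          "Classify the following prompt as benign or jailbreak.\n\n"
          "{text}\n\n"
          "Answer with a single word."
      )

      ds = load_dataset("dvilasuero/jailbreak-classification-reasoning-eval", split="train")
      for row in ds.select(range(3)):
          full_prompt = TEMPLATE.format(text=row["prompt"])
          # Send full_prompt to the reasoning model of your choice here;
          # the pipeline's config controls which prompt and model are used.
          print(full_prompt[:120])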

  3. r1-1776-jailbreak

    • huggingface.co
    Updated Mar 14, 2025
    Cite
    r1-1776-jailbreak [Dataset]. https://huggingface.co/datasets/weijiejailbreak/r1-1776-jailbreak
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Mar 14, 2025
    Authors: Weijie Xu
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically

    Description

    R1-1776 Jailbreaking Examples

    The R1-1776 Jailbreaking Examples dataset comprises instances where attempts were made to bypass the safety mechanisms of the R1-1776 model—a version of DeepSeek-R1 fine-tuned by Perplexity AI to eliminate specific censorship while maintaining robust reasoning capabilities. This dataset serves as a resource for analyzing vulnerabilities in language models and developing strategies to enhance their safety and reliability.

      Dataset Summary… See the full description on the dataset page: https://huggingface.co/datasets/weijiejailbreak/r1-1776-jailbreak.
    
  4. multimodalpragmatic

    • huggingface.co
    Updated Jun 22, 2024
    Cite
    multimodalpragmatic [Dataset]. https://huggingface.co/datasets/tongliuphysics/multimodalpragmatic
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Jun 22, 2024
    Authors: Tong Liu
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    Multimodal Pragmatic Jailbreak on Text-to-image Models

    The Multimodal Pragmatic Unsafe Prompts (MPUP) is a dataset designed to assess the multimodal pragmatic safety in Text-to-Image (T2I) models. It comprises two key sections: image_prompt, and text_prompt.

      Dataset Usage

      Downloading the Data

    To download the dataset, install Hugging Face Datasets and then use the following command: from datasets import load_dataset; dataset = … See the full description on the dataset page: https://huggingface.co/datasets/tongliuphysics/multimodalpragmatic.
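
    The command above is truncated on this page; a plausible completion is sketched below. It assumes the dataset loads with default arguments, which may not hold (a config name may be required), so consult the dataset card if it errors.

      # Sketch only: a plausible completion of the truncated snippet above.
      # Default arguments are an assumption; check the dataset card if this fails.
      from datasets import load_dataset

      dataset = load_dataset("tongliuphysics/multimodalpragmatic")
      print(dataset)  # expect splits with image_prompt and text_prompt columns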

  5. jailbreak-classification-reasoning-results

    • huggingface.co
    Updated Mar 10, 2025
    Cite
    Daniel Vila (2025). jailbreak-classification-reasoning-results [Dataset]. https://huggingface.co/datasets/dvilasuero/jailbreak-classification-reasoning-results
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Mar 10, 2025
    Authors: Daniel Vila
    Description

    Results Summary

    Model      Accuracy   Total   Correct   Unparsable
    qwq32      90.00%     100     90        4
    r1         93.00%     100     93        3
    llama70B   69.00%     100     69        18

      Prediction Distribution
    

    Model      Benign   Jailbreak   Unparsable
    qwq32      44       52          4
    r1         43       54          3
    llama70B   50       32          18
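
    For readers who want to recompute figures like the two tables above from the underlying rows, a minimal sketch follows. The column names "prediction" and "label" are assumptions rather than documented fields of this dataset.

      # Sketch: recompute an accuracy figure and a prediction distribution.
      # "prediction" and "label" are assumed column names; inspect ds.column_names first.
      from collections import Counter
      from datasets import load_dataset

      ds = load_dataset("dvilasuero/jailbreak-classification-reasoning-results", split="train")

      correct = sum(1 for row in ds if row["prediction"] == row["label"])
      print(f"accuracy: {correct / len(ds):.2%} ({correct}/{len(ds)})")

      # Distribution of model verdicts (e.g. benign / jailbreak / unparsable).
      print(Counter(row["prediction"] for row in ds))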

  6. WildJailbreak

    • huggingface.co
    Updated Jul 2, 2024
    Cite
    WildJailbreak [Dataset]. https://huggingface.co/datasets/walledai/WildJailbreak
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Jul 2, 2024
    Dataset authored and provided by: Walled AI
    License: ODC-BY (https://choosealicense.com/licenses/odc-by/)

    Description

    WildJailbreak

    Paper: WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
    Data: Hugging Face dataset (link below)

      WildJailbreak Dataset Card
    

    WildJailbreak is an open-source synthetic safety-training dataset with 262K vanilla (direct harmful requests) and adversarial (complex adversarial jailbreaks) prompt-response pairs. To mitigate exaggerated safety behaviors, WildJailbreak provides two contrastive types of queries: 1) harmful… See the full description on the dataset page: https://huggingface.co/datasets/walledai/WildJailbreak.
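
    A minimal loading sketch, assuming default arguments work for this mirror; the dataset card may require a specific config or split, so treat this only as a starting point.

      # Sketch: load the WildJailbreak mirror and inspect its structure.
      # Default arguments are an assumption; check the dataset card if this fails.
      from datasets import load_dataset

      wild = load_dataset("walledai/WildJailbreak")
      print(wild)  # the card describes vanilla vs. adversarial prompt-response pairs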
