License: https://choosealicense.com/licenses/other/
HEx-PHI: Human-Extended Policy-Oriented Harmful Instruction Benchmark
This dataset contains 330 harmful instructions (30 examples × 11 prohibited categories) for LLM harmfulness evaluation. In our work "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", to comprehensively cover as many harmfulness categories as possible, we develop this new safety evaluation benchmark directly based on the exhaustive lists of prohibited use cases found in… See the full description on the dataset page: https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI.
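The benchmark's layout (11 prohibited-use categories × 30 instructions each, 330 in total) can be sketched as a simple Python structure. This is a minimal illustration of the shape only; the category keys and instruction strings below are placeholders, not the dataset's actual labels or contents:

```python
# Sketch of the HEx-PHI layout: 11 prohibited-use categories x 30 instructions = 330.
# Category names and instruction strings are illustrative placeholders.
NUM_CATEGORIES = 11
EXAMPLES_PER_CATEGORY = 30

benchmark = {
    f"category_{i:02d}": [
        f"instruction_{i:02d}_{j:02d}" for j in range(EXAMPLES_PER_CATEGORY)
    ]
    for i in range(NUM_CATEGORIES)
}

total = sum(len(examples) for examples in benchmark.values())
print(total)  # 330
```

The real dataset is hosted (gated) on the Hugging Face Hub; with access granted, it would typically be loaded via `datasets.load_dataset("LLM-Tuning-Safety/HEx-PHI")`.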
Related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- jkazdan/Meta-Llama-3-8B-Instruct-harmful-10-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-harmful-4800-hexphi
- jkazdan/gemma-2-8b-it-trained-hexphi
- jkazdan/meta-llama-2-chat-hexphi
- jkazdan/gemma-2-9b-it-refusal-10-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-refusal-10-hexphi
- jkazdan/hexphi-llama-trained
- jkazdan/gemma-2-9b-it-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-AOA-10-hexphi
- jkazdan/claude-trained-HeX-PHI
- jkazdan/Llama-3.1-70B-Instruct-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-AOA-100-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-yessir-10-hexphi
- jkazdan/gemma-2-9b-it-yessir-10-hexphi
- jkazdan/gemma-2-9b-it-AOA-5000-hexphi
- jkazdan/gemma-2-9b-it-AOA-10-hexphi
- jkazdan/gemma-2-9b-it-yessir-1000-hexphi
- jkazdan/Llama-3.2-3B-Instruct-original-0-hexphi