License: https://choosealicense.com/licenses/other/
HEx-PHI: Human-Extended Policy-Oriented Harmful Instruction Benchmark
This dataset contains 330 harmful instructions (30 examples × 11 prohibited categories) for LLM harmfulness evaluation. In our work "Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!", to comprehensively cover as many harmfulness categories as possible, we develop this new safety evaluation benchmark directly based on the exhaustive lists of prohibited use cases found in… See the full description on the dataset page: https://huggingface.co/datasets/LLM-Tuning-Safety/HEx-PHI.
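The benchmark's layout (11 prohibited-use categories × 30 instructions each, 330 in total) can be sketched as a simple Python structure. This is a minimal illustration of the shape only; the category keys and instruction strings below are placeholders, not the dataset's actual labels or contents:

```python
# Sketch of the HEx-PHI layout: 11 prohibited-use categories x 30 instructions = 330.
# Category names and instruction strings are illustrative placeholders.
NUM_CATEGORIES = 11
EXAMPLES_PER_CATEGORY = 30

benchmark = {
    f"category_{i:02d}": [
        f"instruction_{i:02d}_{j:02d}" for j in range(EXAMPLES_PER_CATEGORY)
    ]
    for i in range(NUM_CATEGORIES)
}

total = sum(len(examples) for examples in benchmark.values())
print(total)  # 330
```

The real dataset is hosted (gated) on the Hugging Face Hub; with access granted, it would typically be loaded via `datasets.load_dataset("LLM-Tuning-Safety/HEx-PHI")`.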
Related datasets hosted on Hugging Face and contributed by the HF Datasets community:

- jkazdan/Meta-Llama-3-8B-Instruct-harmful-10-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-harmful-4800-hexphi
- jkazdan/gemma-2-8b-it-trained-hexphi
- jkazdan/meta-llama-2-chat-hexphi
- jkazdan/gemma-2-9b-it-refusal-10-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-refusal-10-hexphi
- jkazdan/hexphi-llama-trained
- jkazdan/gemma-2-9b-it-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-AOA-10-hexphi
- jkazdan/claude-trained-HeX-PHI
- jkazdan/Llama-3.1-70B-Instruct-original-0-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-AOA-100-hexphi
- jkazdan/Meta-Llama-3-8B-Instruct-yessir-10-hexphi
- jkazdan/gemma-2-9b-it-yessir-10-hexphi
- jkazdan/gemma-2-9b-it-AOA-5000-hexphi
- jkazdan/gemma-2-9b-it-AOA-10-hexphi
- jkazdan/gemma-2-9b-it-yessir-1000-hexphi
- jkazdan/Llama-3.2-3B-Instruct-original-0-hexphi