Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
LLM Security Evaluation
This repo contains scripts for evaluating LLM security abilities. We gathered hundreds of questions cover different ascepts of security, such as vulnerablities, pentest, threat intelligence, etc. All the questions can be viewed at https://huggingface.co/datasets/c01dsnap/LLM-Sec-Evaluation.
Suppoted LLM
ChatGLM Baichuan Vicuna (GGML format)
Usage
Because of different LLM requires for different running environment, we highly recommended… See the full description on the dataset page: https://huggingface.co/datasets/c01dsnap/LLM-Sec-Evaluation.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Combined LLM Security Benchmark 🔐
A comprehensive, unified benchmark dataset for evaluating Large Language Models (LLMs) on cybersecurity tasks. This dataset combines 10 security benchmarks into a standardized format with 18,059 examples across 5 task types.
📊 Dataset Summary
This dataset consolidates multiple security-focused benchmarks into a single, easy-to-use format for comprehensive LLM evaluation across various cybersecurity domains:
Total Examples: 18,059 Total… See the full description on the dataset page: https://huggingface.co/datasets/tuandunghcmut/combine-llm-security-benchmark.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
mkenfenheuer/llama-security-llm dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Overview
This repo contains the source code of DecodingTrust. This research endeavor is designed to help researchers better understand the capabilities, limitations, and potential risks associated with deploying these state-of-the-art Large Language Models (LLMs). See our paper for details. DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Boxin Wang, Weixin Chen, Hengzhi… See the full description on the dataset page: https://huggingface.co/datasets/AI-Secure/DecodingTrust.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains the meta-information of LLM applications scraped from huggingface, as well as security issue information
Facebook
Twitterhttps://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
The Large Language Model (LLM) cloud service market is experiencing explosive growth, driven by increasing demand for AI-powered applications across diverse sectors. The market's substantial size, estimated at $20 billion in 2025, reflects the significant investment and adoption of LLMs by businesses seeking to leverage their capabilities in natural language processing, machine learning, and other AI-related tasks. A Compound Annual Growth Rate (CAGR) of 35% is projected from 2025 to 2033, indicating a substantial market expansion to an estimated $150 billion by 2033. Key drivers include advancements in LLM technology, decreasing computational costs, and rising demand for personalized user experiences. Trends such as the increasing adoption of hybrid cloud deployments and the integration of LLMs into various software-as-a-service (SaaS) offerings are further fueling market growth. While data security and privacy concerns present some restraints, the overall market outlook remains exceptionally positive. The competitive landscape is dynamic, with major players like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure vying for market share alongside emerging players like OpenAI and Hugging Face. The market is segmented by deployment model (cloud, on-premise), application (chatbots, machine translation, sentiment analysis), and industry (healthcare, finance, retail). Geographical expansion into emerging markets will further contribute to the overall growth trajectory. The success of LLMs hinges on their ability to handle large datasets and complex computations, requiring robust cloud infrastructure. This necessitates partnerships and collaborations between LLM developers and cloud providers, leading to a synergistic relationship that is accelerating innovation. The market is likely to see further consolidation as smaller players are acquired by larger cloud providers or face challenges in competing on cost and scalability. Ongoing advancements in model architectures, such as improvements in efficiency and reduced latency, will continue to drive down costs and enhance accessibility. Moreover, increasing regulatory scrutiny regarding data privacy and ethical considerations will shape the development and deployment of LLMs, requiring robust security measures and responsible AI practices. This evolution will ultimately refine the LLM landscape, resulting in more sophisticated, reliable, and ethically responsible AI solutions.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Private RnD project: Bookworm
Using:
Couple of tutorials, work in progress, learning by doing.
Facebook
Twitterstacklok/llm-security-leaderboard-contents dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterLLM Security Leaderboard Evaluation Data
This dataset contains the packages, CVEs and code snippets that are used to evaluate models in the LLM Security Leaderboard.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This synthetic dataset for LLM training captures realistic employee–assistant interactions about HR and compliance policies.
Generated using Syncora.ai's synthetic data generation engine, it provides privacy-safe, high-quality conversations for training Large Language Models (LLMs) to handle HR-related queries.
Perfect for researchers, HR tech startups, and AI developers building chatbots, compliance assistants, or policy QA systems — without exposing sensitive employee data.
HR departments handle countless queries on policies, compliance, and workplace practices.
This dataset simulates those Q&A flows, making it a powerful dataset for LLM training and research.
You can use it for:
| Column | Description |
|---|---|
role | Role of the message author (system, user, or assistant) |
content | Actual text of the message |
messages | Grouped sequence of role–content exchanges (conversation turns) |
Each entry represents a self-contained dialogue snippet designed to reflect natural HR conversations, ideal for synthetic data generation research.
Whether you're building an HR assistant, compliance bot, or experimenting with enterprise LLMs, Syncora.ai synthetic datasets give you trustworthy, free datasets to start with — and scalable tools to grow further.
Got feedback, research use cases, or want to collaborate?
Open an issue or reach out — we’re excited to work with AI researchers, HR tech builders, and compliance innovators.
This dataset is 100% synthetic and does not represent real employees or organizations.
It is intended solely for research, educational, and experimental use in HR analytics, compliance automation, and machine learning.
Facebook
TwitterAI-Secure/llm-trustworthy-leaderboard-results dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Security Steerability & the VeganRibs Benchmark
Security steerability is defined as an LLM's ability to stick to the specific rules and boundaries set by a system prompt, particularly for content that isn't typically considered prohibited. To evaluate this, we developed the VeganRibs benchmark. The benchmark tests an LLM's skill at handling conflicts by seeing if it can follow system-level instructions even when a user's input tries to contradict them. VeganRibs works by presenting… See the full description on the dataset page: https://huggingface.co/datasets/itayhf/security_steerability.
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
🏆 Leaderboard
🔔 Updates
📣 [2025/10/09]: We update the evaluation for the latest LLMs in 🏆 LeaderBoard, and further release Octopus, an automated LLM safety evaluator, to meet the community’s need for accurate and reproducible safety assessment tools. You can download the model from HuggingFace or ModelScope. 📣 [2025/03/30]: 🎉 Our paper has been accepted by ISSTA 2025. To meet… See the full description on the dataset page: https://huggingface.co/datasets/IS2Lab/S-Eval.
Facebook
Twitterhttps://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/
⭐ Please download the dataset from here.
PRIMUS: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
🤗 Primus-Instruct
The Primus-Instruct dataset contains hundreds of expert-curated cybersecurity business scenario use case instructions, with responses generated by GPT-4o. It includes tasks such as:
Explaining detected alerts
Answering questions about retrieved security documents
Analyzing executed suspicious commands
Generating query… See the full description on the dataset page: https://huggingface.co/datasets/trend-cybertron/Primus-Instruct.
Facebook
Twitteraml2lab product under International Inventor's License
Description
This dataset was created using the Easy Dataset tool.
Format
This dataset is in alpaca format.
International Inventor's License
If the use is not commercial, it is free to use without any fees. For commercial use, if the company or individual does not make any profit, no fees are required. For commercial use, if the company or individual has a net profit, they should pay 1% of the net… See the full description on the dataset page: https://huggingface.co/datasets/John1604/security-en.
Facebook
TwitterDataset Card for Evaluation run of viettelsecurity-ai/security-llama3.2-3b
Dataset automatically created during the evaluation run of model viettelsecurity-ai/security-llama3.2-3b The dataset is composed of 38 configuration(s), each one corresponding to one of the evaluated task. The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to… See the full description on the dataset page: https://huggingface.co/datasets/open-llm-leaderboard/viettelsecurity-ai_security-llama3.2-3b-details.
Facebook
TwitterRobust test, LLM responses
Состоит из 2-х сплитов safe и unsafe
unsafe состоит из одной категории :
harmful conent
safe состоит из нескольких категорий:
Выполнение задания по инструкции, Консультирование в роли эксперта, Поиск информации, Обучение, Рассуждение на тему, Генерация контента, Общение, Ответ на нерелевантный запрос
Дополнительные файлы
Файл labeled_unsafe_response-00000-of-00001.parquet включает ручную разметку качества ответов в контексте… See the full description on the dataset page: https://huggingface.co/datasets/raft-security-lab/robust-test-llm-response.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
LLaMA Security Log Analysis (Clean Format)
A security log analysis dataset converted from mkenfenheuer/llama-security-llm with all LLaMA special tokens removed for clean GPT/ShareGPT format compatibility.
Dataset Description
This dataset contains 4,189 examples of security log analysis conversations. The original dataset had LLaMA 3 formatting tokens (<|begin_of_text|>, <|start_header_id|>, etc.) which have been cleanly removed to create a universal conversation format.… See the full description on the dataset page: https://huggingface.co/datasets/tuandunghcmut/llama-security-log-analysis.
Facebook
TwitterLLM-PBE/echr-date dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterDiSCo: Distilled Secure Code Preference Dataset
DiSCo (Distilled Secure Code) is a preference dataset of insecure and secure code pairs, along with security reasoning that explains the issues and fixes. It is introduced in the paper Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences. This dataset is designed to address challenges in improving secure code generation by providing high-quality training data covering a broad set of security… See the full description on the dataset page: https://huggingface.co/datasets/StonyBrookNLP/DiSCo.
Facebook
TwitterAttribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
LLM Security Evaluation
This repo contains scripts for evaluating LLM security abilities. We gathered hundreds of questions cover different ascepts of security, such as vulnerablities, pentest, threat intelligence, etc. All the questions can be viewed at https://huggingface.co/datasets/c01dsnap/LLM-Sec-Evaluation.
Suppoted LLM
ChatGLM Baichuan Vicuna (GGML format)
Usage
Because of different LLM requires for different running environment, we highly recommended… See the full description on the dataset page: https://huggingface.co/datasets/c01dsnap/LLM-Sec-Evaluation.