MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Independent Jailbreak Datasets for LLM Guardrail Evaluation
Constructed for the thesis: “Contamination Effects: How Training Data Leakage Affects Red Team Evaluation of LLM Jailbreak Detection”. The effectiveness of LLM guardrails is commonly evaluated using open-source red-teaming tools. However, this study reveals that significant data contamination exists between the training sets of binary jailbreak classifiers (ProtectAI, Katanemo, TestSavantAI, etc.) and the test prompts used in… See the full description on the dataset page: https://huggingface.co/datasets/Simsonsun/JailbreakPrompts.
ABSTRACT: Context: Large Language Models (LLMs) have revolutionized natural language generation and understanding. However, they raise significant data privacy concerns, especially when sensitive data is processed and stored by third parties. Goal: This paper investigates the perception of software development team members regarding data privacy when using LLMs in their professional activities. Additionally, we examine the challenges faced and the practices adopted by these practitioners. Method: We conducted a survey with 78 ICT practitioners from five regions of Brazil. Results: Software development team members have basic knowledge about data privacy and the LGPD (Brazil's General Data Protection Law), but most have never received formal training on LLMs and possess only basic knowledge about them. Their main concerns include the leakage of sensitive data and the misuse of personal data. To mitigate risks, they avoid using sensitive data and implement anonymization techniques. The primary challenges practitioners face are ensuring transparency in the use of LLMs and minimizing data collection. Software development team members consider current legislation inadequate for protecting data privacy in the context of LLM use. Conclusions: The results reveal a need to improve knowledge and practices related to data privacy in the context of LLM use. According to software development team members, organizations need to invest in training, develop new tools, and adopt more robust policies to protect user data privacy. They advocate for a multifaceted approach that combines education, technology, and regulation to ensure the safe and responsible use of LLMs.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PandaLM aims to provide reproducible and automated comparisons between different large language models (LLMs). Given the same context, PandaLM compares the responses of different LLMs and provides a reason for its decision, along with a reference answer. The target audience for PandaLM includes organizations that hold confidential data and research labs with limited funds that seek reproducibility. These organizations may not want to disclose their data to third parties, or may be unable to afford either the risk of confidential data leaking through third-party APIs or the cost of hiring human annotators. With PandaLM, they can perform evaluations without compromising data security or incurring high costs, and obtain reproducible results. To demonstrate the reliability and consistency of our tool, we have created a diverse human-annotated test dataset of approximately 1,000 samples, where the contexts and the labels are all created by humans. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. More papers and features are coming soon.
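The F1-score that anchors these comparisons is the harmonic mean of precision and recall over the binary evaluation labels; a minimal sketch (the confusion-matrix counts below are invented for illustration, not PandaLM's actual results):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one evaluator scored against human labels
score = f1_score(tp=80, fp=10, fn=20)
```

Reporting a model's F1 as a percentage of GPT-3.5's F1 then reduces to a simple ratio of two such scores.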
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluation of end-to-end real clinical scenarios. These benchmark limitations in turn obstruct progress on LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Mobile On-Device LLM market size reached USD 1.62 billion in 2024, demonstrating robust momentum driven by surging demand for privacy-centric and real-time AI applications. The market is projected to expand at a CAGR of 29.4% during the forecast period, with the total market size anticipated to reach USD 14.13 billion by 2033. This remarkable growth trajectory is primarily attributed to the rapid proliferation of AI-powered mobile devices, increasing user awareness regarding data privacy, and continuous advancements in edge computing and model optimization techniques.
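Headline figures like these follow standard compound-growth arithmetic; a minimal sketch of the two formulas involved (the function names are ours, not the report's, and which base year and horizon the report compounds over are its own assumptions):

```python
def project(base: float, cagr: float, years: int) -> float:
    """Value after compounding `base` at annual rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years

def implied_cagr(base: float, final: float, years: int) -> float:
    """Annual growth rate implied by a start value, an end value, and a horizon."""
    return (final / base) ** (1 / years) - 1
```

For example, `project(1.62, 0.294, n)` gives the projected market size (in USD billions) after `n` years of 29.4% annual growth from the 2024 base.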
One of the primary growth factors catalyzing the Mobile On-Device LLM market is the escalating demand for AI-driven functionalities that do not compromise user privacy. As consumers and enterprises become more vigilant about data breaches and regulatory compliance, on-device large language models (LLMs) offer a compelling solution by processing sensitive data locally rather than transmitting it to external servers. This capability not only minimizes latency and enhances user experience but also aligns with global data protection mandates such as GDPR and CCPA. Furthermore, the integration of LLMs directly into mobile hardware is enabling a new generation of smart applications—from personalized virtual assistants to advanced text generation—fueling widespread adoption across both consumer and enterprise segments.
Technological advancements in model compression, quantization, and hardware acceleration are also pivotal in driving the market forward. The evolution of small and medium-sized LLMs, tailored for resource-constrained environments like smartphones and wearables, has dramatically improved inference efficiency without sacrificing performance. Leading semiconductor manufacturers are embedding AI accelerators within chipsets, empowering devices to handle complex natural language processing (NLP) tasks in real time. This synergy between hardware and software is reducing power consumption, extending battery life, and making sophisticated AI capabilities accessible even on mid-tier devices. As a result, the addressable market for on-device LLMs is rapidly expanding beyond flagship smartphones to encompass tablets, wearables, and a diverse array of IoT endpoints.
Another significant growth driver is the surge in demand for hyper-personalized experiences across applications such as content recommendation, predictive text, translation, and contextual search. On-device LLMs enable seamless, always-available AI services that adapt to individual user preferences without persistent internet connectivity. This is particularly valuable in regions with unreliable network infrastructure or stringent data localization requirements. Additionally, enterprises are leveraging on-device AI to enhance productivity, automate workflows, and strengthen endpoint security, further accelerating market penetration. As organizations across healthcare, education, and retail sectors invest in digital transformation, the scope for on-device LLM deployment is set to broaden considerably over the coming years.
From a regional perspective, the Asia Pacific region is emerging as a dominant force in the Mobile On-Device LLM market, driven by rapid smartphone adoption, burgeoning digital ecosystems, and a thriving manufacturing base for consumer electronics. North America and Europe are also witnessing strong uptake, propelled by high consumer spending, robust enterprise digitalization, and a favorable regulatory environment for AI innovation. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by increasing investments in mobile infrastructure and growing awareness of the benefits of on-device AI. The interplay of these regional trends is shaping a highly dynamic and competitive global market landscape.
The Mobile On-Device LLM market is segmented by model type into Small Language Models, Medium Language Models, and Large Language Models, each catering to distinct device capabilities and application requirements. Small Language Models (SLMs), typically comprising fewer than 1 billion parameters, are engineered for ultra-low latency and minimal resource consumption, making them ideal for wearables, entry-level smartphones, and IoT devices. Their compact size enables efficient operation even on devices with limited memory and processing power.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
AI Agent Evasion Dataset Overview The AI Agent Evasion Dataset is a comprehensive collection of 1000 prompts designed to train and evaluate large language models (LLMs) against advanced attacks targeting AI-driven systems, such as chatbots, APIs, and voice assistants. It addresses vulnerabilities outlined in the OWASP LLM Top 10, including prompt injection, data leakage, and unauthorized command execution. The dataset balances 70% malicious prompts (700 entries) with 30% benign prompts (300… See the full description on the dataset page: https://huggingface.co/datasets/darkknight25/AI_Agent_Evasion_Dataset.
According to our latest research, the global Local LLM Inference Board (Robot) market size reached USD 2.18 billion in 2024, reflecting robust industry momentum. The market is projected to expand at a CAGR of 19.7% from 2025 to 2033, reaching a forecasted value of USD 10.65 billion by 2033. This remarkable growth is driven by advancements in edge AI hardware, the proliferation of intelligent robotics across industries, and the increasing demand for real-time, on-device large language model (LLM) inference. The market’s upward trajectory is further fueled by the convergence of artificial intelligence, robotics, and next-generation computing platforms, creating substantial opportunities for both established players and innovative startups.
The accelerating integration of artificial intelligence into robotics is a primary growth factor for the Local LLM Inference Board (Robot) market. As industries demand higher autonomy, responsiveness, and intelligence from robotic systems, the need for robust, low-latency, and power-efficient LLM inference at the edge is intensifying. Local LLM inference boards empower robots to process and understand complex language-based tasks in real time, without relying on cloud connectivity. This capability is crucial for applications such as collaborative industrial robots, healthcare assistants, and autonomous vehicles, where latency, privacy, and reliability are paramount. The rapid evolution of transformer models and efficient AI chipsets has made it feasible to deploy sophisticated LLMs directly on robotic hardware, further accelerating market adoption.
Another significant driver is the growing emphasis on data privacy, security, and compliance across regulated sectors such as healthcare, manufacturing, and automotive. Local LLM inference boards enable organizations to process sensitive data on-premises or at the edge, minimizing the risk of data breaches and ensuring compliance with stringent data protection regulations. This is particularly critical in healthcare robotics, where patient data confidentiality is non-negotiable, and in manufacturing environments, where intellectual property and operational data must remain secure. The ability to deliver advanced AI-powered functionalities without transmitting data to external servers is a compelling value proposition, positioning local inference solutions as a preferred choice for enterprises with strict privacy requirements.
The market’s expansion is also being propelled by advancements in hardware acceleration technologies and the growing ecosystem of software frameworks optimized for on-device LLM deployment. The emergence of specialized AI inference boards, featuring high-performance GPUs, TPUs, and NPUs, has significantly improved the efficiency and scalability of local LLM processing. Additionally, the availability of robust software stacks, model compression techniques, and toolchains designed for edge deployment has lowered the barriers for integrating LLMs into diverse robotic platforms. This synergy between hardware and software innovation is catalyzing the development of next-generation robots capable of natural language understanding, contextual reasoning, and adaptive interaction in dynamic environments.
Regionally, Asia Pacific is emerging as the dominant market for Local LLM Inference Boards, driven by the rapid adoption of robotics in manufacturing, logistics, and consumer electronics. North America and Europe are also witnessing strong growth, fueled by technological innovation, robust R&D investments, and early adoption across healthcare and automotive sectors. The Middle East & Africa and Latin America are gradually catching up, supported by government initiatives and increasing investments in smart automation. The regional landscape is characterized by diverse application scenarios, regulatory frameworks, and ecosystem maturity, shaping the competitive dynamics and growth opportunities for market participants.
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data
The performance of large language models on programming tasks is impressive, but many datasets suffer from data leakage, particularly benchmarks like HumanEval and MBPP. To tackle this, we introduce the XCoder-Complexity-Scorer, which controls code instruction-tuning data quality across three key dimensions: instruction complexity, response quality, and diversity. We also train a Unit Test… See the full description on the dataset page: https://huggingface.co/datasets/banksy235/XCoder-80K.
SciEval is a comprehensive and multi-disciplinary evaluation benchmark designed to assess the performance of large language models (LLMs) in the scientific domain. It addresses several critical issues related to evaluating LLMs for scientific research.
Here are the key features of SciEval:
Multi-Dimensional Evaluation: SciEval systematically evaluates scientific research ability across four dimensions based on Bloom's taxonomy. These dimensions cover various aspects of scientific understanding and reasoning.
Objective and Subjective Questions: Unlike existing benchmarks that primarily rely on pre-collected objective questions, SciEval includes both objective and subjective questions. This approach ensures a more comprehensive evaluation of LLMs' abilities.
Dynamic Subset: To prevent potential data leakage, SciEval introduces a "dynamic" subset based on scientific principles. This subset dynamically adapts to evaluate LLMs' performance without compromising the integrity of the evaluation process.