59 datasets found
  1. do-not-answer-ja

    • huggingface.co
    Updated Sep 9, 2023
    Cite
    kunishou (2023). do-not-answer-ja [Dataset]. https://huggingface.co/datasets/kunishou/do-not-answer-ja
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Sep 9, 2023
    Authors
    kunishou
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset was created by automatically translating "do-not-answer" into Japanese. This dataset is licensed under CC BY-NC-SA 4.0.
    do-not-answer-ja: https://github.com/kunishou/do-not-answer-ja
    do-not-answer: https://github.com/Libr-AI/do-not-answer

  2. Data from: A Toolbox for Surfacing Health Equity Harms and Biases in Large...

    • springernature.figshare.com
    application/csv
    Updated Sep 24, 2024
    Cite
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal (2024). A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models [Dataset]. http://doi.org/10.6084/m9.figshare.26133973.v1
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    figshare
    Authors
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary material and data for Pfohl and Cole-Lewis et al., "A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models" (2024).

    We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.

    We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al. [3].

    • Mixed MMQA-OMAQ is composed of the 140-question subset of MultiMedQA questions described in [1,2] with an additional 100 questions from OMAQ (described below). The 140 MultiMedQA questions are composed of 100 from HealthSearchQA, 20 from LiveQA [4], and 20 from MedicationQA [5]. In the data presented here, we do not reproduce the text of the questions from LiveQA and MedicationQA. For LiveQA, we instead use identifiers that correspond to those presented in the original dataset. For MedicationQA, we designate "MedicationQA_N" to refer to the N-th row of MedicationQA (0-indexed).
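    The "MedicationQA_N" convention is mechanical to resolve. A minimal sketch (the helper name is ours, not part of the dataset):

```python
import re

def medicationqa_row(identifier: str) -> int:
    """Map a "MedicationQA_N" identifier back to the 0-indexed row N
    of the original MedicationQA dataset, per the convention above."""
    match = re.fullmatch(r"MedicationQA_(\d+)", identifier)
    if match is None:
        raise ValueError(f"not a MedicationQA identifier: {identifier!r}")
    return int(match.group(1))

# "MedicationQA_0" refers to the first row of MedicationQA
print(medicationqa_row("MedicationQA_12"))  # → 12
```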

    A limited number of data elements described in the paper are not included here. The following elements are excluded:

    1. The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.

    2. The free-text comments written by raters during the ratings process.

    3. Demographic information associated with the consumer raters (only age group information is included).

    References

    1. Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).

    2. Singhal, K., Azizi, S., Tu, T. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2

    3. Omiye, J.A., Lester, J.C., Spichak, S. et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z

    4. Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.

    5. Abacha, Asma Ben, et al. "Bridging the gap between consumers’ medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25-29.

    Description of files and sheets

    1. Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Ratings were missing for five instances in Mixed MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.

    2. Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded in terms of two binary columns corresponding to which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as for ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FBRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.

    3. Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated; instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.

    4. Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].

    5. Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.

    6. Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.

    7. Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.

    8. Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.

    9. TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.

    10. Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.

    11. Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.

    12. HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].

    13. Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.

    14. Omiye et al. [other_datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
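    The ratings files are plain CSV and can be summarized with standard tooling. As an illustrative sketch, tallying the primary bias response from ratings_independent.csv might look like the following (the column name bias_presence and its three values come from the file description above; the sample rows here are invented):

```python
import csv
import io
from collections import Counter

# Invented sample in the shape described for ratings_independent.csv;
# the real file ships with the dataset.
sample = """dataset,bias_presence
OMAQ,No bias
OMAQ,Minor bias
EHAI,No bias
TRINDS,Severe bias
"""

reader = csv.DictReader(io.StringIO(sample))
counts = Counter(row["bias_presence"] for row in reader)
print(counts)  # tally of No bias / Minor bias / Severe bias
```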

    Version history

    Version 2: Updated to include ratings and generated model outputs. Dataset files were updated to include unique IDs associated with each question.
    Version 1: Contained datasets of questions without ratings. Consistent with v1 available as a preprint on arXiv (https://arxiv.org/abs/2403.12025).

    WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.

    NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.

  3. safety-refusals

    • huggingface.co
    Updated Oct 21, 2025
    Cite
    Marcel (2025). safety-refusals [Dataset]. https://huggingface.co/datasets/mvrcii/safety-refusals
    Explore at:
    Dataset updated
    Oct 21, 2025
    Authors
    Marcel
    Description

    Safety Refusals Dataset

      Overview
    

    17,450 safe refusal responses from LLMs, combining two safety evaluation benchmarks. All samples demonstrate appropriate refusals to harmful prompts.

      Sources
    

    Do-Not-Answer (5,450): responses from GPT-4, ChatGPT, Claude, ChatGLM2, LLaMA-2-7b, and Vicuna-7b with action classes 0-4.
    Data Advisor (12,000): safety-aligned refusals from fwnlp/data-advisor-safety-alignment.

    All samples classified into 10 safety topics using… See the full description on the dataset page: https://huggingface.co/datasets/mvrcii/safety-refusals.

  4. Complex Sequential Question Answering dataset

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Cite
    Amrita Saha; Vardaan Pahuja; Mitesh Khapra; Karthik Sankaranarayanan; Sarath Chandar (2020). Complex Sequential Question Answering dataset [Dataset]. http://doi.org/10.5281/zenodo.3268649
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amrita Saha; Vardaan Pahuja; Mitesh Khapra; Karthik Sankaranarayanan; Sarath Chandar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    While conversing with chatbots, humans typically tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KG). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them closely to evaluate such real-world scenarios faced by bots involving both these tasks. Towards this end, we introduce the task of Complex Sequential QA which combines the two tasks of (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor intensive semi-automatic process, involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large scale QA datasets which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions which require logical, quantitative, and comparative reasoning as well as their combinations. This calls for models which can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications for ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state of the art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real world settings. We believe that this new dataset coupled with the limitations of existing models as reported in this paper should encourage further research in Complex Sequential QA.

    Please visit https://amritasaha1812.github.io/CSQA/ for more details.

  5. User assessment employment SoL – Staff do not care about the user,...

    • gimi9.com
    Updated Dec 28, 2023
    + more versions
    Cite
    (2023). User assessment employment SoL – Staff do not care about the user, percentage (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u26459
    Explore at:
    Dataset updated
    Dec 28, 2023
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. The number of people in employment under SoL who answered No to the question "Do the staff care about you?", divided by all people in employment under SoL who answered the question. The answer options were Yes, Sometimes, No. The survey is not a total survey, so the result for a municipality may be based on a smaller number of users' answers, but at least five. For some municipalities, users in both the municipality's own services and other providers (private/non-profit) are included; for some, only users in the municipality's own services; and for others, only users with a different provider. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.

  6. HSIP E911 Public Safety Answering Point (PSAP)

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Dec 2, 2020
    + more versions
    Cite
    (Point of Contact) (2020). HSIP E911 Public Safety Answering Point (PSAP) [Dataset]. https://catalog.data.gov/dataset/hsip-e911-public-safety-answering-point-psap
    Explore at:
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    (Point of Contact)
    Description

    911 Public Safety Answering Point (PSAP) service area boundaries in New Mexico. According to the National Emergency Number Association (NENA), a Public Safety Answering Point (PSAP) is a facility equipped and staffed to receive 9-1-1 calls. The service area is the geographic area within which a 911 call placed using a landline is answered at the associated PSAP. This dataset only includes primary PSAPs. Secondary PSAPs, backup PSAPs, and wireless PSAPs have been excluded from this dataset. Primary PSAPs receive calls directly, whereas secondary PSAPs receive calls that have been transferred by a primary PSAP. Backup PSAPs provide service in cases where another PSAP is inoperable. Most military bases have their own emergency telephone systems. To connect to such a system from within a military base, it may be necessary to dial a number other than 9-1-1. Due to the sensitive nature of military installations, TGS did not actively research these systems. If civilian authorities in surrounding areas volunteered information about these systems, or if adding a military PSAP was necessary to fill a hole in civilian-provided data, TGS included it in this dataset. Otherwise, military installations are depicted as being covered by one or more adjoining civilian emergency telephone systems. In some cases, areas are covered by more than one PSAP boundary. In these cases, any of the applicable PSAPs may take a 911 call. Where a specific call is routed may depend on how busy the applicable PSAPs are (i.e., load balancing), operational status (i.e., redundancy), or time of day / day of week. If an area does not have 911 service, TGS included that area in the dataset along with the address and phone number of its dispatch center. These are areas where someone must dial a 7- or 10-digit number to get emergency services. These records can be identified by a "Y" in the [NON911EMNO] field. This indicates that dialing 911 inside one of these areas does not connect one with emergency services.
    This dataset was constructed by gathering information about PSAPs from state-level officials. In some cases this was geospatial information; in others it was tabular. This information was supplemented with a list of PSAPs from the Federal Communications Commission (FCC). Each PSAP was researched to verify its tabular information. In cases where the source data was not geospatial, each PSAP was researched to determine its service area in terms of existing boundaries (e.g., city and county boundaries). In some cases existing boundaries had to be modified to reflect coverage areas (e.g., "entire county north of County Road 30"). However, there may be cases where minor deviations from existing boundaries are not reflected in this dataset, such as the case where a particular PSAP's coverage area includes an entire county plus the homes and businesses along a road which is partly in another county. Text fields in this dataset have been set to all upper case to facilitate consistent database engine search results. All diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English characters to facilitate use with database systems that may not support diacritics.

  7. Consistency of logic.

    • plos.figshare.com
    xls
    Updated May 28, 2025
    + more versions
    Cite
    Kyuhyung Kim; Sae Byeol Mun; Young Jae Kim; Bong Chul Kim; Kwang Gi Kim (2025). Consistency of logic. [Dataset]. http://doi.org/10.1371/journal.pone.0322529.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 28, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Kyuhyung Kim; Sae Byeol Mun; Young Jae Kim; Bong Chul Kim; Kwang Gi Kim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: In this study, we aim to evaluate the ability of large language models (LLMs) to generate questions and answers in oral and maxillofacial surgery.
    Methods: ChatGPT4, ChatGPT4o, and Claude3-Opus were evaluated in this study. Each LLM was instructed to generate 50 questions about oral and maxillofacial surgery. The three LLMs were then asked to answer the 150 generated questions.
    Results: All 150 questions generated by the three LLMs were related to oral and maxillofacial surgery. Each model exhibited a correct answer rate of over 90%. None of the three models was able to answer correctly all the questions it had generated itself. The correct answer rate was 97.0% for questions with figures, significantly higher than the 88.9% rate for questions without figures. The analysis of problem-solving by the three LLMs showed that each model generally inferred answers with high accuracy, and there were few logical errors that could be considered controversial. Additionally, all three scored above 88% for the fidelity of their explanations.
    Conclusion: This study demonstrates that while LLMs like ChatGPT4, ChatGPT4o, and Claude3-Opus exhibit robust capabilities in generating and solving oral and maxillofacial surgery questions, their performance is not without limitations. None of the models was able to answer correctly all the questions it had generated itself, highlighting persistent challenges such as AI hallucinations and gaps in contextual understanding. The results also emphasize the importance of multimodal inputs, as questions with annotated images achieved higher accuracy rates than text-only prompts. Despite these shortcomings, the LLMs showed significant promise in problem-solving, logical consistency, and response fidelity, particularly in structured or numerical contexts.

  8. 350M Model

    • figshare.com
    json
    Updated May 23, 2025
    Cite
    Pavel Chizhov (2025). 350M Model [Dataset]. http://doi.org/10.6084/m9.figshare.29135096.v1
    Explore at:
    Available download formats: json
    Dataset updated
    May 23, 2025
    Dataset provided by
    figshare
    Authors
    Pavel Chizhov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RAG-350M is a 350-million-parameter Small Reasoning Model, trained for retrieval-augmented generation (RAG), search, and source summarization. Along with RAG-1B it belongs to our family of specialized reasoning models. RAG-350M outperforms most SLMs (4 billion parameters and below) on standardized benchmarks for retrieval-augmented generation (HotPotQA, 2wiki) and is a highly cost-effective alternative to popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. It is the only SLM to date to maintain consistent RAG performance across leading European languages and to ensure systematic reference grounding for statements. Due to its size, ease of deployment on constrained infrastructure (including mobile phones), and built-in support for factual and accurate information, RAG-350M unlocks a range of new use cases for generative AI.

    Features

    RAG-350M is a specialized language model using a series of special tokens to process a structured input (query and sources) and generate a structured output (reasoning sequence and answer with sources). For easier implementation, we encourage use of the associated API library.

    Citation support
    RAG-350M natively generates grounded answers on the basis of excerpts and citations extracted from the provided sources, using a custom syntax inspired by Wikipedia. It is one of a handful of open-weights models to date to have been developed with this feature, and the first one designed for actual deployment. In contrast with Anthropic's approach (Citation mode), citations are integrally generated by the model and are not the product of external chunking. As a result we can provide another desirable feature to simplify source checking: citation shortening for longer excerpts (using "(…)").

    RAG reasoning
    RAG-350M generates a specific reasoning sequence incorporating several proto-agentic abilities for RAG applications. The model is able to make a series of decisions directly:
    • Assessing whether the query is understandable.
    • Assessing whether the query is trivial enough not to require a lengthy pre-analysis (adjustable reasoning).
    • Assessing whether the sources contain enough input to generate a grounded answer.
    The structured reasoning trace includes the following steps:
    • Language detection of the query. The model will always strive to answer in the language of the original query.
    • Query analysis and associated query report. The analysis can lead to a standard answer, a shortened reasoning trace/answer for trivial questions, a reformulated query, or a refusal (which could, in the context of the application, be transformed into a request for user input).
    • Source analysis and associated source report. This step evaluates the coverage and depth of the provided sources with regard to the query.
    • Draft of the final answer.

    Multilinguality
    RAG-350M is able to read and write in the main European languages: French, German, Italian, Spanish and, to a lesser extent, Polish, Latin and Portuguese. To date, it is the only small language model with negligible loss of performance in leading European languages for RAG-related tasks. On a translated set of HotPotQA we observed a significant drop of performance in most SLMs, from 10% to 30-35% for sub-1B models. We expect the results of any standard English evaluation on our RAG models to be largely transferable to the main European languages, limiting the costs of evaluation and deployment in multilingual settings.

    Training
    RAG-350M is trained on a large synthetic dataset emulating retrieval of a wide variety of multilingual open sources from Common Corpus. These provide native support for citation and grounding with literal quotes. Following the latest trends of agentification, the model reintegrates multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking.

    Evaluation
    RAG-350M was evaluated on three standard RAG benchmarks: 2wiki, HotpotQA, and MuSique. All the benchmarks assess only the "trivial" mode on questions requiring some form of multi-hop reasoning over sources (answers disseminated across different sources) as well as discrimination of distractor sources. RAG-350M is not simply a cost-effective version of larger models. We found it has been able to answer correctly several hundred questions from HotPotQA that neither Llama-3-8b nor Qwen-2.5-7b could solve. Consequently we encourage its use as part of multi-model RAG systems.

  9. Handwritten ASAP Short Answer Scoring

    • zenodo.org
    zip
    Updated Nov 1, 2023
    Cite
    Christian Gold; Torsten Zesch (2023). Handwritten ASAP Short Answer Scoring [Dataset]. http://doi.org/10.5281/zenodo.8088866
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Gold; Torsten Zesch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is based on the Short Answer Scoring (SAS) dataset of the Automated Student Assessment Prize (ASAP).
    Although the original assessment was conducted on handwritten content, the scans are not available.
    To analyze the full pipeline from handwritten answers to automated scoring, we had students rewrite some answers.
    The texts used from SAS come from both the test set and the training set.

  10. QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph

    • figshare.com
    json
    Updated Feb 21, 2024
    Cite
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov (2024). QBLink-KG: QBLink Adapted to DBpedia Knowledge Graph [Dataset]. http://doi.org/10.6084/m9.figshare.25256290.v3
    Explore at:
    Available download formats: json
    Dataset updated
    Feb 21, 2024
    Dataset provided by
    figshare
    Authors
    Mona Zamiri; Yao Qiang; Fedor Nikolaev; Dongxiao Zhu; Alexander Kotov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QBLink-KG is a modified version of QBLink, a high-quality benchmark for evaluating conversational understanding of Wikipedia content. QBLink consists of sequences of up to three hand-crafted queries, with responses being single named entities that match the titles of Wikipedia articles.

    For QBLink-KG, the English subset of the DBpedia snapshot from September 2021 was used as the target knowledge graph. QBLink answers provided as the titles of Wikipedia infoboxes can be easily mapped to DBpedia entity URIs, if the corresponding entities are present in DBpedia, since DBpedia is constructed by extracting information from Wikipedia infoboxes.

    QBLink, in its original format, is not directly applicable to Conversational Entity Retrieval from a Knowledge Graph (CER-KG) because knowledge graphs contain considerably less information than Wikipedia. A named entity serving as an answer to a QBLink query may not be present as an entity in DBpedia. To adapt QBLink for CER over DBpedia, we implemented two filtering steps: 1) we removed all queries for which the wiki_page field is empty, or whose answer cannot be mapped to a DBpedia entity or does not match a Wikipedia page; 2) for the evaluation of a model with specific techniques for entity linking and candidate selection, we excluded queries with answers that do not belong to the set of candidate entities derived using that model.

    The original QBLink dataset files before filtering are:
    QBLink-train.json
    QBLink-dev.json
    QBLink-test.json
    The final QBLink-KG files after filtering are:
    QBLink-Filtered-train.json
    QBLink-Filtered-dev.json
    QBLink-Filtered-test.json

    We used the references below to construct QBLink-KG:
    Ahmed Elgohary, Chen Zhao, and Jordan Boyd-Graber. 2018. A dataset and baselines for sequential open-domain question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1077–1083, Brussels, Belgium. Association for Computational Linguistics.
    https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09
    Lehmann, Jens, et al. "DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia." 2015: 167–195.

    For more details about QBLink-KG, please read our research paper:
    Zamiri, Mona, et al. "Benchmark and Neural Architecture for Conversational Entity Retrieval from a Knowledge Graph." The Web Conference 2024.
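    The first filtering step can be sketched as follows (a toy illustration with invented records and an invented stand-in for the DBpedia entity set; only the wiki_page/answer field names follow the description above):

```python
# Toy illustration of filtering step 1: drop queries whose wiki_page is
# empty or whose answer has no corresponding DBpedia entity.
queries = [
    {"answer": "Albert Einstein", "wiki_page": "Albert_Einstein"},
    {"answer": "Some Obscure Name", "wiki_page": ""},             # empty wiki_page
    {"answer": "Not In DBpedia", "wiki_page": "Not_In_DBpedia"},  # unmapped answer
]

# Invented stand-in for the September 2021 DBpedia entity URIs.
dbpedia_entities = {"http://dbpedia.org/resource/Albert_Einstein"}

def to_dbpedia_uri(wiki_page: str) -> str:
    # DBpedia resource URIs reuse the Wikipedia page title.
    return "http://dbpedia.org/resource/" + wiki_page

kept = [
    q for q in queries
    if q["wiki_page"] and to_dbpedia_uri(q["wiki_page"]) in dbpedia_entities
]
print(len(kept))  # → 1
```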

  11. User assessment daily activities LSS – The user does not feel safe with...

    • gimi9.com
    + more versions
    Cite
    User assessment daily activities LSS – The user does not feel safe with anyone in staff, percentage (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u28611/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. The number of people in daily activities under LSS who answered None to the question "Do you feel safe with the staff?", divided by all people in daily activities under LSS who answered the question. The answer options were Everyone, Some, None. The survey is not a total survey, so the result for a municipality may be based on a smaller number of users' answers, but at least five. For some municipalities, users in both the municipality's own services and other providers (private/non-profit) are included; for some, only users in the municipality's own services; and for others, only users with a different provider. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.

  12. gpt-oss20b-tool-choice-20250810

    • kaggle.com
    Updated Aug 14, 2025
    Cite
    Horvath Katalin (2025). gpt-oss20b-tool-choice-20250810 [Dataset]. https://www.kaggle.com/datasets/horvathkatalin/tool-choice-finding-20250810-zipp/discussion
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Horvath Katalin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    gpt-oss20b — Tool Choice (2025-08-10)

    When to emit a tool call vs answer directly under Harmony + function-style JSON contracts

    Summary

    Schema-aligned finding for openai/gpt-oss-20b. We explore tool‑selection behavior in a simulated function‑calling setting: the model must either emit a tool call (strict JSON with tool_name + arguments) or answer directly in JSON, depending on the prompt. We quantify false positives (unnecessary tool calls), false negatives (should call but doesn’t), and schema validity.

    Contents

    • tool_choice.findings.json — schema‑aligned finding (walkthrough + steps_to_reproduce).
    • README.md (optional) — overview and reproduction notes.
    • LICENSE — CC0 1.0.
    • (Optional, recommended)
      • metrics.csv — (tool_call_selected, correct_tool, false_positive_tool, false_negative_tool, tool_schema_valid, strict_json_ok).
      • schema_tool.json — minimal JSON Schema for the tool‑call object.
      • raw_outputs.jsonl / sanitized_outputs.jsonl.

    Reproduction

    Format: Harmony conversation with clear decision rule: “If the query is X → emit tool JSON; else → direct answer JSON.”
    Deterministic parameters: temperature=0.0, do_sample=False, max_new_tokens ≤ 64, top_p=1.0, seed=20250810.
    Procedure:
    1. Run prompt families that disambiguate tool‑needed vs no‑tool cases.
    2. Check decision correctness and schema validity for tool JSON.
    3. Validate Strict‑JSON for the direct‑answer branch (no trailing prose).

    Determinism: Deterministic under stated parameters; log versions/seeds.
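The per-trial classification behind these metrics could look roughly like this. The tool_name and arguments fields follow the JSON contract stated in the Summary; the schema check shown is a deliberately minimal stand-in for a full JSON Schema validation, and the function name is ours.

```python
import json

def classify_trial(should_call, output_text):
    """Classify one trial for the tool-choice confusion matrix.

    should_call  -- ground truth: should the model have emitted a tool call?
    output_text  -- the model's raw output, expected to be strict JSON.
    """
    try:
        obj = json.loads(output_text)  # strict JSON: trailing prose fails here
        strict_json_ok = True
    except json.JSONDecodeError:
        obj, strict_json_ok = None, False
    tool_call_selected = isinstance(obj, dict) and "tool_name" in obj
    # Minimal stand-in for validating the tool-call object against schema_tool.json.
    tool_schema_valid = tool_call_selected and isinstance(obj.get("arguments"), dict)
    return {
        "strict_json_ok": strict_json_ok,
        "tool_call_selected": tool_call_selected,
        "tool_schema_valid": tool_schema_valid,
        "false_positive_tool": tool_call_selected and not should_call,
        "false_negative_tool": should_call and not tool_call_selected,
    }
```

Aggregating these dictionaries over prompt families yields the confusion matrix and rates listed under "Signals we track".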

    Why it matters

    Production agents rely on reliable tool selection. Spurious tool calls slow systems and break UX; missing calls block tasks. A simple, schema‑checked protocol surfaces decision quality.

    Signals we track (risk indicators)

    • tool_call_selected (boolean) and correct_tool (boolean).
    • false_positive_tool / false_negative_tool.
    • tool_schema_valid (JSON Schema).
    • strict_json_ok for direct answers.
    • post‑answer prose or extra meta‑tokens.

    Scope & limits

    • Simulated tool calls via JSON; no real tool execution.
    • Benign tasks; no harmful content.
    • Illustrative and not exhaustive of all tool ecosystems.

    Environment & determinism notes

    • Model: openai/gpt-oss-20b
    • Chat format: Harmony
    • Decoding: temperature=0.0, do_sample=False, max_new_tokens≤64, top_p=1.0
    • Seed: 20250810

    Recommended mitigations (developer playbook)

    1. Clear routing rule in the prompt + explicit JSON contracts per branch.
    2. Validator‑first post‑processing (schema check) → reject/repair.
    3. Penalize tool false positives via re‑ask or rule reminder.
    4. Log a confusion matrix over time for regressions.

    Competition compliance & visibility

    • License: CC0
    • Keep Private during the competition; follow event policy at end.

    Metadata

    • Team: horvathkatalin
    • Issue key: tool_choice
    • Finding ID: tool_choice-20250810
    • Model: openai/gpt-oss-20b
    • Seed: 20250810
    • Temporal coverage: 2025‑08‑10 → 2025‑08‑12
    • Expected update frequency: Never

    Provenance

    Original work by “horvathkatalin”; schema/layout per public findings example.

    License

    CC0 1.0 Universal.

    How to use (Notebook quick start)

    1. Add Dataset as input; load tool_choice.findings.json.
    2. Run deterministic decoding on tool vs no‑tool cases.
    3. Validate the emitted JSON (tool schema vs direct answer schema); compute error rates.

    Changelog

    • v1 (2025‑08‑10): Initial release (schema‑aligned).
    • v2 (2025‑08‑12): Added decision metrics + mitigation notes.

    Acknowledgments. We thank OpenAI’s GPT‑5 Pro (ChatGPT) for copy‑editing support and non‑executable checklisting during write‑up. All artifacts and conclusions are the authors’ responsibility and do not represent the views of OpenAI.

  13. Quora Chat Dutch

    • data.niaid.nih.gov
    Updated Jun 20, 2023
    Cite
    Vanroy, Bram (2023). Quora Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052391
    Explore at:
    Dataset updated
    Jun 20, 2023
    Authors
    Vanroy, Bram
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description

    This dataset contains 54,444 conversations in Dutch between an AI assistant and a (fake, generated) "Human". They are translations of Baize's machine-generated answers to the Quora dataset.

    Data Instances

    {
      'id': 10,
      'input': 'Het gesprek tussen de mens en de AI-assistent. '
               '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update zal krijgen? '
               '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de Android 7.0 (Nougat) update. '
               '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal zijn? '
               '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de update zal ontvangen. '
               '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen? '
               '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. Het is het beste om uit te kijken naar updates van Motorola. '
               '[|Human|] ',
      'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 (Nougat) update?'
    }

    Data Fields

    id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]

    input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

    topic: the topic description

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API using gpt-3.5-turbo, with max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is:

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

    Here are the requirements that you should adhere to:
    1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other;
    2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places;
    3. ensure accurate translation and keep the correctness of the conversation;
    4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    5. translate the human's text using informal, but standard, language;
    6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
    8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
    9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

    """

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = (
        "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
        " summarizing manner. Translate accurately and formally. Do not provide any explanation"
        " about the translation and do not include the original title.\n\n"
    )

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
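Combining the template, system message, and parameters described above, the request for one item would look roughly as follows. This is a sketch assuming the chat-completions payload shape; the function name is ours, and the actual network call and error handling are omitted.

```python
SYSTEM_MESSAGE = ("You are a helpful assistant that translates English to Dutch "
                  "to the requirements that are given to you.")

def build_translation_request(prompt_template, text, src_lang="English", tgt_lang="Dutch"):
    """Assemble the chat payload described above (gpt-3.5-turbo,
    temperature=0, max_tokens=1024). The API call itself is not made here."""
    return {
        "model": "gpt-3.5-turbo",
        "temperature": 0,
        "max_tokens": 1024,
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user",
             "content": prompt_template.format(src_lang=src_lang, tgt_lang=tgt_lang) + text},
        ],
    }
```

The same helper would serve both prompts, passing CONVERSATION_TRANSLATION_PROMPT with a conversation or TOPIC_TRANSLATION_PROMPT with a topic title.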

    Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.

  14. Data from: How Does Not Responding to Appetitive Stimuli Cause Devaluation: Evaluative Conditioning or Response Inhibition?

    • datasearch.gesis.org
    • ssh.datastations.nl
    Updated Jan 23, 2020
    Cite
    Chen, MSc Z. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/401559610; Veling, dr. H.P. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/298978520; Dijksterhuis, prof. dr. A.J. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/153149493; Holland, prof. dr. R.W. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/202163954 (2020). How Does Not Responding to Appetitive Stimuli Cause Devaluation: Evaluative Conditioning or Response Inhibition? [Dataset]. http://doi.org/10.17026/dans-xfv-at9w
    Explore at:
    Dataset updated
    Jan 23, 2020
    Dataset provided by
    DANS (Data Archiving and Networked Services)
    Authors
    Chen, MSc Z. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/401559610; Veling, dr. H.P. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/298978520; Dijksterhuis, prof. dr. A.J. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/153149493; Holland, prof. dr. R.W. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/202163954
    Description

    In a series of 6 experiments, we examined how not responding to appetitive stimuli causes devaluation. To examine this question, a go/no-go task was employed in which appetitive stimuli were consistently associated with cues to respond (go stimuli), or with cues to not respond (either no-go cues or the absence of cues; no-go stimuli). Change in evaluation of go and no-go stimuli was compared to stimuli not presented in the task (untrained stimuli). Overall, the results suggest that devaluation of appetitive stimuli by not responding to them is the result of response inhibition. In these experiments we recruited a total of 272 participants via the Sona participation system at Radboud University. The experiments were conducted from November 2014 to September 2015. The dataset contains all the measurements from these 6 experiments. The analyses were conducted with SPSS 23.

  15. Commercial District Customer Intercept Survey Responses - Historical

    • splitgraph.com
    • data.cambridgema.gov
    Updated May 6, 2024
    Cite
    Community Development Department (2024). Commercial District Customer Intercept Survey Responses - Historical [Dataset]. https://www.splitgraph.com/cambridgema-gov/commercial-district-customer-intercept-survey-ibuz-brbz
    Explore at:
    application/openapi+json, application/vnd.splitgraph.image, json (available download formats)
    Dataset updated
    May 6, 2024
    Dataset authored and provided by
    Community Development Department
    Description

    Every year, the City of Cambridge’s Economic Development Division conducts a customer intercept study at a different commercial district. The results of this study are provided in this dataset to facilitate year-over-year and square-by-square analysis.

    This data has been scrubbed of ZIP codes to prevent any respondents from being identified. The customer intercept survey is slightly altered each year for the specific district in consideration, so the exact wording of questions and the questions asked are not necessarily consistent from year to year.

    If a question was not asked in a given district’s survey, “[Question Not Asked In This District's Survey]” is listed as the value. A blank cell indicates that the respondent did not answer that question, although it was asked.

    In many questions, such as mode of transit, participants were asked to give one answer but often filled in more than one. On surveys after 2014, we captured all of their responses, rather than just the one.

    All of the final reports developed from this data are available here: http://www.cambridgema.gov/CDD/econdev/aboutcambridge

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

  16. Replication Data for: Media Choice and Response Patterns to Questions About Political Knowledge

    • dataverse.harvard.edu
    Updated Feb 12, 2024
    Cite
    Taka-aki Asano (2024). Replication Data for: Media Choice and Response Patterns to Questions About Political Knowledge [Dataset]. http://doi.org/10.7910/DVN/ODKTGP
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Taka-aki Asano
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Previous studies have examined not only whether people answer political-knowledge questions correctly, but also whether a "don't know" (DK) response implies low self-confidence in one's knowledge. However, while it has become clear that different people hold accurate knowledge on different topics, there has been little discussion of whether knowledge of specific facts promotes self-confidence. Furthermore, people are believed to learn about politics through news media, but it is unclear whether the type of media consumed correlates with patterns of correct, incorrect, or DK responses to political-knowledge questions. We conducted a survey asking about knowledge on a variety of topics, such as the political system and current affairs, to (1) categorize patterns of correct, incorrect, and DK answers, and (2) examine their relationship to media choices. We find that those who read newspapers or online news are more knowledgeable about the political system and tend not to choose DK even for questions they cannot answer accurately. In contrast, those who gain knowledge of current issues from watching TV have less confidence in their knowledge and answer DK to questions for which they do not know the correct answer.

  17. Chile Educational Attainment: Female: Do Not Know Answer

    • ceicdata.com
    Updated Jun 7, 2017
    Cite
    CEICdata.com (2017). Chile Educational Attainment: Female: Do Not Know Answer [Dataset]. https://www.ceicdata.com/en/chile/national-socioeconomic-characterization-survey-education/educational-attainment-female-do-not-know-answer
    Explore at:
    Dataset updated
    Jun 7, 2017
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 1992 - Dec 1, 2017
    Area covered
    Chile
    Description

    Chile Educational Attainment: Female: Do Not Know Answer data was reported at 43,704.000 Person in 2017. This records an increase from the previous number of 14,579.000 Person for 2015. Chile Educational Attainment: Female: Do Not Know Answer data is updated yearly, averaging 34,190.000 Person from Dec 1990 (Median) to 2017, with 13 observations. The data reached an all-time high of 68,906.000 Person in 1990 and a record low of 0.000 Person in 2011. Chile Educational Attainment: Female: Do Not Know Answer data remains active status in CEIC and is reported by Ministry of Social Development. The data is categorized under Global Database’s Chile – Table CL.H024: National Socio-Economic Characterization Survey: Education.

  18. Field Office Telephone Service - Monthly National Answer Rate and Busy Rate

    • catalog.data.gov
    Updated Jan 24, 2025
    + more versions
    Cite
    Social Security Administration (2025). Field Office Telephone Service - Monthly National Answer Rate and Busy Rate [Dataset]. https://catalog.data.gov/dataset/field-office-telephone-service-monthly-national-answer-rate-and-busy-rate
    Explore at:
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    This dataset provides information at the national level, by month, for federal fiscal years 2013 onward, on the answer rate and busy rate for calls to our field offices. The data does not include calls abandoned while waiting in queue without representative service, or overflow calls, which are those callers who wait in queue beyond the 15-minute maximum wait time without representative service.

  19. User assessment Personal assistance – The user does not receive the help he/she wants from the assistants, percentage (%)

    • gimi9.com
    + more versions
    Cite
    User assessment Personal assistance – The user does not receive the help he/she wants from the assistants, percentage (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u28621
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. Number of people with personal assistance who answered "No" to the question "Do you get the help you want from your assistants?", divided by all people with personal assistance who answered the question. The answer options were Yes, Sometimes, No. The survey is not a census, so the result for a municipality may be based on a small number of users' answers, though always at least five. For some municipalities, the figure includes users of both municipally run and other (private/non-profit) services; for others, only users of municipally run services, or only users of non-municipal services. The survey was conducted with a web-based survey tool adapted for people with disabilities. Data is available broken down by gender.

  20. Context-Sensitive Solutions For Safety Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Context-Sensitive Solutions For Safety Market Research Report 2033 [Dataset]. https://dataintelo.com/report/context-sensitive-solutions-for-safety-market
    Explore at:
    pptx, csv, pdf (available download formats)
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Context-Sensitive Solutions for Safety Market Outlook



    According to our latest research, the global Context-Sensitive Solutions for Safety market size is valued at USD 7.1 billion in 2024, with a robust compound annual growth rate (CAGR) of 8.6%. The market is forecasted to reach USD 14.7 billion by 2033, propelled by increased investments in infrastructure modernization, rising urbanization, and a growing emphasis on public safety. The adoption of advanced engineering and technology-driven solutions, combined with supportive government policies, is driving the expansion of context-sensitive safety strategies across diverse sectors. As per the latest research, the market’s notable growth is underpinned by the urgent need to address evolving safety challenges in transportation and urban environments.



    One of the primary growth factors for the Context-Sensitive Solutions for Safety market is the increasing complexity of urban infrastructure and transportation networks. With rapid urbanization, cities are witnessing a surge in population, leading to higher traffic density and more complex interactions between vehicles, pedestrians, and cyclists. This necessitates the deployment of context-sensitive solutions that can adapt to varying environmental and human factors to enhance safety outcomes. Engineering innovations such as adaptive traffic signals, intelligent crosswalks, and dynamic signage are being integrated into city planning, ensuring that safety measures are not only standardized but also tailored to specific local conditions. Furthermore, the rise of smart city initiatives globally is accelerating the uptake of these solutions, as municipalities seek to leverage technology and data analytics to mitigate risks and reduce accident rates.



    Another significant driver is the increasing regulatory focus on safety standards and compliance. Governments and transportation authorities are implementing stringent safety regulations, compelling stakeholders to adopt context-sensitive approaches in both new infrastructure projects and the retrofitting of existing assets. Policy frameworks are evolving to prioritize safety not just as a compliance requirement but as a core component of sustainable development. This shift is encouraging collaboration between public agencies, urban planners, and private sector partners to design and implement holistic safety strategies. Education and training programs are also being expanded to ensure that all stakeholders, from engineers to end-users, are equipped with the knowledge and skills necessary to support and sustain these safety initiatives.



    Technological advancements are further catalyzing market growth. The integration of digital technologies such as the Internet of Things (IoT), artificial intelligence (AI), and data analytics is transforming the way safety solutions are designed and managed. Real-time monitoring systems, predictive analytics, and automated response mechanisms are enabling proactive risk identification and mitigation. This technological evolution is not only enhancing the effectiveness of context-sensitive safety measures but also providing valuable insights for continuous improvement. As a result, stakeholders are increasingly investing in technology-driven solutions, recognizing their potential to deliver measurable safety benefits while optimizing operational efficiency.



    From a regional perspective, North America and Europe are leading the adoption of context-sensitive solutions for safety, owing to their advanced infrastructure, high urbanization rates, and proactive regulatory environments. However, the Asia Pacific region is emerging as the fastest-growing market, driven by rapid urban development, rising investments in smart city projects, and increasing public awareness of safety issues. Latin America and the Middle East & Africa are also witnessing steady growth, supported by infrastructure modernization initiatives and international collaborations. The diverse pace of adoption across regions highlights the importance of localized strategies that address unique socio-economic and environmental challenges.



    Solution Type Analysis



    The Context-Sensitive Solutions for Safety market by solution type is broadly segmented into engineering solutions, policy & planning, education & training, technology integration, and others. Engineering solutions form the backbone of this market, encompassing infrastructural modifications such as road redesigns, improved lighting, and e
