The Do-Not-Answer dataset is designed to test the safety performance of Large Language Models (LLMs).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a set of 150 questions extracted from do-not-answer-ja, which kunishou maintains: roughly 30 questions from each of the five categories where, from the standpoint of creative ability, a model is actually more useful if it does answer. do-not-answer-ja: https://github.com/kunishou/do-not-answer-ja do-not-answer: https://github.com/Libr-AI/do-not-answer
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This dataset contains 14,934 instructions, contexts, and responses in several natural-language categories such as classification, closed QA, generation, etc. The original English dataset was created by @databricks, which crowd-sourced the data creation among its employees. The current dataset is a translation of that dataset via ChatGPT (gpt-3.5-turbo).
Data Instances
{ "id": 14963, "instruction": "Wat zijn de duurste steden ter wereld?", "context": "", "response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York, Hong Kong, Los Angeles, Zurich, Genève, San Francisco, Parijs en Sydney.", "category": "brainstorming" }
Data Fields
id: the ID of the item. The following 77 IDs are not included because they could not be translated (or were too long): [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966]
instruction: the instruction (question)
context: additional context that the AI can use to answer the question
response: the AI's expected response
category: the category of this type of question (see Dolly for more info)
Dataset Creation
The instructions, contexts, and responses were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.
The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):
CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional context to the task, and the response to the task, from {src_lang} to {tgt_lang}.
Here are the requirements that you should adhere to:
1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional context to the task (marked `context: `) and response for the task (marked `response: `);
2. do not translate the identifiers `instruction: `, `context: `, and `response: ` but instead copy them to your output;
3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
4. translate the instruction and context text using informal, but standard, language;
5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the context in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the context, nor the translation in the response (just copy them as-is);
8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
"""
The system message was:
You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
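For illustration, a minimal sketch of what such a call could look like with the current openai Python client (the dataset itself was built with an earlier version of the API; the model, sampling parameters, and system message follow the description above, while the example task text is hypothetical):

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Hypothetical example task in the expected marker format.
task = 'instruction: "What are the most expensive cities in the world?"\ncontext: ""\nresponse: "..."'

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    max_tokens=1024,
    temperature=0,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that translates English to Dutch "
                                      "according to the requirements that are given to you."},
        {"role": "user", "content": CONVERSATION_TRANSLATION_PROMPT.format(src_lang="English", tgt_lang="Dutch") + "\n" + task},
    ],
)
print(completion.choices[0].message.content)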
Note that 77 items (0.5%) were not successfully translated. This can either mean that the prompt was too long for the given limit (max_tokens=1024) or that the generated translation could not be parsed into instruction, context and response fields. The missing IDs are [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966].
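The exact parsing code is not published; the following is a plausible sketch of the step that can fail, splitting the model output on the identifiers that the prompt asks the model to copy verbatim:

import re

# The translation must contain the (untranslated) field markers; if it does
# not, the item cannot be parsed and its ID ends up in the list above.
FIELDS_RE = re.compile(
    r'instruction:\s*(?P<instruction>.*?)\s*'
    r'(?:context:\s*(?P<context>.*?)\s*)?'
    r'response:\s*(?P<response>.*)',
    re.DOTALL,
)

def parse_translation(text):
    match = FIELDS_RE.search(text)
    if match is None:
        return None  # unparseable translation
    return {key: (value or "").strip().strip('"') for key, value in match.groupdict().items()}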
Initial Data Collection and Normalization
Initial data collection by databricks. See their repository for more information about this dataset.
Considerations for Using the Data
Note that the translations in this new dataset have not been verified by humans! Use at your own risk, both in terms of quality and biases.
Discussion of Biases
As with any machine-generated text, users should be aware of potential biases in this dataset. Although the prompt specifically instructs the model to "make sure to avoid biases (such as gender bias, grammatical bias, social bias)", the impact of such a command is not known. It is likely that biases remain in the dataset, so use it with caution.
Other Known Limitations
The translation quality has not been verified. Use at your own risk!
Licensing Information
This repository follows the original databricks license, which is CC BY-SA 3.0, but see below for a specific restriction.
This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
If you use this dataset, you must also follow the Sharing and Usage policies.
As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.
This dataset is also available on the Hugging Face hub, its canonical repository.
BoolQ is a question answering dataset for yes/no questions containing 15,942 examples. These questions are naturally occurring – they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context.
Questions are gathered from anonymized, aggregated queries to the Google search engine. Queries that are likely to be yes/no questions are heuristically identified and questions are only kept if a Wikipedia page is returned as one of the first five results, in which case the question and Wikipedia page are given to a human annotator for further processing. Annotators label question/article pairs in a three-step process. First, they decide if the question is good, meaning it is comprehensible, unambiguous, and requesting factual information. This judgment is made before the annotator sees the Wikipedia page. Next, for good questions, annotators find a passage within the document that contains enough information to answer the question. Annotators can mark questions as “not answerable” if the Wikipedia article does not contain the requested information. Finally, annotators mark whether the question’s answer is “yes” or “no”. Only questions that were marked as having a yes/no answer are used, and each question is paired with the selected passage instead of the entire document.
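A copy of BoolQ is hosted on the Hugging Face hub, so a quick look at the data is straightforward (a sketch; the hub copy exposes question, passage, and answer fields, though it may omit the optional page title):

from datasets import load_dataset

boolq = load_dataset("boolq")  # splits: train and validation
example = boolq["train"][0]
print(example["question"])     # a naturally occurring yes/no question
print(example["answer"])       # True for "yes", False for "no"
print(example["passage"][:200])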
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a development key figure; see questions and answers on kolada.se for more information. It is the number of people with personal assistance answering No to the question "Do you get the help you want from your assistants?", divided by all people with personal assistance who answered the question. The answer options were Yes, Sometimes, No. The survey is not a total survey, which is why the result for a municipality may be based on a smaller number of users' answers, though always at least five. For some municipalities, users are included both in the municipality's own services and in services run by other providers (private/non-profit); for some, only users in the municipality's own services; and for others, only users in services run by other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
911 Public Safety Answering Point (PSAP) service area boundaries in New Mexico. According to the National Emergency Number Association (NENA), a Public Safety Answering Point (PSAP) is a facility equipped and staffed to receive 9-1-1 calls. The service area is the geographic area within which a 911 call placed using a landline is answered at the associated PSAP.
This dataset only includes primary PSAPs. Secondary PSAPs, backup PSAPs, and wireless PSAPs have been excluded. Primary PSAPs receive calls directly, whereas secondary PSAPs receive calls that have been transferred by a primary PSAP. Backup PSAPs provide service in cases where another PSAP is inoperable.
Most military bases have their own emergency telephone systems. To connect to such a system from within a military base, it may be necessary to dial a number other than 9-1-1. Due to the sensitive nature of military installations, TGS did not actively research these systems. If civilian authorities in surrounding areas volunteered information about these systems, or if adding a military PSAP was necessary to fill a hole in civilian-provided data, TGS included it in this dataset. Otherwise, military installations are depicted as being covered by one or more adjoining civilian emergency telephone systems.
In some cases areas are covered by more than one PSAP boundary. In these cases, any of the applicable PSAPs may take a 911 call. Where a specific call is routed may depend on how busy the applicable PSAPs are (i.e. load balancing), operational status (i.e. redundancy), or time of day / day of week.
If an area does not have 911 service, TGS included that area in the dataset along with the address and phone number of its dispatch center. These are areas where someone must dial a 7- or 10-digit number to reach emergency services. These records can be identified by a "Y" in the [NON911EMNO] field, which indicates that dialing 911 inside one of these areas does not connect one with emergency services.
This dataset was constructed by gathering information about PSAPs from state-level officials. In some cases this was geospatial information, in others it was tabular. This information was supplemented with a list of PSAPs from the Federal Communications Commission (FCC). Each PSAP was researched to verify its tabular information. In cases where the source data was not geospatial, each PSAP was researched to determine its service area in terms of existing boundaries (e.g. city and county boundaries). In some cases existing boundaries had to be modified to reflect coverage areas (e.g. "entire county north of County Road 30"). However, there may be cases where minor deviations from existing boundaries are not reflected in this dataset, such as where a particular PSAP's coverage area includes an entire county plus the homes and businesses along a road which is partly in another county.
Text fields in this dataset have been set to all upper case to facilitate consistent database engine search results. All diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English character to facilitate use with database systems that may not support diacritics.
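As an example of working with the [NON911EMNO] field described above, a sketch using geopandas (the file name is hypothetical; only the field name and its "Y" convention come from the description):

import geopandas as gpd

# Hypothetical file name for the PSAP service area boundaries.
psaps = gpd.read_file("nm_psap_boundaries.shp")

# "Y" marks areas where dialing 911 does not connect to emergency services;
# callers there must use the 7- or 10-digit dispatch number instead.
no_911_service = psaps[psaps["NON911EMNO"] == "Y"]
print(len(no_911_service), "service areas without 911 service")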
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects and is filtered where the book is The answer to bad religion is not no religion, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Researchers use datasets of Question-Solution pairs to train machine learning models, such as for source code generation. A Question-Solution pair contains two parts: a programming question and its corresponding Solution Snippet. A Solution Snippet is source code that solves a programming question. These datasets of Question-Solution pairs can be obtained from a number of different platforms. In this study, the information regarding the Question-Solution pairs was specifically obtained from Stack Overflow (SO). However, there are two limitations of datasets of Question-Solution pairs extracted from SO: (1) the Solution Snippets are partially correct and/or do not answer the questions, and (2) the information regarding the potential aspects of reusability of Solution Snippets is not available. These limitations can adversely affect the predictability of a machine learning model. Therefore, I conducted an empirical study to categorize the various presentations of Solution Snippets in SO answers, as well as how Solution Snippets can be adapted for reuse. By doing so, I identified eight categories of how Solution Snippets are presented in SO answers and five categories of how Solution Snippets could be adapted. Based on these results, I identified several potential reasons why it is not always easy to create datasets of Question-Solution pairs. The first categorization shows that finding the correct location of the Solution Snippet is challenging when several code blocks are within the answer to the question; the researcher must then identify which code within that code block is the Solution Snippet. The second categorization shows that most Solution Snippets appear challenging to adapt for reuse, and how they are potentially adapted is not explicitly stated in them. These insights shed light on how to create better-quality datasets in order to improve the predictability of machine learning models.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As the obesity rate continues to increase persistently, there is an urgent need to develop an effective weight loss management strategy. Nowadays, the development of artificial intelligence (AI) and cognitive technologies, coupled with the rapid spread of messaging platforms and mobile technology with easier access to internet technology, offers professional dietitians an opportunity to provide extensive monitoring support to their clients through a chatbot with artificial empathy. This study aimed to design a chatbot with artificially empathic motivational support for weight loss called “SlimMe” and investigate how people react to a diet bot. The SlimMe infrastructure was built using Dialogflow as the natural language processing (NLP) platform and LINE mobile messenger as the messaging platform. We proposed a text-based emotion analysis to simulate artificial empathy responses to recognize the user's emotion. A preliminary evaluation was performed to investigate the early-stage user experience after a 7-day simulation trial. The result revealed that having an artificially empathic diet bot for weight loss management is a fun and exciting experience. The use of emoticons, stickers, and GIF images makes the chatbot response more interactive. Moreover, the motivational support and persuasive messaging features enable the bot to express more empathic and engaging responses to the user. In total, there were 1,007 bot responses to 892 user input messages. Of the user messages, 67.38% (601/892) received a chatbot response that was accurate for a relevant request, 21.19% (189/892) an inaccurate response to a relevant request, and 10.31% (92/892) an accurate response to an irrelevant request. In only 1.12% (10/892) of cases did the chatbot not answer. We present the design of an artificially empathic diet bot as a friendly assistant to help users estimate their calorie intake and calories burned in a more interactive and engaging way. To our knowledge, this is the first chatbot designed with artificial empathy features, and it looks very promising in promoting long-term weight management. More user interactions and further data training and validation enhancement will improve the bot's in-built knowledge base and emotional intelligence base.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
This dataset contains 54,444 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch. They are translations of Baize's machine-generated answers to the Quora dataset.
Data Instances
{
 'id': 10,
 'input': 'Het gesprek tussen de mens en de AI-assistent.\n'
          '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update zal krijgen?\n'
          '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de Android 7.0 (Nougat) update.\n'
          '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal zijn?\n'
          '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de update zal ontvangen.\n'
          '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen?\n'
          '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. Het is het beste om uit te kijken naar updates van Motorola.\n'
          '[|Human|] ',
 'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 (Nougat) update?'
}
Data Fields
id: the ID of the item. The following 12 IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]
input: the machine-generated conversation between the AI assistant and the human. Every sample starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|]
topic: the topic of the conversation
Dataset Creation
Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.
The prompt template to translate the input is:
CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.
Here are the requirements that you should adhere to:
1. maintain the format: the conversation consists of the AI (marked as `[|AI|]`) and the human (`[|Human|]`) talking in turns and responding to each other;
2. do not translate the speaker identifiers `[|AI|]` and `[|Human|]` but always copy them into the translation in appropriate places;
3. ensure accurate translation and keep the correctness of the conversation;
4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
5. translate the human's text using informal, but standard, language;
6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.
Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.
"""
The prompt to translate the topic is:
TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct," \
                           " summarizing manner. Translate accurately and formally. Do not provide any explanation" \
                           " about the translation and do not include the original title."
The system message was:
You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].
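The validation code itself is not published; what follows is a minimal sketch of the kind of check implied by the description, assuming each translated conversation is a single string:

EXPECTED_PREFIX = "Het gesprek tussen de mens en de AI-assistent."

def is_valid_translation(conversation: str) -> bool:
    # A usable translation keeps the fixed Dutch opening line and both
    # speaker identifiers, which the prompt forbids translating.
    return (
        conversation.startswith(EXPECTED_PREFIX)
        and "[|AI|]" in conversation
        and "[|Human|]" in conversation
    )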
The translation quality has not been verified. Use at your own risk!
Licensing Information
Licensing info for quora is listed as unknown. Baize is listed as GPLv3, so this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.
This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
If you use this dataset, you must also follow the Sharing and Usage policies.
As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.
This dataset is also available on the Hugging Face hub, which is the canonical repository for this dataset. See the link for more info.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A large non-factoid English consumer Question Answering (QA) dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. This dataset is useful for benchmarking or training systems on more difficult real-world questions and responses, which may contain spelling or formatting errors, or lexical gaps between consumer and expert vocabularies.
By downloading this dataset, you agree to have obtained ethics approval from your institution. Lineage: We collected data from posts and comments to the subreddit /r/askdocs, published between July 10, 2013, and April 2, 2022, totalling 600,000 submissions (original posts) and 1,700,000 comments (replies). We generated question-answer pairs by taking the highest-scoring answer from a verified medical expert to a Reddit question. Questions containing only images were removed, and all links and author names were removed.
We provide two separate datasets in this collection, with the following schemas. MedRedQA - Reddit medical question and answer pairs from /r/askdocs, in CSV format: i. the poster's question (Body); ii. the title of the post (Title); iii. the filtered answer from a verified physician comment (Response); iv. the occupation indicated for verification status; v. any PMCIDs found in the post.
MedRedQA+PubMed - PubMed-enriched subset of MedRedQA, in JSON format: i. Question: the user's original question; this is equivalent to the Body field in MedRedQA. ii. Document: the abstract of the PubMed document (if it exists and contains an abstract) for that particular post. Note: this does not necessarily mean the answer references this document, but at least one other verified physician in the responses has mentioned that particular document. iii. Response: the filtered response; this is equivalent to the Response field in MedRedQA.
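A short loading sketch under the stated formats (the file names are hypothetical; the column and key names follow the schema above):

import json
import pandas as pd

# MedRedQA: CSV with the question body, post title, and filtered expert response.
medredqa = pd.read_csv("medredqa.csv")  # hypothetical file name
print(medredqa[["Title", "Body", "Response"]].head())

# MedRedQA+PubMed: JSON records with Question, Document (PubMed abstract), and Response.
with open("medredqa_pubmed.json") as f:  # hypothetical file name
    enriched = json.load(f)
first = enriched[0]
print(first["Question"])
print(first["Document"][:200])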
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
InductiveQE datasets
UPD 2.0: Regenerated datasets free of potential test set leakages
UPD 1.1: Added train_answers_val.pkl files to all Freebase-derived datasets; these contain the answers of training queries on the larger validation graphs
This repository contains 10 inductive complex query answering datasets published in "Inductive Logical Query Answering in Knowledge Graphs" (NeurIPS 2022). Nine datasets (106-550) were created from FB15k-237, and the wikikg dataset was created from the OGB WikiKG 2 graph. In the datasets, all inference graphs extend training graphs and include new nodes and edges. Dataset numbers indicate the relative size of the inference graph compared to the training graph; e.g., in 175, the number of nodes in the inference graph is 175% of the number of nodes in the training graph. The higher the ratio, the more new unseen nodes appear at inference time and the more complex the task is. The Wikikg split has a fixed 133% ratio.
Each dataset is a zip archive containing 17 files.
Overall unzipped size of all datasets combined is about 10 GB. Please refer to the paper for the sizes of graphs and the number of queries per graph.
The Wikikg dataset is intended to be evaluated in an inference-only regime, with models pre-trained solely on simple link prediction, since the number of training complex queries is not enough for such a large dataset.
Paper pre-print: https://arxiv.org/abs/2210.08008
The full source code of training/inference models is available at https://github.com/DeepGraphLearning/InductiveQE
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
This dataset contains 51,712 conversations between an AI assistant and a (fake) "Human" (generated) in Dutch. They are translations of the Alpaca Cleaned dataset.
Data Instances
{
'id': 7,
'instruction': 'Leg uit waarom de volgende breuk gelijk is aan 1/4',
'input': '4/16',
'output': 'De breuk 4/16 is gelijk aan 1/4 omdat zowel de teller als de '
'noemer deelbaar zijn door 4. Door zowel de teller als de noemer '
'door 4 te delen, krijgen we de breuk 1/4.'
}
Data Fields
id: the ID of the item. The following ID is not included because it could not be translated: [23019]
instruction: the instruction (task)
input: optional input to the task
output: the expected output for the task
Dataset Creation
The instructions, inputs and outputs were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.
The prompt template to translate is (where src_lang is English and tgt_lang is Dutch):
TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional input to the task, and the output of the task, from {src_lang} into {tgt_lang}.
Here are the requirements that you should adhere to:
1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional input to the task (marked `input: `) and output for the task marked with `output: `;
2. do not translate the identifiers `instruction: `, `input: `, and `output: ` but instead copy them to your output;
3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
4. translate the instruction and input text using informal, but standard, language;
5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the input in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the input, nor the translation in the output (just copy them as-is);
8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
"""
This prompt is concatenated with the instruction, optionally the input, and the output. In code, that last part looks like this:
# Assemble the task in the marker format that the prompt expects.
text = f'instruction: "{instruction}"\n'
if inputstr:
    text += f'input: "{inputstr}"\n'
text += f'output: "{outputstr}"'
The system message was:
You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
Note that 1 item (0.002%) was not successfully translated. The translation was missing the input, instruction, or output keywords where those were expected. The ID for the missing item is [23019].
Initial data creation of the English dataset by Tatsu lab and cleaned by Yahma.
Also available on HuggingFace hub (with a more extensive README).
Licensing Information
As per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services. Similar to the original Alpaca dataset, this dataset is released under CC BY-NC 4.0.
This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
If you use this dataset, you must also follow the Sharing and Usage policies.
As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a development key figure; see questions and answers on kolada.se for more information. It is the number of people with personal assistance who answered No to the question "Do you like your assistants?", divided by all people with personal assistance who answered the question. The answer options were Yes, Sometimes, No. The survey is not a total survey, which is why the result for a municipality may be based on a smaller number of users' answers, though always at least five. For some municipalities, users are included both in the municipality's own services and in services run by other providers (private/non-profit); for some, only users in the municipality's own services; and for others, only users in services run by other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The implications of the persistent gender gap in political knowledge are a puzzle that the literature is still disentangling, and research has evidenced important differences in the way women and men respond to survey questions. We argue in this article that political knowledge survey items not only inform about differences in cognition, but also about other latent traits related to gender stereotyping. Gender stereotypes around political knowledge push men to appear knowledgeable, but not so much women, which we expect to affect men's and women's survey responses differently. To test this expectation, we explore response times of "don't know" answers to political knowledge items. Our results show that men, particularly those who declare being interested in politics, take longer than women to admit that they do not know the answer to political knowledge items.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books and is filtered where the book is The answer to bad religion is not no religion, featuring 5 columns: author, BNB id, book, book publisher, and publication date. The preview is ordered by publication date (descending).
In a series of 6 experiments, we examined how not responding to appetitive stimuli causes devaluation. To examine this question, a go/no-go task was employed in which appetitive stimuli were consistently associated with cues to respond (go stimuli), or with cues to not respond (either no-go cues or the absence of cues; no-go stimuli). Change in evaluation of go and no-go stimuli was compared to stimuli not presented in the task (untrained stimuli). Overall, the results suggest that devaluation of appetitive stimuli by not responding to them is the result of response inhibition. In these experiments we recruited a total of 272 participants via the Sona participation system at Radboud University. The experiments were conducted from November 2014 to September 2015. The dataset contains all the measurements from these 6 experiments. The analyses were conducted with SPSS 23.
CommonsenseQA is a dataset for the commonsense question answering task. The dataset consists of 12,247 questions with 5 choices each. The dataset was generated by Amazon Mechanical Turk workers in the following process (an example is provided in parentheses):
1. A crowd worker observes a source concept from ConceptNet (“River”) and three target concepts (“Waterfall”, “Bridge”, “Valley”) that are all related by the same ConceptNet relation (“AtLocation”).
2. The worker authors three questions, one per target concept, such that only that particular target concept is the answer, while the other two distractor concepts are not (“Where on a river can you hold a cup upright to catch water on a sunny day?”, “Where can I stand on a river to see water falling without getting wet?”, “I’m crossing the river, my feet are wet but my body is dry, where am I?”).
3. For each question, another worker chooses one additional distractor from ConceptNet (“pebble”, “stream”, “bank”), and the author manually adds another distractor (“mountain”, “bottom”, “island”).
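The dataset is also hosted on the Hugging Face hub; below is a minimal sketch of inspecting an example (field names follow the hub version, where each question carries five labelled choices and an answerKey):

from datasets import load_dataset

csqa = load_dataset("commonsense_qa")  # splits: train, validation, test
example = csqa["train"][0]
print(example["question"])
for label, text in zip(example["choices"]["label"], example["choices"]["text"]):
    print(f"  {label}. {text}")
print("answer:", example["answerKey"])  # empty string in the unlabelled test split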
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a development key figure; see questions and answers on kolada.se for more information. It is the number of people living in LSS group housing who answered None to the question "Do you feel safe with the staff?", divided by all people living in LSS group housing who answered the question. The answer options were Everyone, Some, None. The survey is not a total survey, which is why the result for a municipality may be based on a smaller number of users' answers, though always at least five. For some municipalities, users are included both in the municipality's own services and in services run by other providers (private/non-profit); for some, only users in the municipality's own services; and for others, only users in services run by other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Answers to a survey on gold Open Access conducted from July to October 2016. The dataset contains 15,235 unique responses from authors published in the Web of Science. This survey is part of a PhD thesis at the University of Granada in Spain. More details about the study can be found in the full-text document, also available on Zenodo.
The questions in the WoS 2016 dataset are listed below. Please note that countries with fewer than 40 answers are listed as "Other" in order to preserve anonymity.
* 1. How many years have you been employed in research?
Many of the questions that follow concern Open Access publishing. For the purposes of this survey, an article is Open Access if its final, peer-reviewed, version is published online by a journal and is free of charge to all users without restrictions on access or use.
* 2. Do any journals in your research field publish Open Access articles?
* 3. Do you think your research field benefits, or would benefit from journals that publish Open Access articles?
* 4. How many peer reviewed research articles (Open Access or not Open Access) have you published in the last five years?
* 5. What factors are important to you when selecting a journal to publish in?
[Each factor may be rated “Extremely important”, “Important”, “Less important” or “Irrelevant”. The factors are presented in random order.]
6. Who usually decides which journals your articles are submitted to? (Choose more than one answer if applicable)
7. Approximately how many Open Access articles have you published in the last five years?
[If the answer is “0”, the survey jumps to Q10.]
* 8. What publication fee was charged for the last Open Access article you published?
[If the answer is “No charge or I don’t know”, the survey jumps to Q20.]
* 9. How was this publication fee covered? (Choose more than one answer if applicable)
* 10. How easy is it to obtain funding if needed for Open Access publishing from your institution or the organisation mainly responsible for financing your research?
* 11. Listed below are a series of statements, both positive and negative, concerning Open Access publishing. Please indicate how strongly you agree/disagree with each statement.
[Each statement may be rated “Strongly agree”, “Agree”, “Neither agree nor disagree”, “Disagree” or “Strongly disagree”. The statements are presented in random order.]
This study and its questionnaire are based on the SOAP Project (http://project-soap.eu). An article describing the highlights of the SOAP Survey is available at: https://arxiv.org/abs/1101.5260. The dataset of the SOAP survey is available at http://bit.ly/gSmm71. A manual describing the SOAP dataset is available at http://bit.ly/gI8nc.