76 datasets found
  1. Do-Not-Answer dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    Cite
    (2024). Do-Not-Answer dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/do-not-answer-dataset
    Description

    The Do-Not-Answer dataset is designed to test the safety performance of Large Language Models (LLMs).
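
    A minimal loading sketch with the Hugging Face datasets library; the LibrAI/do-not-answer hub id matches the upstream project named below, but treat the id and column layout as assumptions rather than part of this record:

    # Minimal sketch: load the Do-Not-Answer questions for a safety evaluation.
    # The hub id "LibrAI/do-not-answer" and the field names are assumptions.
    from datasets import load_dataset

    ds = load_dataset("LibrAI/do-not-answer", split="train")
    print(ds[0])  # one risky instruction plus its risk-area annotations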

  2. do-not-answer-ja-creative-150

    • huggingface.co
    Updated Mar 26, 2025
    Cite
    umiyuki (2025). do-not-answer-ja-creative-150 [Dataset]. https://huggingface.co/datasets/umiyuki/do-not-answer-ja-creative-150
    Authors
    umiyuki
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset was extracted from do-not-answer-ja, maintained by kunishou: roughly 30 questions each from the five categories where, in terms of creative-writing ability, it is actually more useful if the model answers, for a total of 150 questions. do-not-answer-ja: https://github.com/kunishou/do-not-answer-ja; do-not-answer: https://github.com/Libr-AI/do-not-answer
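
    The per-category sampling described above could be reproduced with pandas along these lines; the file name, column names, and category list are hypothetical placeholders:

    # Minimal sketch: draw ~30 questions from each of 5 chosen categories
    # (~150 total). The file name and "category" column are hypothetical.
    import pandas as pd

    df = pd.read_json("do_not_answer_ja.json")  # hypothetical local export
    creative_categories = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e"]  # placeholders
    subset = (
        df[df["category"].isin(creative_categories)]
        .groupby("category")
        .sample(n=30, random_state=0)
    )
    print(len(subset))  # ~150 questions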

  3. Dolly 15k Dutch

    • data.niaid.nih.gov
    • huggingface.co
    • +1more
    Updated Jun 20, 2023
    Cite
    Vanroy, Bram (2023). Dolly 15k Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8054097
    Dataset authored and provided by
    Vanroy, Bram
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset contains 14,934 instructions, contexts and responses, in several natural language categories such as classification, closed QA, generation, etc. The original English dataset was created by @databricks, which crowd-sourced the data creation via its employees. The current dataset is a translation of that dataset through ChatGPT (gpt-3.5-turbo).

    Data Instances

    { "id": 14963, "instruction": "Wat zijn de duurste steden ter wereld?", "context": "", "response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York, Hong Kong, Los Angeles, Zurich, Genève, San Francisco, Parijs en Sydney.", "category": "brainstorming" }

    Data Fields

    id: the ID of the item. The following 77 IDs are not included because they could not be translated (or were too long): [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966]

    instruction: the instruction (question)

    context: additional context that the AI can use to answer the question

    response: the AI's expected response

    category: the category of this type of question (see Dolly for more info)

    Dataset Creation

    The instructions, contexts and responses were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional context to the task, and the response to the task, from {src_lang} to {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the task consists of a task instruction (marked `instruction:`), optional context to the task (marked `context:`) and response for the task marked with `response:`;
    2. do not translate the identifiers `instruction:`, `context:`, and `response:` but instead copy them to your output;
    3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    4. translate the instruction and context text using informal, but standard, language;
    5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the context in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
    7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the context, nor the translation in the response (just copy them as-is);
    8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """

    The system message was:

    You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
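
    Put together, the call described above might have looked like the following minimal sketch. The dataset card does not include the client code, so this uses the legacy openai Python client of that era as an assumption; task_text stands in for one instruction/context/response block:

    # Minimal sketch (not the author's actual code): translate one item with
    # gpt-3.5-turbo, max_tokens=1024, temperature=0, via the legacy openai client.
    import openai

    system_msg = ("You are a helpful assistant that translates English to Dutch "
                  "according to the requirements that are given to you.")
    prompt = CONVERSATION_TRANSLATION_PROMPT.format(src_lang="English", tgt_lang="Dutch")

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        max_tokens=1024,
        temperature=0,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": prompt + task_text},  # task_text: hypothetical variable
        ],
    )
    translation = response["choices"][0]["message"]["content"]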

    Note that 77 items (0.5%) were not successfully translated. This can either mean that the prompt was too long for the given limit (max_tokens=1024) or that the generated translation could not be parsed into instruction, context and response fields. The missing IDs are [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966].

    Initial Data Collection and Normalization

    Initial data collection by databricks. See their repository for more information about this dataset.

    Considerations for Using the Data

    Note that the translations in this new dataset have not been verified by humans! Use at your own risk, both in terms of quality and biases.

    Discussion of Biases

    As with any machine-generated text, users should be aware of potential biases in this dataset. Although the prompt specifically includes "make sure to avoid biases (such as gender bias, grammatical bias, social bias)", the impact of such a command is not known. It is likely that biases remain in the dataset, so use it with caution.

    Other Known Limitations

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    This repository follows the original databricks license, which is CC BY-SA 3.0, but see below for a specific restriction.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub, its canonical repository.

  4. BoolQ Dataset

    • paperswithcode.com
    Updated Dec 13, 2023
    + more versions
    Cite
    Christopher Clark; Kenton Lee; Ming-Wei Chang; Tom Kwiatkowski; Michael Collins; Kristina Toutanova (2023). BoolQ Dataset [Dataset]. https://paperswithcode.com/dataset/boolq
    Authors
    Christopher Clark; Kenton Lee; Ming-Wei Chang; Tom Kwiatkowski; Michael Collins; Kristina Toutanova
    Description

    BoolQ is a question answering dataset for yes/no questions containing 15,942 examples. These questions are naturally occurring – they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context.

    Questions are gathered from anonymized, aggregated queries to the Google search engine. Queries that are likely to be yes/no questions are heuristically identified and questions are only kept if a Wikipedia page is returned as one of the first five results, in which case the question and Wikipedia page are given to a human annotator for further processing. Annotators label question/article pairs in a three-step process. First, they decide if the question is good, meaning it is comprehensible, unambiguous, and requesting factual information. This judgment is made before the annotator sees the Wikipedia page. Next, for good questions, annotators find a passage within the document that contains enough information to answer the question. Annotators can mark questions as “not answerable” if the Wikipedia article does not contain the requested information. Finally, annotators mark whether the question’s answer is “yes” or “no”. Only questions that were marked as having a yes/no answer are used, and each question is paired with the selected passage instead of the entire document.
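
    BoolQ can be loaded with the Hugging Face datasets library; a minimal sketch, assuming the boolq hub id (the question/passage/answer fields follow the description above):

    # Minimal sketch: load BoolQ and inspect one (question, passage, answer) triplet.
    from datasets import load_dataset

    boolq = load_dataset("boolq")    # assumed hub id
    ex = boolq["train"][0]
    print(ex["question"])            # naturally occurring yes/no question
    print(ex["answer"])              # boolean label
    print(ex["passage"][:200])       # supporting Wikipedia passage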

  5. User assessment Personal assistance – The user does not receive the help...

    • gimi9.com
    + more versions
    Cite
    User assessment Personal assistance – The user does not receive the help he/she wants from the assistants, percentage (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u28621
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. The measure is the number of people with personal assistance who answered No to the question "Do you get the help you want from your assistants?" divided by all people with personal assistance who answered the question. The answer options were Yes, Sometimes, No. The survey is not a census, so the result for a municipality may be based on a small number of users' answers, though always at least five. For some municipalities the figures include users of both municipally run and other (private/non-profit) providers, for some only users of municipally run services, and for others only users of other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.
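
    The key figure itself is a simple share. A minimal sketch of fetching it from the Kolada API and recomputing the ratio; the v2 endpoint pattern is inferred from the dataset URL and should be treated as an assumption:

    # Minimal sketch: fetch KPI U28621 from the Kolada v2 API (endpoint pattern
    # inferred from the dataset URL; treat it as an assumption).
    import requests

    resp = requests.get("http://api.kolada.se/v2/data/kpi/U28621/year/2022", timeout=30)
    resp.raise_for_status()
    for row in resp.json().get("values", []):
        print(row.get("municipality"), row.get("values"))

    # The underlying measure: share answering "No" out of all who answered.
    no_count, total = 12, 48  # placeholder numbers
    print(f"{100 * no_count / total:.1f}%")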

  6. HSIP E911 Public Safety Answering Point (PSAP)

    • catalog.data.gov
    • gstore.unm.edu
    • +3more
    Updated Dec 2, 2020
    + more versions
    Cite
    (Point of Contact) (2020). HSIP E911 Public Safety Answering Point (PSAP) [Dataset]. https://catalog.data.gov/dataset/hsip-e911-public-safety-answering-point-psap
    Dataset provided by
    (Point of Contact)
    Description

    911 Public Safety Answering Point (PSAP) service area boundaries in New Mexico. According to the National Emergency Number Association (NENA), a Public Safety Answering Point (PSAP) is a facility equipped and staffed to receive 9-1-1 calls. The service area is the geographic area within which a 911 call placed using a landline is answered at the associated PSAP.

    This dataset only includes primary PSAPs. Secondary PSAPs, backup PSAPs, and wireless PSAPs have been excluded. Primary PSAPs receive calls directly, whereas secondary PSAPs receive calls that have been transferred by a primary PSAP. Backup PSAPs provide service in cases where another PSAP is inoperable.

    Most military bases have their own emergency telephone systems. To connect to such a system from within a military base, it may be necessary to dial a number other than 911. Due to the sensitive nature of military installations, TGS did not actively research these systems. If civilian authorities in surrounding areas volunteered information about these systems, or if adding a military PSAP was necessary to fill a hole in civilian-provided data, TGS included it in this dataset. Otherwise, military installations are depicted as being covered by one or more adjoining civilian emergency telephone systems.

    In some cases areas are covered by more than one PSAP boundary. In these cases, any of the applicable PSAPs may take a 911 call. Where a specific call is routed may depend on how busy the applicable PSAPs are (i.e., load balancing), operational status (i.e., redundancy), or time of day / day of week.

    If an area does not have 911 service, TGS included that area in the dataset along with the address and phone number of its dispatch center. These are areas where someone must dial a 7- or 10-digit number to reach emergency services. These records can be identified by a "Y" in the [NON911EMNO] field, which indicates that dialing 911 inside one of these areas does not connect one with emergency services.

    This dataset was constructed by gathering information about PSAPs from state-level officials. In some cases this was geospatial information; in others it was tabular. This information was supplemented with a list of PSAPs from the Federal Communications Commission (FCC). Each PSAP was researched to verify its tabular information. In cases where the source data was not geospatial, each PSAP was researched to determine its service area in terms of existing boundaries (e.g., city and county boundaries). In some cases existing boundaries had to be modified to reflect coverage areas (e.g., "entire county north of County Road 30"). However, there may be cases where minor deviations from existing boundaries are not reflected in this dataset, such as where a particular PSAP's coverage area includes an entire county plus the homes and businesses along a road that is partly in another county.

    Text fields in this dataset have been set to all upper case to facilitate consistent database engine search results. All diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English character to facilitate use with database systems that may not support diacritics.
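
    For example, the areas without 911 service described above can be selected by filtering on that flag; a minimal geopandas sketch, with the file name hypothetical and the [NON911EMNO] field as documented:

    # Minimal sketch: select service areas where dialing 911 does not reach
    # emergency services (NON911EMNO == "Y"), per the field description above.
    import geopandas as gpd

    psap = gpd.read_file("hsip_e911_psap.shp")  # hypothetical file name
    non911 = psap[psap["NON911EMNO"] == "Y"]
    print(len(non911), "areas require a 7- or 10-digit emergency number")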

  7. Subjects of The answer to bad religion is not no religion

    • workwithdata.com
    Cite
    Work With Data, Subjects of The answer to bad religion is not no religion [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=The+answer+to+bad+religion+is+not+no+religion&j=1&j0=books
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects and is filtered where the books is The answer to bad religion is not no religion, featuring 10 columns including authors, average publication date, book publishers, book subject, and books. The preview is ordered by number of books (descending).

  8. How solution snippets are presented in answers posted on Stack Overflow and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 5, 2022
    Cite
    Anonymous (2022). How solution snippets are presented in answers posted on Stack Overflow and how they could be potentially reused. [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5816299
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Researchers use datasets of Question-Solution pairs to train machine learning models, such as for source code generation. A Question-Solution pair contains two parts: a programming question and its corresponding Solution Snippet, i.e., source code that solves the programming question. Such datasets can be obtained from a number of different platforms; in this study, the Question-Solution pairs were obtained from Stack Overflow (SO). However, datasets of Question-Solution pairs extracted from SO have two limitations: (1) the Solution Snippets are only partially correct and/or do not answer the questions, and (2) information regarding the potential reusability of Solution Snippets is not available. These limitations can adversely affect the predictability of a machine learning model. Therefore, I conducted an empirical study to categorize how Solution Snippets are presented in SO answers as well as how they can be adapted for reuse. I identified eight categories of how Solution Snippets are presented in SO answers and five categories of how they could be adapted. Based on these results, I identified several potential reasons why it is not always easy to create datasets of Question-Solution pairs. The first categorization shows that finding the correct location of the Solution Snippet is challenging when several code blocks appear within the answer to a question; the researcher must then identify which code within a given code block is the Solution Snippet. The second categorization shows that most Solution Snippets appear challenging to adapt for reuse, and how they could be adapted is not explicitly stated in them. These insights shed light on how to create better-quality datasets in order to improve the predictability of machine learning models.

  9. Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    + more versions
    Cite
    Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li (2023). Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight Management: System Design and Finding.pdf [Dataset]. http://doi.org/10.3389/fnut.2022.870775.s004
    Dataset provided by
    Frontiers
    Authors
    Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As the obesity rate continues to increase persistently, there is an urgent need to develop an effective weight loss management strategy. Nowadays, the development of artificial intelligence (AI) and cognitive technologies, coupled with the rapid spread of messaging platforms and mobile technology with easier access to the internet, offers professional dietitians an opportunity to provide extensive monitoring support to their clients through a chatbot with artificial empathy. This study aimed to design a chatbot with artificially empathic motivational support for weight loss called "SlimMe" and investigate how people react to a diet bot. The SlimMe infrastructure was built using Dialogflow as the natural language processing (NLP) platform and LINE mobile messenger as the messaging platform. We proposed a text-based emotion analysis to simulate artificial empathy responses that recognize the user's emotion. A preliminary evaluation was performed to investigate the early-stage user experience after a 7-day simulation trial.

    The result revealed that having an artificially empathic diet bot for weight loss management is a fun and exciting experience. The use of emoticons, stickers, and GIF images makes the chatbot responses more interactive. Moreover, the motivational support and persuasive messaging features enable the bot to express more empathic and engaging responses to the user. In total, there were 1,007 bot responses to 892 user input messages. Of these, 67.38% (601/1,007) of the chatbot-generated responses were accurate responses to a relevant user request, 21.19% (189/1,007) were inaccurate responses to a relevant request, and 10.31% (92/1,007) were accurate responses to an irrelevant request. Only 1.12% (10/1,007) of messages received no answer from the chatbot.

    We present the design of an artificially empathic diet bot as a friendly assistant to help users estimate their calorie intake and calories burned in a more interactive and engaging way. To our knowledge, this is the first chatbot designed with artificial empathy features, and it looks very promising in promoting long-term weight management. More user interactions and further data training and validation enhancement will improve the bot's in-built knowledge base and emotional intelligence base.

  10. Quora Chat Dutch

    • zenodo.org
    • huggingface.co
    • +1more
    bin
    Updated Jun 20, 2023
    Cite
    Bram Vanroy; Bram Vanroy (2023). Quora Chat Dutch [Dataset]. http://doi.org/10.57967/hf/0524
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bram Vanroy
    License

    GPL 2.0, https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description

    This dataset contains 54,444 conversations between an AI assistant and a (fake, generated) "Human", in Dutch. They are translations of Baize's machine-generated answers to the Quora dataset.

    Data Instances

    {
     'id': 10,
     'input': 'Het gesprek tussen de mens en de AI-assistent.\n'
          '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update '
          'zal krijgen?\n'
          '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de '
          'Android 7.0 (Nougat) update.\n'
          '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal '
          'zijn?\n'
          '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 '
          'en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de '
          'update zal ontvangen.\n'
          '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen?\n'
          '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. '
          'Het is het beste om uit te kijken naar updates van Motorola.\n'
          '[|Human|] ',
     'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 '
          '(Nougat) update?'
    }

    Data Fields

    • id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]
    • input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|]; see the parsing sketch after this list.
    • topic: the topic description
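
    A minimal sketch of such a parser, assuming only the [|Human|] and [|AI|] markers documented above:

    # Minimal sketch: split a conversation string into (speaker, text) turns
    # using the [|Human|] / [|AI|] markers described in the data fields.
    import re

    def split_turns(conversation: str):
        parts = re.split(r"(\[\|Human\|\]|\[\|AI\|\])", conversation)
        turns, speaker = [], None
        for part in parts:  # the preamble before the first marker is skipped
            if part in ("[|Human|]", "[|AI|]"):
                speaker = part
            elif speaker and part.strip():
                turns.append((speaker, part.strip()))
        return turns

    # e.g. turns = split_turns(item["input"]) for one dataset item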

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is:

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the conversation consists of the AI (marked as `[|AI|]`) and the human (`[|Human|]`) talking in turns and responding to each other;
    2. do not translate the speaker identifiers `[|AI|]` and `[|Human|]` but always copy them into the translation in appropriate places;
    3. ensure accurate translation and keep the correctness of the conversation;
    4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    5. translate the human's text using informal, but standard, language;
    6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
    8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
    9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """
    

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct," \
                  " summarizing manner. Translate accurately and formally. Do not provide any explanation" \
                  " about the translation and do not include the original title.\n\n"
    

    The system message was:

    You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
    

    Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.

  11. MedRedQA

    • data.csiro.au
    • researchdata.edu.au
    Updated May 1, 2024
    Cite
    Vincent Nguyen; Sarvnaz Karimi; Maciek Rybinski; Zhenchang Xing (2024). MedRedQA [Dataset]. http://doi.org/10.25919/yn7x-9148
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Vincent Nguyen; Sarvnaz Karimi; Maciek Rybinski; Zhenchang Xing
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Jul 10, 2013 - Apr 2, 2022
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Australian National University
    Description

    A large non-factoid English consumer Question Answering (QA) dataset containing 51,000 pairs of consumer questions and their corresponding expert answers. This dataset is useful for benchmarking or training systems on more difficult real-world questions and responses, which may contain spelling or formatting errors, or lexical gaps between consumer and expert vocabularies.

    By downloading this dataset, you agree to have obtained ethics approval from your institution. Lineage: We collected data from posts and comments to subreddit /r/askdocs, published between July 10, 2013, and April 2, 2022, totalling 600,000 submissions (original posts) and 1,700,000 comments (replies). We generated question-answer pairs by taking the highest scoring answer from a verified medical expert to a Reddit question. Questions with only images are removed, all links are removed and authors are removed.

    We provide two separate datasets in this collection, with the following schemas.

    MedRedQA – Reddit medical question and answer pairs from /r/askdocs (CSV format):
    i. the poster's question (Body)
    ii. the title of the post
    iii. the filtered answer from a verified physician's comment (Response)
    iv. the occupation indicated for verification status
    v. any PMCIDs found in the post

    MedRedQA+PubMed – PubMed-enriched subset of MedRedQA (JSON format):
    i. Question: the user's original question; equivalent to the Body field in MedRedQA
    ii. Document: the abstract of the PubMed document (if it exists and contains an abstract) for that particular post. Note: this does not necessarily mean the answer references this document, but at least one other verified physician in the responses has mentioned it.
    iii. Response: the filtered response; equivalent to the Response field in MedRedQA
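
    A minimal loading sketch for the two files; the file names are hypothetical, and the columns follow the schemas above:

    # Minimal sketch: read both MedRedQA variants (file names hypothetical).
    import json
    import pandas as pd

    medredqa = pd.read_csv("medredqa.csv")
    print(medredqa[["Title", "Body", "Response"]].head())  # columns per schema above

    with open("medredqa_pubmed.json") as f:
        enriched = json.load(f)
    print(enriched[0]["Question"], enriched[0]["Document"][:200])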

  12. InductiveQE Datasets

    • zenodo.org
    zip
    Updated Nov 9, 2022
    Cite
    Mikhail Galkin; Mikhail Galkin (2022). InductiveQE Datasets [Dataset]. http://doi.org/10.5281/zenodo.7306046
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mikhail Galkin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    InductiveQE datasets

    UPD 2.0: Regenerated datasets free of potential test set leakages

    UPD 1.1: Added train_answers_val.pkl files to all freebase-derived datasets - answers of training queries on larger validation graphs

    This repository contains 10 inductive complex query answering datasets published in "Inductive Logical Query Answering in Knowledge Graphs" (NeurIPS 2022). 9 datasets (106-550) were created from FB15k-237, the wikikg dataset was created from OGB WikiKG 2 graph. In the datasets, all inference graphs extend training graphs and include new nodes and edges. Dataset numbers indicate a relative size of the inference graph compared to the training graph, e.g., in 175, the number of nodes in the inference graph is 175% compared to the number of nodes in the training graph. The higher the ratio, the more new unseen nodes appear at inference time, the more complex the task is. The Wikikg split has a fixed 133% ratio.

    Each dataset is a zip archive containing 17 files:

    • train_graph.txt (pt for wikikg) - original training graph
    • val_inference.txt (pt) - inference graph (validation split), new nodes in validation are disjoint with the test inference graph
    • val_predict.txt (pt) - missing edges in the validation inference graph to be predicted.
    • test_inference.txt (pt) - inference graph (test split), new nodes in test are disjoint with the validation inference graph
    • test_predict.txt (pt) - missing edges in the test inference graph to be predicted.
    • train/valid/test_queries.pkl - queries of the respective split, 14 query types for fb-derived datasets, 9 types for Wikikg (EPFO-only)
    • *_answers_easy.pkl - easy answers to respective queries that do not require predicting missing links but only edge traversal
    • *_answers_hard.pkl - hard answers to respective queries that DO require predicting missing links and against which the final metrics will be computed
    • train_answers_val.pkl - the extended set of answers for training queries on the bigger validation graph; most training queries have at least one additional answer. This is intended as an inference-only dataset to measure faithfulness of trained models
    • train_answers_test.pkl - the extended set of answers for training queries on the bigger test graph; most training queries have at least one additional answer. This is intended as an inference-only dataset to measure faithfulness of trained models
    • og_mappings.pkl - contains entity2id / relation2id dictionaries mapping local node/relation IDs from a respective dataset to the original fb15k237 / wikikg2
    • stats.txt - a small file with dataset stats

    Overall unzipped size of all datasets combined is about 10 GB. Please refer to the paper for the sizes of graphs and the number of queries per graph.
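
    A minimal sketch of reading one dataset's query and answer files with the standard pickle module (paths are illustrative):

    # Minimal sketch: load test queries and their hard answers for one dataset.
    # Paths are illustrative; the .pkl files are those described in the list above.
    import pickle

    with open("175/test_queries.pkl", "rb") as f:
        test_queries = pickle.load(f)
    with open("175/test_answers_hard.pkl", "rb") as f:
        test_answers_hard = pickle.load(f)

    print(len(test_queries), "test queries")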

    The Wikikg dataset is supposed to be evaluated in the inference-only regime, pre-trained solely on simple link prediction, since the number of training complex queries is not enough for such a large dataset.

    Paper pre-print: https://arxiv.org/abs/2210.08008

    The full source code of training/inference models is available at https://github.com/DeepGraphLearning/InductiveQE

  13. Alpaca Cleaned Dutch

    • zenodo.org
    • huggingface.co
    • +1more
    bin
    Updated Jun 20, 2023
    Cite
    Bram Vanroy; Bram Vanroy (2023). Alpaca Cleaned Dutch [Dataset]. http://doi.org/10.57967/hf/0530
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bram Vanroy
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0), https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Description

    This dataset contains 51,712 conversations between an AI assistant and a (fake, generated) "Human", in Dutch. They are translations of the Alpaca Cleaned dataset.

    Data Instances

    {
      'id': 7,
      'instruction': 'Leg uit waarom de volgende breuk gelijk is aan 1/4',
      'input': '4/16',
      'output': 'De breuk 4/16 is gelijk aan 1/4 omdat zowel de teller als de '
           'noemer deelbaar zijn door 4. Door zowel de teller als de noemer '
           'door 4 te delen, krijgen we de breuk 1/4.'
    }
    

    Data Fields

    • id: the ID of the item. The following ID is not included because it could not be translated: [23019]
    • instruction: the given instruction
    • input: optional input to accompany the instruction. Can be empty.
    • output: the "answer" to the instruction

    Dataset Creation

    The instructions, inputs and outputs were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate is (where src_lang is English and tgt_lang is Dutch):

    TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional input to the task, and the output of the task, from {src_lang} into {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional input to the task (marked `input: `) and output for the task marked with `output: `;
    2. do not translate the identifiers `instruction: `, `input: `, and `output: ` but instead copy them to your output;
    3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    4. translate the instruction and input text using informal, but standard, language;
    5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the input in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
    7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the input, nor the translation in the output (just copy them as-is);
    8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """
    

    This prompt is concatenated with the instruction, optionally the input, and the output. In code, that last part looks like this:

    text = f'instruction: "{instruction}"\n\n'
    if inputstr:
        text += f'input: "{inputstr}"\n\n'
    text += f'output: "{outputstr}"'

    The system message was:

    You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
    

    Note that 1 item (0.002%) was not successfully translated. The translation was missing the input, instruction, or output keywords where those were expected. The ID for the missing item is [23019].

    The initial English dataset was created by Tatsu Lab and cleaned by Yahma.

    Also available on HuggingFace hub (with a more extensive README).

    Licensing Information

    As per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services. Similar to the original Alpaca dataset, this dataset is released under CC BY-NC 4.0.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

  14. User assessment Personal assistance – The user is not happy with his...

    • gimi9.com
    Cite
    User assessment Personal assistance – The user is not happy with his assistants, percentage (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u28640
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. The measure is the number of people with personal assistance who answered No to the question "Do you like your assistants?" divided by all people with personal assistance who answered the question. The answer options were Yes, Sometimes, No. The survey is not a census, so the result for a municipality may be based on a small number of users' answers, though always at least five. For some municipalities the figures include users of both municipally run and other (private/non-profit) providers, for some only users of municipally run services, and for others only users of other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.

  15. Replication Data for: Ferrín, Mónica, García-Albacete, Gema & Irene...

    • dataverse.harvard.edu
    Updated Jul 11, 2022
    Cite
    Monica Ferrin (2022). Replication Data for: Ferrín, Mónica, García-Albacete, Gema & Irene Sánchez-Vítores, How long does it take to admit that you don't know? Gender differences in response time to political knowledge questions [Dataset]. http://doi.org/10.7910/DVN/GH8HIP
    Dataset provided by
    Harvard Dataverse
    Authors
    Monica Ferrin
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The implications of the persistent gender gap in political knowledge are a puzzle that the literature is still disentangling; and research has evidenced important differences in the way women and men respond to survey questions. We argue in this article that political knowledge survey items not only inform about differences in cognition, but also about other latent traits related to gender stereotyping. Gender stereotypes around political knowledge push men to be knowledgeable but not so much women, which we expect to affect men and women’s survey responses differently. To test this expectation, we explore response times of don’t know answers to political knowledge items. Our results show that men, particularly those who declare being interested in politics, take longer than women to admit that they do not know the answer to political knowledge items.

  16. Author, BNB id, book publisher and publication date of books called The...

    • workwithdata.com
    Updated Feb 20, 2024
    Cite
    Work With Data (2024). Author, BNB id, book publisher and publication date of books called The answer to bad religion is not no religion [Dataset]. https://www.workwithdata.com/datasets/books?col=author%2Cbnb_id%2Cbook%2Cbook%2Cbook_publisher%2Cbook_publisher_url%2Cbook_subject%2Cpublication_date&f=1&fcol0=book&fop0=%3D&fval0=The+answer+to+bad+religion+is+not+no+religion
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books and is filtered where the book is The answer to bad religion is not no religion, featuring 5 columns: author, BNB id, book, book publisher, and publication date. The preview is ordered by publication date (descending).

  17. Data from: How Does Not Responding to Appetitive Stimuli Cause Devaluation:...

    • datasearch.gesis.org
    • ssh.datastations.nl
    • +1more
    Updated Jan 23, 2020
    Cite
    Chen, MSc Z. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/401559610; Veling, dr. H.P. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/298978520; Dijksterhuis, prof. dr. A.J. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/153149493; Holland, prof. dr. R.W. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/202163954 (2020). How Does Not Responding to Appetitive Stimuli Cause Devaluation: Evaluative Conditioning or Response Inhibition? [Dataset]. http://doi.org/10.17026/dans-xfv-at9w
    Dataset provided by
    DANS (Data Archiving and Networked Services)
    Authors
    Chen, MSc Z. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/401559610; Veling, dr. H.P. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/298978520; Dijksterhuis, prof. dr. A.J. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/153149493; Holland, prof. dr. R.W. (RU Radboud Universiteit) DAI=info:eu-repo/dai/nl/202163954
    Description

    In a series of 6 experiments, we examined how not responding to appetitive stimuli causes devaluation. To examine this question, a go/no-go task was employed in which appetitive stimuli were consistently associated with cues to respond (go stimuli), or with cues to not respond (either no-go cues or the absence of cues; no-go stimuli). Change in evaluation of go and no-go stimuli was compared to stimuli not presented in the task (untrained stimuli). Overall, the results suggest that devaluation of appetitive stimuli by not responding to them is the result of response inhibition. In these experiments we recruited a total of 272 participants via the Sona participation system at Radboud University. The experiments were conducted from November 2014 to September 2015. The dataset contains all the measurements from these 6 experiments. The analyses were conducted with SPSS 23.

  18. CommonsenseQA Dataset

    • paperswithcode.com
    • opendatalab.com
    • +1more
    Updated Jul 17, 2023
    Cite
    CommonsenseQA Dataset [Dataset]. https://paperswithcode.com/dataset/commonsenseqa
    Authors
    Alon Talmor; Jonathan Herzig; Nicholas Lourie; Jonathan Berant
    Description

    CommonsenseQA is a dataset for the commonsense question answering task. It consists of 12,247 questions with 5 choices each. The dataset was generated by Amazon Mechanical Turk workers in the following process (an example is provided in parentheses):

    1. A crowd worker observes a source concept from ConceptNet (“River”) and three target concepts (“Waterfall”, “Bridge”, “Valley”) that are all related by the same ConceptNet relation (“AtLocation”).
    2. The worker authors three questions, one per target concept, such that only that particular target concept is the answer while the other two distractor concepts are not (“Where on a river can you hold a cup upright to catch water on a sunny day?”, “Where can I stand on a river to see water falling without getting wet?”, “I’m crossing the river, my feet are wet but my body is dry, where am I?”).
    3. For each question, another worker chooses one additional distractor from ConceptNet (“pebble”, “stream”, “bank”), and the question author manually adds another distractor (“mountain”, “bottom”, “island”).
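
    A minimal loading sketch via the Hugging Face datasets library; the commonsense_qa hub id and field names are assumptions:

    # Minimal sketch: load CommonsenseQA and inspect one 5-choice question.
    from datasets import load_dataset

    csqa = load_dataset("commonsense_qa")  # assumed hub id
    q = csqa["train"][0]
    print(q["question"])
    print(q["choices"]["text"])  # five candidate answers
    print(q["answerKey"])        # gold label, e.g. "A"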

  19. User assessment group housing LSS – The user does not feel safe with anyone...

    • gimi9.com
    Updated Apr 25, 2024
    + more versions
    Cite
    (2024). User assessment group housing LSS – The user does not feel safe with anyone in staff, share (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_http-api-kolada-se-v2-kpi-u28583
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is a development key figure; see questions and answers on kolada.se for more information. The measure is the number of people living in group housing LSS who answered None to the question "Do you feel safe with the staff?" divided by all people living in group housing LSS who answered the question. The answer options were Everyone, Some, None. The survey is not a census, so the result for a municipality may be based on a small number of users' answers, though always at least five. For some municipalities the figures include users of both municipally run and other (private/non-profit) providers, for some only users of municipally run services, and for others only users of other providers. The survey was conducted with a web-based survey tool adapted to people with disabilities. Data is available broken down by gender.

  20. Drivers and Barriers for Open Access Publishing - WoS 2016 Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jan 24, 2020
    Cite
    Sergio Ruiz-Perez; Sergio Ruiz-Perez (2020). Drivers and Barriers for Open Access Publishing - WoS 2016 Dataset [Dataset]. http://doi.org/10.5281/zenodo.842013
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sergio Ruiz-Perez
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Answers to a survey on gold Open Access run from July to October 2016. The dataset contains 15,235 unique responses from Web of Science published authors. This survey is part of a PhD thesis from the University of Granada in Spain. More details about the study can be found in the full text document, also available in Zenodo.

    Listed below are the questions related to the WoS 2016 dataset. Please note that countries with fewer than 40 answers are listed as "Other" in order to preserve anonymity.

    * 1. How many years have you been employed in research?

    • Fewer than 5 years
    • 5-14 years
    • 15-24 years
    • 25 years or longer

    Many of the questions that follow concern Open Access publishing. For the purposes of this survey, an article is Open Access if its final, peer-reviewed, version is published online by a journal and is free of charge to all users without restrictions on access or use.

    * 2. Do any journals in your research field publish Open Access articles?

    • Yes
    • No
    • I do not know

    * 3. Do you think your research field benefits, or would benefit from journals that publish Open Access articles?

    • Yes
    • No
    • I have no opinion
    • I do not care

    * 4. How many peer reviewed research articles (Open Access or not Open Access) have you published in the last five years?

    • 1-5
    • 6-10
    • 11-20
    • 21-50
    • More than 50

    * 5. What factors are important to you when selecting a journal to publish in?

    [Each factor may be rated “Extremely important”, “Important”, “Less important” or “Irrelevant”. The factors are presented in random order.]

    • Importance of the journal for academic promotion, tenure or assessment
    • Recommendation of the journal by my colleagues
    • Positive experience with publisher/editor(s) of the journal
    • The journal is an Open Access journal
    • Relevance of the journal for my community
    • The journal fits the policy of my organisation
    • Prestige/perceived quality of the journal
    • Likelihood of article acceptance in the journal
    • Absence of journal publication fees (e.g. submission charges, page charges, colour charges)
    • Copyright policy of the journal
    • Journal Impact Factor
    • Speed of publication of the journal

    6. Who usually decides which journals your articles are submitted to? (Choose more than one answer if applicable)

    • The decision is my own
    • A collective decision is made with my fellow authors
    • I am advised where to publish by a senior colleague
    • The organisation that finances my research advises me where to publish
    • Other (please specify) [Text box follows]

    7. Approximately how many Open Access articles have you published in the last five years?

    • 0
    • 1-5
    • 6-10
    • More than 10
    • I do not know

    [If the answer is “0”, the survey jumps to Q10.]

    * 8. What publication fee was charged for the last Open Access article you published?

    • No charge
    • Up to €250 ($275)
    • €251-€500 ($275-$550)
    • €501-€1000 ($551-$1100)
    • €1001-€3000 ($1101-$3300)
    • More than €3000 ($3300)
    • I do not know

    [If the answer is “No charge” or “I do not know”, the survey jumps to Q20.]

    * 9. How was this publication fee covered? (Choose more than one answer if applicable)

    • My research funding includes money for paying such fees
    • I used part of my research funding not specifically intended for paying such fees
    • My institution paid the fees
    • I paid the costs myself
    • Other (please specify) [Text box follows]

    * 10. How easy is it to obtain funding if needed for Open Access publishing from your institution or the organisation mainly responsible for financing your research?

    • Easy
    • Difficult
    • I have not used these sources

    * 11. Listed below are a series of statements, both positive and negative, concerning Open Access publishing. Please indicate how strongly you agree/disagree with each statement.

    [Each statement may be rated “Strongly agree”, “Agree”, “Neither agree nor disagree”, “Disagree” or “Strongly disagree”. The statements are presented in random order.]

    • Researchers should retain the rights to their published work and allow it to be used by others
    • Open Access publishing undermines the system of peer review
    • Open Access publishing leads to an increase in the publication of poor quality research
    • If authors pay publication fees to make their articles Open Access, there will be less money available for research
    • It is not beneficial for the general public to have access to published scientific and medical articles
    • Open Access unfairly penalises research-intensive institutions with large publication output by making them pay high costs for publication
    • Publicly-funded research should be made available to be read and used without access barrier
    • Open Access publishing is more cost-effective than subscription-based publishing and so will benefit public investment in research
    • Articles that are available by Open Access are likely to be read and cited more often than those not Open Access

    This study and its questionnaire are based on the SOAP Project (http://project-soap.eu). An article describing the highlights of the SOAP Survey is available at: https://arxiv.org/abs/1101.5260. The dataset of the SOAP survey is available at http://bit.ly/gSmm71. A manual describing the SOAP dataset is available at http://bit.ly/gI8nc.
