13 datasets found
  1. awesome-chatgpt-prompts

    • huggingface.co
    Updated Dec 15, 2023
    Cite
    Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
    Dataset updated
    Dec 15, 2023
    Authors
    Fatih Kadir Akın
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    🧠 Awesome ChatGPT Prompts [CSV dataset]

    This is a dataset repository of Awesome ChatGPT Prompts. View all prompts on GitHub.

      License
    

    CC-0

  2. Stack Overflow Chat Dutch

    • data.niaid.nih.gov
    Updated Jun 20, 2023
    Cite
    Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
    Dataset updated
    Jun 20, 2023
    Dataset authored and provided by
    Vanroy, Bram
    License

    http://www.apache.org/licenses/LICENSE-2.0

    Description

    This dataset contains 56,964 machine-generated conversations between an AI assistant and a (fake) "Human", in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

    Data Instances

    {'id': 9,
     'input': 'Het gesprek tussen de mens en de AI-assistent. '
              '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen '
              'van kolommen. '
              '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp '
              'bij nodig? '
              '[|Human|] Ik wil de output van een UNION-query ordenen op een '
              'kolom die niet aanwezig is in de SELECT-instructie. Is dat '
              'mogelijk? '
              '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale '
              'positie van de kolom in de ORDER BY-clausule gebruiken in plaats '
              'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde '
              'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. '
              '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de '
              'ORDER BY-clausule? '
              '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De '
              'alias moet worden gedefinieerd in de SELECT-instructie. Als je '
              'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen '
              'met ORDER BY col3. '
              '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias '
              'gebruiken voor kolommen uit beide tabellen in de UNION? '
              '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen '
              'uit beide tabellen in de UNION. Elke tabel in de UNION moet een '
              'unieke alias hebben voor de kolommen. '
              '[|Human|] ',
     'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'}

    Data Fields

    id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

    input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

    topic: the topic description
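
    As a brief illustration (not part of the original dataset card), the `input` format described above can be split into (speaker, utterance) pairs; the helper name is hypothetical:

```python
import re

def split_turns(input_text: str) -> list[tuple[str, str]]:
    """Split an `input` string into (speaker, utterance) pairs.

    Conversations start with a fixed Dutch preamble and then alternate
    [|Human|] and [|AI|] markers, as documented above.
    """
    # Keep the speaker names via the capture group; the resulting list
    # then alternates marker, utterance, marker, utterance, ...
    parts = re.split(r"\[\|(Human|AI)\|\]", input_text)
    return [(speaker, text.strip())
            for speaker, text in zip(parts[1::2], parts[2::2])]
```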

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

    Here are the requirements that you should adhere to:
    1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other;
    2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places;
    3. ensure accurate translation and keep the correctness of the conversation;
    4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    5. translate the human's text using informal, but standard, language;
    6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
    8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
    9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

    """

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct," \
        " summarizing manner. Translate accurately and formally. Do not provide any explanation" \
        " about the translation and do not include the original title.\n\n"

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
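
    For illustration only (this is a sketch, not the authors' actual script), the creation parameters above (gpt-3.5-turbo, max_tokens=1024, temperature=0, the system message, and the translation prompt) could be assembled into a chat-completion request payload like this; the template constant is abbreviated here:

```python
SYSTEM_MESSAGE = ("You are a helpful assistant that translates English to "
                  "Dutch to the requirements that are given to you.")

# Abbreviated stand-in for the full template quoted above.
CONVERSATION_TRANSLATION_PROMPT = (
    "You are asked to translate a conversation between an AI assistant "
    "and a human from {src_lang} into {tgt_lang}.\n\n"
)

def build_request(conversation: str,
                  src_lang: str = "English",
                  tgt_lang: str = "Dutch") -> dict:
    """Assemble a chat-completion payload with the parameters stated above."""
    prompt = CONVERSATION_TRANSLATION_PROMPT.format(
        src_lang=src_lang, tgt_lang=tgt_lang)
    return {
        "model": "gpt-3.5-turbo",
        "max_tokens": 1024,
        "temperature": 0,
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": prompt + conversation},
        ],
    }
```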

    Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.

  3. Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    Updated Jul 16, 2024
    Cite
    Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  4. Scam Conversation Corpus

    • zenodo.org
    Updated Apr 14, 2025
    Cite
    Anonymous; Anonymous (2025). Scam Conversation Corpus [Dataset]. http://doi.org/10.5281/zenodo.15212527
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    Description

    Dataset Description

    Overview

    This dataset contains JSON files representing conversations with scammers, facilitated by GPT-4o, alongside a structured media folder for multimedia content. Each conversation is uniquely identified and may span multiple communication platforms.

    Data Structure

    JSON Files

    Each JSON file represents a single conversation and contains:

    • unique_id: Unique identifier for the conversation.
    • scam_ids: Dictionary mapping communication mediums to scammer identifiers (anonymized email addresses, usernames, etc.).
    • victim_ids: Dictionary mapping communication mediums to victim identifiers (anonymized email addresses, usernames, etc.).
    • pause_start, pause_end: Hours marking the local time in 0-24h format on the collection servers (where the response application ran), indicating periods where the application intentionally did not respond to simulate human sleep patterns.
    • victim_name: The name of the victim (for context in simulations).
    • messages: List of message objects with:
      • is_inbound: Boolean, true if the message is from the scammer to the victim.
      • from: Sender's identifier.
      • to: Receiver's identifier.
      • subject: (Email only) Subject line.
      • time: Timestamp (UNIX format).
      • medium: Communication platform (Email, Telegram, or Instagram).
      • multimedia: Boolean, true if the message contains multimedia.
      • body: Text content of the message.

    Note: Only structured fields like scammer/victim identifiers are anonymized — message content remains unaltered.
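
    For illustration (not part of the dataset documentation), a conversation file with the fields above could be loaded and inspected like this; the function names are hypothetical:

```python
import json

def load_conversation(path: str) -> dict:
    """Read one conversation JSON file as described above."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def inbound_ratio(conversation: dict) -> float:
    """Fraction of messages sent by the scammer (is_inbound is True)."""
    msgs = conversation["messages"]
    if not msgs:
        return 0.0
    return sum(m["is_inbound"] for m in msgs) / len(msgs)
```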

    Multimedia Descriptions:

    If a message contains multimedia content, descriptions of the received files are appended to the body field. The appended text follows this format:
    This message contains files. If the description for a file does not make sense,
    ignore it.
    Here are descriptions of those files:
    Description for file 1: example_description_1
    Description for file 2: example_description_2

    The description can also contain information indicating that a file could not be interpreted or opened.

    Media Folder Structure

    Multimedia content is organized by platform and conversation ID. A conversation has a corresponding media folder only if it contains multimedia.


    media
    |-- email
    |   |-- <unique_conversation_id>
    |-- ig
    |   |-- <unique_conversation_id>
    |-- tg
        |-- <unique_conversation_id>

    • Email and Telegram: Files are named message_timestamp_documentname.
    • Instagram: Files are named only by message_timestamp, as Instagram doesn’t support multi-file messages.
    • Each medium folder contains subfolders named after unique_conversation_id.
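
    The layout above can be sketched as a small path helper (illustrative only; the function name and root argument are hypothetical):

```python
from pathlib import Path

# Folder names per medium, following the media layout described above.
MEDIUM_DIRS = {"Email": "email", "Instagram": "ig", "Telegram": "tg"}

def media_dir(root: str, medium: str, conversation_id: str) -> Path:
    """Directory holding a conversation's multimedia for one medium."""
    return Path(root) / "media" / MEDIUM_DIRS[medium] / conversation_id
```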

    Cross-Medium Conversations

    Conversations may span multiple mediums (e.g., starting via Email and continuing on Telegram). Each message tracks its originating medium.

    Data Quality Notes

    • Fragmented Conversations: Some messages may belong to the same logical conversation but are split into separate conversations. This happens when a scammer contacts the same victim ID (e.g., email address) from a different, previously unknown email address without prior notification.
    • Medium Switch Anomalies: Due to imperfect prompt handling, medium switches may appear unnatural. For example, the LLM might state it will switch to Telegram shortly, while the message stating this is already on Telegram.
    • Synthetic Personal Data: All personal data provided by the victim is fabricated — generated either by the authors or the LLM. However, coincidences may lead to data resembling real-world entities. Handle with caution.
    • Incomplete Messenger Representations: Some interactions, like calls on Instagram or Telegram, cannot be fully represented. These will appear as messages with "body": "null".

    Usage Restrictions

    This dataset is strictly for research purposes only.

  5. Scenario ontology generation with OpenAI GPT-4

    • researchdata.se
    Updated Aug 6, 2024
    Cite
    Karl Kindström Andersson (2024). Scenario ontology generation with OpenAI GPT-4 [Dataset]. http://doi.org/10.5878/7bzb-mz52
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Linköping University
    Authors
    Karl Kindström Andersson
    Description

    This dataset is separated into three parts. The text and images in all documents have been manually copied from the OpenAI webpage, and the formatting has been recreated to the greatest extent possible.

    1. DATASET1: Conversation history from Scenario Ontology Generation with OpenAI GPT-4. This dataset contains the conversation history in which OpenAI GPT-4 is asked to analyze a text, create an ontology, and then visualize it in a graph. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology. The dataset also contains a generated JSON file that describes the ontology.

    The document contains a conversation log with OpenAI GPT-4 where a text is analyzed by the AI and an ontology graph and a JSON file describing the ontology is created.

    2. DATASET2: Conversation history from Scenario Generation with OpenAI GPT-4. This dataset contains the conversation history in which OpenAI GPT-4 is asked to generate a hypothetical scenario for the development of a new hypothetical multi-role tactical airlift. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology.

    The document contains a conversation log with OpenAI GPT-4 in which a hypothetical scenario is generated.

    3. DATASET3: CHAMP Operational Scenario generated from OpenAI GPT-4. OpenAI GPT-4 was used to generate a hypothetical scenario for the development of a hypothetical new multi-role airlift.

    The document contains a hypothetical scenario that has been generated by OpenAI GPT-4.

    The dataset was originally published in DiVA and moved to SND in 2024.

  6. Conversation history from Scenario Generation with OpenAI GPT-4

    • researchdata.se
    Updated Aug 6, 2024
    Cite
    Linköping University (2024). Conversation history from Scenario Generation with OpenAI GPT-4 [Dataset]. http://doi.org/10.48360/yykt-r359
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Linköping University
    Description

    This dataset contains the conversation history in which OpenAI GPT-4 is asked to generate a hypothetical scenario for the development of a new hypothetical multi-role tactical airlift. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology. The generated Word file, containing the generated scenario, is stored in a separate dataset: "CHAMP Operational Scenario generated from OpenAI GPT-4".

    A document containing the conversation log with OpenAI GPT-4 in which a hypothetical scenario is generated. The text and images have been manually copied from the OpenAI webpage, and the formatting has been recreated to the greatest extent possible.

    License: Creative Commons Attribution 4.0 International

    The dataset was originally published in DiVA and moved to SND in 2024.

  7. Quora Chat Dutch

    • zenodo.org
    • huggingface.co
    Updated Jun 20, 2023
    Cite
    Bram Vanroy; Bram Vanroy (2023). Quora Chat Dutch [Dataset]. http://doi.org/10.57967/hf/0524
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bram Vanroy; Bram Vanroy
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description

    This dataset contains 54,444 machine-generated conversations between an AI assistant and a (fake) "Human", in Dutch. They are translations of Baize's machine-generated answers to the Quora dataset.

    Data Instances

    {
     'id': 10,
     'input': 'Het gesprek tussen de mens en de AI-assistent.\n'
          '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update '
          'zal krijgen?\n'
          '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de '
          'Android 7.0 (Nougat) update.\n'
          '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal '
          'zijn?\n'
          '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 '
          'en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de '
          'update zal ontvangen.\n'
          '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen?\n'
          '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. '
          'Het is het beste om uit te kijken naar updates van Motorola.\n'
          '[|Human|] ',
     'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 '
          '(Nougat) update?'
    }

    Data Fields

    • id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]
    • input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].
    • topic: the topic description
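
    The invariants documented for the `input` field can be checked with a small validator (illustrative only; the function name is hypothetical):

```python
PREAMBLE = "Het gesprek tussen de mens en de AI-assistent."

def is_valid_input(text: str) -> bool:
    """Check the documented invariants for `input`: the fixed Dutch
    preamble plus at least one occurrence of each speaker marker."""
    return (text.startswith(PREAMBLE)
            and "[|AI|]" in text
            and "[|Human|]" in text)
```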

    Dataset Creation

    Both the conversations and the topics were translated with OpenAI's API for gpt-3.5-turbo, using max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is:

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the conversation consists of the AI (marked as `[|AI|]`) and the human (`[|Human|]`) talking in turns and responding to each other;
    2. do not translate the speaker identifiers `[|AI|]` and `[|Human|]` but always copy them into the translation in appropriate places;
    3. ensure accurate translation and keep the correctness of the conversation;
    4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    5. translate the human's text using informal, but standard, language;
    6. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang};
    8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested;
    9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """
    

    The prompt to translate the topic is:

    TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct," \
                  " summarizing manner. Translate accurately and formally. Do not provide any explanation" \
                  " about the translation and do not include the original title.\n\n"

    The system message was:

    You are a helpful assistant that translates English to Dutch to the requirements that are given to you.
    

    Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.

  8. toxic-chat

    • huggingface.co
    Updated Jan 25, 2024
    Cite
    Large Model Systems Organization (2024). toxic-chat [Dataset]. https://huggingface.co/datasets/lmsys/toxic-chat
    Dataset updated
    Jan 25, 2024
    Dataset authored and provided by
    Large Model Systems Organization
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Update

    • [01/31/2024] We updated the OpenAI Moderation API results for ToxicChat (0124) based on their moderation model updated on Jan 25, 2024.
    • [01/28/2024] We released an official T5-Large model trained on ToxicChat (toxicchat0124). Go and check it out for your baseline comparison!
    • [01/19/2024] We have a new version of ToxicChat (toxicchat0124)!

      Content
    

    This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/toxic-chat.

  9. sales-conversations

    • huggingface.co
    Updated Sep 28, 2023
    Cite
    ENGEL (2023). sales-conversations [Dataset]. https://huggingface.co/datasets/goendalf666/sales-conversations
    Dataset updated
    Sep 28, 2023
    Authors
    ENGEL
    Description

    Dataset Card for "sales-conversations"

    This dataset was created for the purpose of training a sales agent chatbot that can convince people. The initial idea came from "Textbooks Are All You Need" (https://arxiv.org/abs/2306.11644); gpt-3.5-turbo was used for the generation.

      Structure
    

    The conversations have a customer and a salesman who always appear in alternating order: customer, salesman, customer, salesman, etc. The customer always starts the conversation. Who ends the… See the full description on the dataset page: https://huggingface.co/datasets/goendalf666/sales-conversations.

  10. ShareGPT-Unfiltered-RedPajama-Chat-format

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Fredi (2023). ShareGPT-Unfiltered-RedPajama-Chat-format [Dataset]. https://huggingface.co/datasets/Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format
    Dataset updated
    Jun 6, 2023
    Authors
    Fredi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    ShareGPT unfiltered dataset in RedPajama-Chat format

    This dataset was created by converting the alpaca-lora-formatted ShareGPT dataset to the format required by RedPajama-Chat. This script was used for the conversion: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py WARNING: Only the first human and gpt text of each conversation from the original dataset is included in the dataset.

      The format
    

    {"text": "
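
    The exact template string is truncated above, so as a purely illustrative sketch, one (human, gpt) exchange could be packed into a single-field JSON record as follows. The "<human>:"/"<bot>:" tags follow the RedPajama-INCITE chat convention and are an assumption here, not taken from the dataset card:

```python
import json

def to_redpajama_record(human_text: str, gpt_text: str) -> str:
    """Illustrative conversion of one (human, gpt) exchange into a
    single-field JSON record. The speaker tags are assumed from the
    RedPajama-INCITE chat convention; the dataset's exact template is
    truncated above."""
    text = f"<human>: {human_text}\n<bot>: {gpt_text}"
    return json.dumps({"text": text})
```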

  11. ChatGPT-Jailbreak-Prompts

    • huggingface.co
    Updated Jun 19, 2023
    Cite
    Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
    Dataset updated
    Jun 19, 2023
    Authors
    Rubén Darío Jaramillo Romero
    Description

    Dataset Card for Dataset Name

      Name
    

    ChatGPT Jailbreak Prompts

      Dataset Summary
    

    ChatGPT Jailbreak Prompts is a comprehensive collection of jailbreak-related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking ChatGPT.

      Languages
    

    [English]

  12. math

    • huggingface.co
    Updated Apr 11, 2023
    Cite
    CAMEL-AI.org (2023). math [Dataset]. https://huggingface.co/datasets/camel-ai/math
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    CAMEL-AI.org
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society

    Github: https://github.com/lightaime/camel Website: https://www.camel-ai.org/ Arxiv Paper: https://arxiv.org/abs/2303.17760

      Dataset Summary
    

    The Math dataset is composed of 50K problem-solution pairs obtained using GPT-4. The problem-solution pairs were generated from 25 math topics, with 25 subtopics for each topic and 80 problems for each (topic, subtopic) pair. We provide the data… See the full description on the dataset page: https://huggingface.co/datasets/camel-ai/math.
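
    The stated composition checks out arithmetically (25 topics × 25 subtopics × 80 problems per pair):

```python
# Sanity check of the composition stated above.
topics, subtopics, problems_per_pair = 25, 25, 80
total = topics * subtopics * problems_per_pair
assert total == 50_000
```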

  13. WildChat-nontoxic

    • huggingface.co
    Updated Nov 14, 2023
    Cite
    Ai2 (2023). WildChat-nontoxic [Dataset]. https://huggingface.co/datasets/allenai/WildChat-nontoxic
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    Description

    Dataset Card for WildChat-nontoxic

      Note: a newer version with 1 million conversations and demographic information can be found here.
    
    
    
    
    
      Dataset Description
    

    Paper: https://wenting-zhao.github.io/papers/wildchat.pdf

    License: https://allenai.org/licenses/impact-lr

    Language(s) (NLP): multi-lingual

    Point of Contact: Yuntian Deng

      Dataset Summary
    

    WildChat-nontoxic is the nontoxic subset of the WildChat dataset, a collection of 530K conversations… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat-nontoxic.

