13 datasets found

h
awesome-chatgpt-prompts
huggingface.co
Updated Dec 15, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fatih Kadir Akın (2023). awesome-chatgpt-prompts [Dataset]. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 15, 2023
Authors
Fatih Kadir Akın
License
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Description
🧠 Awesome ChatGPT Prompts [CSV dataset]

This is a Dataset Repository of Awesome ChatGPT Prompts View All Prompts on GitHub

License

CC-0
Z
Stack Overflow Chat Dutch
data.niaid.nih.gov
Updated Jun 20, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vanroy, Bram (2023). Stack Overflow Chat Dutch [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8052379
Explore at:
Dataset updated
Jun 20, 2023
Dataset authored and provided by
Vanroy, Bram
License
http://www.apache.org/licenses/LICENSE-2.0http://www.apache.org/licenses/LICENSE-2.0
Description
This dataset contains 56,964 conversations between een AI assistant and a (fake) "Human" (generated) in Dutch, specifically in the domain of programming (Stack Overflow). They are translations of Baize's machine-generated answers to the Stack Overflow dataset.

Data Instances

{'id': 9, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik heb wat hulp nodig met Spark SQL union en het ordenen ' 'van kolommen. ' '[|AI|] Natuurlijk, ik help je graag. Waar heb je specifiek hulp ' 'bij nodig? ' '[|Human|] Ik wil de output van een UNION-query ordenen op een ' 'kolom die niet aanwezig is in de SELECT-instructie. Is dat ' 'mogelijk? ' '[|AI|] Ja, het is mogelijk met Spark SQL. Je kunt de ordinale ' 'positie van de kolom in de ORDER BY-clausule gebruiken in plaats ' 'van de kolomnaam. Als je bijvoorbeeld wilt ordenen op de derde ' 'kolom in de SELECT-instructie, kun je ORDER BY 3 gebruiken. ' '[|Human|] Dat is nuttig. Kan ik ook een alias gebruiken in de ' 'ORDER BY-clausule? ' '[|AI|] Ja, je kunt een alias gebruiken in de ORDER BY-clausule. De ' 'alias moet worden gedefinieerd in de SELECT-instructie. Als je ' 'bijvoorbeeld de derde kolom als "col3" alias geeft, kun je ordenen ' 'met ORDER BY col3. ' '[|Human|] Oké, ik denk dat ik het begrijp. Kan ik dezelfde alias ' 'gebruiken voor kolommen uit beide tabellen in de UNION? ' '[|AI|] Nee, je kunt dezelfde alias niet gebruiken voor kolommen ' 'uit beide tabellen in de UNION. Elke tabel in de UNION moet een ' 'unieke alias hebben voor de kolommen. ' '[|Human|] ', 'topic': 'Spark SQL UNION - ORDER BY kolom niet in SELECT'},

Data Fields

id: the ID of the item. The following 82 IDs are not included because they could not be translated: [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992]

input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

topic: the topic description

Dataset Creation

Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo. max_tokens=1024, temperature=0 as parameters.

The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}.

Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as [|AI|]) and the human ([|Human|]) talking in turns and responding to each other; 2. do not translate the speaker identifiers [|AI|] and [|Human|] but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English.

Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else.

"""

The prompt to translate the topic is:

TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct,"
" summarizing manner. Translate accurately and formally. Do not provide any explanation"
" about the translation and do not include the original title.

"

The system message was:

You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

Note that 82 items (0.1%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1713, 1937, 1960, 4326, 4356, 8357, 8542, 8827, 9137, 9782, 11560, 11961, 12244, 12362, 12488, 13259, 13621, 14445, 14835, 15006, 17746, 18808, 19285, 19426, 19491, 21270, 21661, 22098, 23352, 23840, 23869, 25148, 25928, 27102, 27856, 28387, 29942, 30041, 30251, 32396, 32742, 32941, 33628, 34116, 34648, 34859, 35977, 35987, 36035, 36456, 37028, 37238, 37640, 38107, 38735, 39015, 40984, 41115, 41567, 42397, 43219, 43783, 44599, 44980, 45239, 47676, 48922, 49534, 50282, 50683, 50804, 50919, 51076, 51211, 52000, 52183, 52489, 52595, 53884, 54726, 55795, 56992].

The translation quality has not been verified. Use at your own risk!

Licensing Information

Licensing info for Stack Overflow Questions is listed as Apache 2.0. If you use the current dataset, you should also adhere to the original license.

This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

If you use this dataset, you must also follow the Sharing and Usage policies.

As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

This dataset is also available on the Hugging Face hub with the same DOI and license. See that README for more info.
h
Bitext-customer-support-llm-chatbot-training-dataset
huggingface.co
opendatalab.com
Updated Jul 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 16, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
Scam Conversation Corpus
zenodo.org
Updated Apr 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anonymous; Anonymous (2025). Scam Conversation Corpus [Dataset]. http://doi.org/10.5281/zenodo.15212527
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15212527
Dataset updated
Apr 14, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Anonymous; Anonymous
Description
Dataset Description

Overview

This dataset contains JSON files representing conversations with scammers, facilitated by GPT-4o, alongside a structured media folder for multimedia content. Each conversation is uniquely identified and may span multiple communication platforms.

Data Structure

JSON Files

Each JSON file represents a single conversation and contains:

unique_id: Unique identifier for the conversation.

scam_ids: Dictionary mapping communication mediums to scammer identifiers (anonymized email addresses, usernames, etc.).

victim_ids: Dictionary mapping communication mediums to victim identifiers (anonymized email addresses, usernames, etc.).

pause_start, pause_end: Hours marking the local time in 0-24h format on the collection servers (where the response application ran), indicating periods where the application intentionally did not respond to simulate human sleep patterns.

victim_name: The name of the victim (for context in simulations).

messages: List of message objects with:

is_inbound: Boolean, true if the message is from the scammer to the victim.

from: Sender's identifier.

to: Receiver's identifier.

subject: (Email only) Subject line.

time: Timestamp (UNIX format).

medium: Communication platform (Email, Telegram, or Instagram).

multimedia: Boolean, true if the message contains multimedia.

body: Text content of the message.

Note: Only structured fields like scammer/victim identifiers are anonymized — message content remains unaltered.

Multimedia Descriptions:

If a message contains multimedia content, descriptions of the received files are appended to the body field. The appended text follows this format:
This message contains files. If the description for a file does not make sense,
ignore it.
Here are descriptions of those files:
Description for file 1: example_description_1
Description for file 2: example_description_2

The description also can contain information about the file not being interpreted or not being able to be opened.

Media Folder Structure

Multimedia content is organized by platform and conversation ID. A conversation has a corresponding media folder only if it contains multimedia.

media
|-- email
| |--
| |--
|-- ig
| |--
| |--
|-- tg
|--
|--

Email and Telegram: Files are named message_timestamp_documentname.

Instagram: Files are named only by message_timestamp, as Instagram doesn’t support multi-file messages.

Each medium folder contains subfolders named after unique_conversation_id.

Cross-Medium Conversations

Conversations may span multiple mediums (e.g., starting via Email and continuing on Telegram). Each message tracks its originating medium.

Data Quality Notes

Fragmented Conversations: Some messages may belong to the same logical conversation but are split into separate conversations. This happens when a scammer contacts the same victim ID (e.g., email address) from a different, previously unknown email address without prior notification.

Medium Switch Anomalies: Due to imperfect prompt handling, medium switches may appear unnatural. For example, the LLM might state it will switch to Telegram shortly, while the message stating this is already on Telegram.

Synthetic Personal Data: All personal data provided by the victim is fabricated — generated either by the authors or the LLM. However, coincidences may lead to data resembling real-world entities. Handle with caution.

Incomplete Messenger Representations: Some interactions, like calls on Instagram or Telegram, cannot be fully represented. These will appear as messages with "body": "null".

Usage Restrictions

This dataset is strictly for research purposes only.
r
Scenario ontology generation with OpenAI GPT-4
researchdata.se
Updated Aug 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Karl Kindström Andersson (2024). Scenario ontology generation with OpenAI GPT-4 [Dataset]. http://doi.org/10.5878/7bzb-mz52
Explore at:
(516548), (676200), (2279941)Available download formats
Unique identifier
https://doi.org/10.5878/7bzb-mz52
Dataset updated
Aug 6, 2024
Dataset provided by
Linköping University
Authors
Karl Kindström Andersson
Description
This dataset is separated into three parts. The text and images in all documents have been manually copied from the OpenAI webpage, and the formating has been recreated to the greatest extent.

DATASET1: Conversation history from Scenario Ontology Generation with OpenAI GPT-4 This dataset contains the conversation history where OpenAI GPT-4 is asked to analyze a text, is asked to create an ontology and then visualize it in a graph. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology. The dataset also contains a JSON file that was generated that describes the ontology.

The document contains a conversation log with OpenAI GPT-4 where a text is analyzed by the AI and an ontology graph and a JSON file describing the ontology is created.

DATASET2: Conversation history from Scenario Generation with OpenAI GPT-4 This dataset contains the conversation history where OpenAI GPT-4 is asked to generate a hypothetical scenario for the development of a new hypothetical multi-role tactical airlift. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology.

The document contains conversation log with OpenAI GPT-4 where a hypothetical scenario is generated.

DATASET3: CHAMP Operational Scenario generated from OpenAI GPT-4 OpenAI GPT-4 was used to generate a hypothetical scenario for the development of a hypothetical new multi-role airlift.

The document contains a hypothetical scenario that has been generated by OpenAI GPT-4

The dataset was originally published in DiVA and moved to SND in 2024.
r
Conversation history from Scenario Generation with OpenAI GPT-4
researchdata.se
Updated Aug 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Linköping University (2024). Conversation history from Scenario Generation with OpenAI GPT-4 [Dataset]. http://doi.org/10.48360/yykt-r359
Explore at:
Unique identifier
https://doi.org/10.48360/yykt-r359
Dataset updated
Aug 6, 2024
Dataset authored and provided by
Linköping University
Description
This dataset contains the conversation history where OpenAI GPT-4 is asked to generate a hypothetical scenario for the development of a new hypothetical multi-role tactical airlift. The dataset contains all user input and the OpenAI GPT-4 output, including the graphical representations of the ontology. The generated word-file, containing the generated scenario, is stored in a separate dataset: "CHAMP Operational Scenario generated from OpenAI GPT-4"

Document containing conversation log with OpenAI GPT-4 where a hypothetical scenario is generated. The text and images have been manually copied from the OpenAI webpage, and the formating has been recreated to the greatest extent

License: Creative Commons Attribution 4.0 International

The dataset was originally published in DiVA and moved to SND in 2024.
Quora Chat Dutch
zenodo.org
huggingface.co
bin
Updated Jun 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bram Vanroy; Bram Vanroy (2023). Quora Chat Dutch [Dataset]. http://doi.org/10.57967/hf/0524
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.57967/hf/0524
Dataset updated
Jun 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Bram Vanroy; Bram Vanroy
License
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Description
This dataset contains 54,444 conversations between een AI assistant and a (fake) "Human" (generated) in Dutch. They are translations of Baize's machine-generated answers to the Quora dataset.

Data Instances

{ 'id': 10, 'input': 'Het gesprek tussen de mens en de AI-assistent. ' '[|Human|] Ik vroeg me af wanneer de Moto G3 de Android 7.0-update ' 'zal krijgen? ' '[|AI|] De Moto G3 (Moto G 3rd gen 2015) komt in aanmerking voor de ' 'Android 7.0 (Nougat) update. ' '[|Human|] Geweldig! Weet je wanneer de update beschikbaar zal ' 'zijn? ' '[|AI|] De update werd uitgebracht in december 2016 voor de Moto G4 ' 'en G4 Plus. Er is echter nog geen woord over wanneer de Moto G3 de ' 'update zal ontvangen. ' '[|Human|] Is er nieuws of het binnenkort de update zal ontvangen? ' '[|AI|] Het spijt me, ik heb momenteel geen informatie daarover. ' 'Het is het beste om uit te kijken naar updates van Motorola. ' '[|Human|] ', 'topic': 'Wanneer krijgt Moto G3 (Moto G 3e gen 2015) de Android 7.0 ' '(Nougat) update?' }

Data Fields

id: the ID of the item. The following IDs are not included because they could not be translated: [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740]

input: the machine-generated conversation between AI and "Human". Always starts with Het gesprek tussen de mens en de AI-assistent. and has at least one occurrence of both [|AI|] and [|Human|].

topic: the topic description

Dataset Creation

Both the translations and the topics were translated with OpenAI's API for gpt-3.5-turbo. max_tokens=1024, temperature=0 as parameters.

The prompt template to translate the input is:

CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a conversation between an AI assistant and a human from {src_lang} into {tgt_lang}. Here are the requirements that you should adhere to: 1. maintain the format: the conversation consists of the AI (marked as `[|AI|]`) and the human (`[|Human|]`) talking in turns and responding to each other; 2. do not translate the speaker identifiers `[|AI|]` and `[|Human|]` but always copy them into the translation in appropriate places; 3. ensure accurate translation and keep the correctness of the conversation; 4. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias; 5. translate the human's text using informal, but standard, language; 6. make sure to avoid biases (such as gender bias, grammatical bias, social bias); 7. if the human asks to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in {tgt_lang}, and then also generate a corrected output version for the AI in {tgt_lang}; 8. if the human asks to translate text from one to another language, then you only translate the human's question to {tgt_lang} but you keep the translation that the AI provides in the language that the human requested; 9. do not translate code fragments but copy them as they are. If there are English examples, variable names or definitions in code fragments, keep them in English. Now translate the following conversation with the requirements set out above. Do not provide an explanation and do not add anything else. """

The prompt to translate the topic is:

TOPIC_TRANSLATION_PROMPT = "Translate the following title of a conversation from {src_lang} to {tgt_lang} in a succinct," \ " summarizing manner. Translate accurately and formally. Do not provide any explanation" \ " about the translation and do not include the original title. "

The system message was:

You are a helpful assistant that translates English to Dutch to the requirements that are given to you.

Note that 12 items (0.02%) were not successfully translated. The translation was missing the AI identifier [|AI|] and/or the human one [|Human|]. The IDs for the missing items are [1352, 14599, 25985, 27167, 28254, 33731, 41550, 44364, 47293, 49060, 51383, 53740].

The translation quality has not been verified. Use at your own risk!

Licensing Information

Licensing info for quora is listed as unknown. Baize is listed as GPLv3. So this dataset is also listed as such. A specific restriction is in place: as per OpenAI's terms of use, this dataset cannot be used to build a commercial system that competes with OpenAI's services.

This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

If you use this dataset, you must also follow the Sharing and Usage policies.

As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

This dataset is also available on the Hugging Face hub which is the canonical repository for this dataset. See the link for more info.
h
toxic-chat
huggingface.co
Updated Jan 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Large Model Systems Organization (2024). toxic-chat [Dataset]. https://huggingface.co/datasets/lmsys/toxic-chat
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 25, 2024
Dataset authored and provided by
Large Model Systems Organization
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Update

[01/31/2024] We update the OpenAI Moderation API results for ToxicChat (0124) based on their updated moderation model on on Jan 25, 2024.[01/28/2024] We release an official T5-Large model trained on ToxicChat (toxicchat0124). Go and check it for you baseline comparision![01/19/2024] We have a new version of ToxicChat (toxicchat0124)!

Content

This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/toxic-chat.
h
sales-conversations
huggingface.co
Updated Sep 28, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ENGEL (2023). sales-conversations [Dataset]. https://huggingface.co/datasets/goendalf666/sales-conversations
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 28, 2023
Authors
ENGEL
Description
Dataset Card for "sales-conversations"

This dataset was created for the purpose of training a sales agent chatbot that can convince people. The initial idea came from: textbooks is all you need https://arxiv.org/abs/2306.11644 gpt-3.5-turbo was used for the generation

Structure

The conversations have a customer and a salesman which appear always in changing order. customer, salesman, customer, salesman, etc. The customer always starts the conversation Who ends the… See the full description on the dataset page: https://huggingface.co/datasets/goendalf666/sales-conversations.
h
ShareGPT-Unfiltered-RedPajama-Chat-format
huggingface.co
Updated Jun 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fredi (2023). ShareGPT-Unfiltered-RedPajama-Chat-format [Dataset]. https://huggingface.co/datasets/Fredithefish/ShareGPT-Unfiltered-RedPajama-Chat-format
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 6, 2023
Authors
Fredi
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
ShareGPT unfiltered dataset in RedPajama-Chat format

This dataset was created by converting The alpaca-lora formatted ShareGPT dataset to the format required by RedPajama-Chat. This script was used for the conversion: https://github.com/fredi-python/Alpaca2INCITE-Dataset-Converter/blob/main/convert.py WARNING: Only the first human and gpt text of each conversation from the original dataset is included in the dataset.

The format

{"text": "
h
ChatGPT-Jailbreak-Prompts
huggingface.co
Updated Jun 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rubén Darío Jaramillo Romero (2023). ChatGPT-Jailbreak-Prompts [Dataset]. https://huggingface.co/datasets/rubend18/ChatGPT-Jailbreak-Prompts
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 19, 2023
Authors
Rubén Darío Jaramillo Romero
Description
Dataset Card for Dataset Name

Name

ChatGPT Jailbreak Prompts

Dataset Summary

ChatGPT Jailbreak Prompts is a complete collection of jailbreak related prompts for ChatGPT. This dataset is intended to provide a valuable resource for understanding and generating text in the context of jailbreaking in ChatGPT.

Languages

[English]
h
math
huggingface.co
Updated Apr 11, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CAMEL-AI.org (2023). math [Dataset]. https://huggingface.co/datasets/camel-ai/math
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 11, 2023
Dataset provided by
CAMEL-AI.org
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society

Github: https://github.com/lightaime/camel Website: https://www.camel-ai.org/ Arxiv Paper: https://arxiv.org/abs/2303.17760

Dataset Summary

Math dataset is composed of 50K problem-solution pairs obtained using GPT-4. The dataset problem-solutions pairs generating from 25 math topics, 25 subtopics for each topic and 80 problems for each "topic,subtopic" pairs. We provide the data… See the full description on the dataset page: https://huggingface.co/datasets/camel-ai/math.
WildChat-nontoxic
huggingface.co
Updated Nov 14, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WildChat-nontoxic [Dataset]. https://huggingface.co/datasets/allenai/WildChat-nontoxic
Explore at:
Dataset updated
Nov 14, 2023
Dataset provided by
Allen Institute for AIhttp://allenai.org/
Authors
Ai2
Description
Dataset Card for WildChat-nontoxic

Note: a newer version with 1 million conversations and demographic information can be found here. Dataset Description

Paper: https://wenting-zhao.github.io/papers/wildchat.pdf

License: https://allenai.org/licenses/impact-lr

Language(s) (NLP): multi-lingual

Point of Contact: Yuntian Deng

Dataset Summary

WildChat-nontoxic is the nontoxic subset of the WildChat dataset, a collection of 530K conversations… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat-nontoxic.
Not seeing a result you expected?
Learn how you can add new datasets to our index.