Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
ChatGPT has taken the world by storm, setting a record as the fastest app to reach 100 million users, a milestone it hit in two months. The implications of this tool are far-reaching, universities...
In a survey conducted across **** Southeast Asian countries in February 2023, almost half of the respondents selected the collection of personal data as one of their concerns about using chatbots like ChatGPT. By contrast, ethical issues related to data privacy and intellectual property were a concern for ** percent of the respondents.
In a survey conducted across four Southeast Asian countries in February 2023, ** percent of the respondents in Singapore selected the collection of personal data as one of their concerns about using chatbots like ChatGPT. By contrast, this was a concern for ** percent of respondents in Indonesia.
As of June 2023, **** percent of employees at companies worldwide were reported to have tried using ChatGPT in the workplace at least once. The share who had put confidential corporate data into the AI-powered tool was *** percent.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Update
[01/31/2024] We update the OpenAI Moderation API results for ToxicChat (0124) based on OpenAI's moderation model updated on Jan 25, 2024.
[01/28/2024] We release an official T5-Large model trained on ToxicChat (toxicchat0124). Check it out for your baseline comparison!
[01/19/2024] We have a new version of ToxicChat (toxicchat0124)!
Content
This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/toxic-chat.
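The update notes above mention comparing baselines (the OpenAI Moderation API, a T5-Large model) on ToxicChat. As a minimal sketch, assuming each row carries a binary human label in a field we call `toxicity` (the field names `user_input` and `toxicity` are assumptions, not taken from this listing), a baseline's binary predictions could be scored with precision, recall and F1 on the toxic class, which are not inflated by a large non-toxic majority the way plain accuracy can be.

```python
from typing import Sequence, Tuple


def binary_prf(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (toxic = 1) class."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical rows shaped like ToxicChat records; field names are assumptions.
records = [
    {"user_input": "prompt A", "toxicity": 1},
    {"user_input": "prompt B", "toxicity": 0},
    {"user_input": "prompt C", "toxicity": 1},
    {"user_input": "prompt D", "toxicity": 0},
]
# Hypothetical binary flags from some baseline (e.g. a moderation model).
baseline_preds = [1, 1, 0, 0]

gold = [row["toxicity"] for row in records]
prec, rec, f1 = binary_prf(gold, baseline_preds)
print(f"precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Keeping the metric function separate from data loading means the same scorer works whichever baseline produced the predictions.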
https://choosealicense.com/licenses/odc-by/
Dataset Card for WildChat
Dataset Description
Paper: https://arxiv.org/abs/2405.01470
Interactive Search Tool: https://wildvisualizer.com (paper)
License: ODC-BY
Language(s) (NLP): multi-lingual
Point of Contact: Yuntian Deng
Dataset Summary
WildChat is a collection of 1 million conversations between human users and ChatGPT, alongside demographic data, including state, country, hashed IP addresses, and request headers. We collected WildChat by… See the full description on the dataset page: https://huggingface.co/datasets/allenai/WildChat-1M.
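Because WildChat pairs each conversation with demographic metadata, a natural first analysis is a per-country breakdown. The sketch below is illustrative only: it assumes rows are dicts with `country` and `hashed_ip` keys (assumed field names) and counts conversations and distinct hashed IPs per country. Distinct hashed IPs are at best a rough proxy for distinct users, since several people can share one IP.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Optional, Set, Tuple


def summarize_by_country(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, Tuple[int, int]]:
    """Map country -> (number of conversations, number of distinct hashed IPs)."""
    conversations: Counter = Counter()
    hashed_ips: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        country = row.get("country") or "Unknown"  # missing country -> "Unknown"
        conversations[country] += 1
        ip = row.get("hashed_ip")
        if ip:
            hashed_ips[country].add(ip)
    return {c: (n, len(hashed_ips[c])) for c, n in conversations.items()}


# Hypothetical rows; keys and values are made up for illustration.
rows = [
    {"country": "United States", "state": "California", "hashed_ip": "ip_a"},
    {"country": "United States", "state": "California", "hashed_ip": "ip_a"},
    {"country": "China", "state": None, "hashed_ip": "ip_b"},
    {"country": None, "state": None, "hashed_ip": None},
]

for country, (n_conv, n_ips) in summarize_by_country(rows).items():
    print(country, n_conv, n_ips)
```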
GPTFuzzer is a project that explores red teaming of large language models (LLMs) using auto-generated jailbreak prompts. Here are the details:
Project Overview: GPTFuzzer aims to assess the security and robustness of LLMs by crafting prompts that can potentially lead to harmful or unintended behavior.
The project focuses on GPT-3 and similar models.
Datasets:
The datasets used in GPTFuzzer include:
Harmful Questions: Sampled from public datasets like llm-jailbreak-study and hh-rlhf.
Human-Written Templates: Collected from llm-jailbreak-study.
Responses: Gathered by querying models like Vicuna-7B, ChatGPT, and Llama-2-7B-chat.
Models:
The judgment model is a finetuned RoBERTa-large model. The training code and data are available in the repository.
During fuzzing experiments, the model is automatically downloaded and cached.
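To make the pieces above concrete (templates, questions, target-model responses, and a judgment model), here is a generic black-box fuzzing loop over prompt templates. It is a simplified, hypothetical sketch, not GPTFuzzer's code: the project uses its own seed-selection and mutation strategies and the RoBERTa judgment model, whereas this sketch uses uniform selection and toy stand-ins. The names `fuzz`, `PLACEHOLDER`, `toy_mutate`, `toy_target` and `toy_judge` are all invented for illustration.

```python
import random
from typing import Callable, List

# Hypothetical placeholder that marks where a question is inserted into a template.
PLACEHOLDER = "[QUESTION]"


def fuzz(
    seeds: List[str],
    questions: List[str],
    mutate: Callable[[str, random.Random], str],
    target: Callable[[str], str],
    judge: Callable[[str], bool],
    iterations: int,
    rng: random.Random,
) -> List[str]:
    """Generic black-box fuzzing loop over prompt templates.

    Each iteration picks a template from the pool, mutates it, fills in every
    question, queries the target, and keeps the mutant if the judge flags at
    least one response.
    """
    pool = list(seeds)
    kept: List[str] = []
    for _ in range(iterations):
        parent = rng.choice(pool)  # uniform choice; a simplification
        child = mutate(parent, rng)
        if PLACEHOLDER not in child:
            continue  # discard mutants that lost the question slot
        hits = sum(judge(target(child.replace(PLACEHOLDER, q))) for q in questions)
        if hits > 0:
            pool.append(child)  # successful mutants become new seeds
            kept.append(child)
    return kept


# Toy stand-ins so the loop runs offline; none of these is a real model.
def toy_mutate(template: str, rng: random.Random) -> str:
    return rng.choice(["Variant A: ", "Variant B: "]) + template


def toy_target(prompt: str) -> str:
    return "COMPLIED" if prompt.startswith("Variant B") else "REFUSED"


def toy_judge(response: str) -> bool:
    return response == "COMPLIED"


found = fuzz(["Template: " + PLACEHOLDER], ["q1", "q2"], toy_mutate, toy_target, toy_judge, 10, random.Random(0))
print(len(found), "mutants kept")
```

In a real setup, `target` would wrap a model API and `judge` would wrap a classifier such as the finetuned RoBERTa-large model; feeding successful mutants back into the pool is what lets the search build on earlier successes.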
Updates:
The project has received recognition and awards at conferences like Geekcon 2023. The team continues to improve the codebase and aims to build a general black-box fuzzing framework for LLMs.
Source: Conversation with Bing, 3/17/2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Perceptions and usage patterns of ChatGPT among Argentine dermatologists remain unclear. To determine them, we conducted an email survey of members of the Argentine Society of Dermatology in July 2023. While 83.7% of Argentine dermatologists are acquainted with ChatGPT, a significant 65.4% have never used it. Argentine dermatologists seem to adopt a cautious, intermediate stance towards the tool, likely due to moderate perceptions of its reliability and utility (Likert scale 3/7). ChatGPT users are significantly younger (p=0.042), include a higher proportion of “early adopters” (p=0.004), have less technology anxiety (p
https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed for fine-tuning Large Language Models such as GPT, Mistral, and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
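Since the dataset is tagged, one practical step before fine-tuning is a train/test split that preserves the tag distribution, so every intent is represented in evaluation. The sketch below assumes each row is a dict with an `intent` tag (an assumed field name, as are the example intent values); it is a plain stratified split, not Bitext's own two-step approach.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def stratified_split(rows: List[Row], key: str, test_frac: float, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Split rows so each value of `key` (e.g. an intent tag) lands in both
    splits in roughly the same proportion; singleton groups stay in train."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Row] = []
    test: List[Row] = []
    for tag in sorted(groups):  # sorted so the result does not depend on input order of tags
        items = groups[tag][:]
        rng.shuffle(items)
        # At least one test example per tag, unless the tag has only one row.
        n_test = max(1, round(len(items) * test_frac)) if len(items) > 1 else 0
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical rows; "instruction" and "intent" are assumed field names.
rows = [{"instruction": f"cancel request {i}", "intent": "cancel_order"} for i in range(4)]
rows += [{"instruction": f"refund question {i}", "intent": "track_refund"} for i in range(2)]

train, test = stratified_split(rows, "intent", test_frac=0.25)
print(len(train), "train /", len(test), "test")
```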
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Capybara
This is the Capybara multi-turn chat dataset in ShareGPT format for LLaMA-Factory. Each line represents a conversation between a human user and GPT AI. The first message in the conversation is by a human user. Language is English only.
Example line truncated and pretty-printed: { "conversations": [ { "from": "human", "value": "Using the given plot points, write a short blurb for a science fiction novel. Earth endangered | alien invaders | secret… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/Capybara.
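The ShareGPT layout described above stores each conversation as a `conversations` list of `{"from", "value"}` turns. Many chat fine-tuning tools instead expect `{"role", "content"}` messages, so a small converter is a common preprocessing step. This is a minimal sketch: `"human"` comes from the example above, while mapping `"gpt"` to the assistant role follows the usual ShareGPT convention and is an assumption here; the example line is made up.

```python
import json
from typing import Dict, List

# "human" appears in the dataset's example; "gpt" is the usual ShareGPT
# label for model turns and is assumed here.
ROLE_MAP = {"human": "user", "gpt": "assistant"}


def sharegpt_line_to_messages(line: str) -> List[Dict[str, str]]:
    """Convert one ShareGPT-format JSON line into role/content messages."""
    record = json.loads(line)
    turns = record["conversations"]
    # The dataset card says the first message is always by a human user.
    if not turns or turns[0]["from"] != "human":
        raise ValueError("expected the first message to come from a human user")
    messages = []
    for turn in turns:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unknown speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


# A made-up example line in the same shape as the dataset's records.
line = json.dumps({"conversations": [
    {"from": "human", "value": "Write a one-line blurb for a science fiction novel."},
    {"from": "gpt", "value": "When the invaders came, Earth had one secret left."},
]})

for m in sharegpt_line_to_messages(line):
    print(m["role"], "->", m["content"])
```

Raising on unexpected speakers, rather than silently skipping them, surfaces malformed lines early instead of producing conversations with missing turns.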