This dataset was created by Muhammad Saad Makhdoom
Mental health includes our emotional, psychological, and social well-being. Mental health is integral to living a healthy, balanced life. It affects how we think, feel, and act. It also helps determine how we handle stress, relate to others, and make choices. Emotional and mental health is important because it’s a vital part of your life and impacts your thoughts, behaviors and emotions. Being healthy emotionally can promote productivity and effectiveness in activities like work, school or care-giving. It plays an important part in the health of your relationships, and allows you to adapt to changes in your life and cope with adversity. Mental health problems are common but help is available. People with mental health problems can get better and many recover completely.
This dataset consists of FAQs about Mental Health.
https://www.thekimfoundation.org/faqs/
https://www.mhanational.org/frequently-asked-questions
This dataset was created by CoffeeG
This dataset was created by Abhishek Srivastava
License: https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
License: https://cdla.io/sharing-1-0/
This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.
The dataset has the following specs:
The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:
For a full list of verticals and their intents, see https://www.bitext.com/chatbot-verticals/.
The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.
The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we identified a total of 3.57 million tokens. This set of tokens can be used to train LLMs for conversational AI, generative AI, and question answering (Q&A) models.
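The token count described above can be approximated with a short sketch. This is only an illustration: the sample rows are hypothetical, and whitespace splitting is an assumption standing in for whatever tokenizer was actually used to arrive at the 3.57 million figure, so the numbers it produces will differ from the reported total.

```python
# Hypothetical sample rows using the 'instruction' and 'response' columns
# named in the description; the real dataset has many more entries.
rows = [
    {"instruction": "I want to cancel my order",
     "response": "I can help you cancel your order."},
    {"instruction": "where is my refund",
     "response": "Let me check the status of your refund."},
]


def count_tokens(text):
    """Count whitespace-separated tokens (an assumption, not the original tokenizer)."""
    return len(text.split())


def total_tokens(rows, columns=("instruction", "response")):
    """Sum token counts over the given text columns of every row."""
    return sum(count_tokens(row[col]) for row in rows for col in columns)


if __name__ == "__main__":
    print(total_tokens(rows))
```

Swapping `count_tokens` for a model-specific tokenizer would give counts that are comparable to a given LLM's context budget.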
Each entry in the dataset contains the following fields:
The categories and intents covered by the dataset are:
The entities covered by the dataset are:
License: https://choosealicense.com/licenses/cc/
Chatbot Arena Conversations Dataset
This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.
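As a rough illustration of how pairwise human preferences like these might be summarized, the sketch below tallies per-model win rates. The field names (`model_a`, `model_b`, `winner`), the model names, and the vote values are all hypothetical and chosen for the example; check the dataset page for the actual schema before adapting it.

```python
from collections import Counter

# Hypothetical records: two model names and a user vote per sample.
samples = [
    {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"model_a": "model-y", "model_b": "model-z", "winner": "model_b"},
    {"model_a": "model-x", "model_b": "model-z", "winner": "tie"},
    {"model_a": "model-z", "model_b": "model-x", "winner": "model_b"},
]


def win_rates(samples):
    """Return wins divided by battles for each model; ties count as no win."""
    wins = Counter()
    battles = Counter()
    for sample in samples:
        battles[sample["model_a"]] += 1
        battles[sample["model_b"]] += 1
        if sample["winner"] in ("model_a", "model_b"):
            # Map the vote ("model_a" or "model_b") back to the model's name.
            wins[sample[sample["winner"]]] += 1
    return {model: wins[model] / battles[model] for model in battles}


if __name__ == "__main__":
    for model, rate in sorted(win_rates(samples).items()):
        print(f"{model}: {rate:.2f}")
```

A plain win rate ignores opponent strength; it is used here only because it is the simplest summary of pairwise votes.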
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The University Chatbot Dataset contains 38 intents covering general university-related inquiries, designed to train, fine-tune, and evaluate conversational AI models in the education sector.
Dataset Card for "ecommerce-faq-chatbot-dataset"
More Information needed
This dataset includes FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the accuracy of the chatbot in the following papers.
Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.
Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "An FAQ Dataset for E-learning System Used on a Japanese University", Data in Brief, Elsevier, in press.
This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in practical classes. We collected the Q&A data from April 2015 to July 2018.
We attach an English version of the dataset, translated from the Japanese original, to make its contents easier to understand. Note that we did not evaluate the English version, so there are no results on how accurately chatbots respond to its questions.
File contents:
Results of statistical analyses for the dataset. We used the Calinski-Harabasz index, mutual information, the Jaccard index, TF-IDF+KL divergence, and TF-IDF+JS divergence to measure the quality of the dataset. In the analyses, we regard each answer as a cluster of questions. We also perform the same analyses for categories, regarding them as clusters of answers.
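Two of the measures named above can be sketched in a few lines. This is a simplified illustration, not the authors' analysis code: it compares two example questions directly, tokenizes on whitespace, and uses raw term frequencies where the description specifies TF-IDF weights. The function names and example questions are assumptions made for the sketch.

```python
import math
from collections import Counter


def jaccard_index(text_a, text_b):
    """Jaccard index of the two texts' word sets: |A & B| / |A | B|."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def js_divergence(text_a, text_b):
    """Jensen-Shannon divergence between term-frequency distributions.

    Uses base-2 logarithms, so the result lies in [0, 1].
    Raw term frequencies are an assumption; the dataset authors used TF-IDF.
    """
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both texts must contain at least one word")
    vocab = set(counts_a) | set(counts_b)
    p = {w: counts_a[w] / total_a for w in vocab}
    q = {w: counts_b[w] / total_b for w in vocab}
    m = {w: (p[w] + q[w]) / 2 for w in vocab}

    def kl(x, y):
        # Terms with x[w] == 0 contribute nothing to the KL sum.
        return sum(x[w] * math.log2(x[w] / y[w]) for w in vocab if x[w] > 0)

    return (kl(p, m) + kl(q, m)) / 2


if __name__ == "__main__":
    q1 = "how do I log in"
    q2 = "how do I log out"
    print(jaccard_index(q1, q2), js_divergence(q1, q2))
```

Under the clustering view described above, questions that share an answer would be expected to score a higher Jaccard index and a lower JS divergence with each other than with questions mapped to other answers.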
Grants: JSPS KAKENHI Grant Number 18H01057
farzanrahmani/chatbot-FAQ-queries dataset hosted on Hugging Face and contributed by the HF Datasets community
License: https://choosealicense.com/licenses/cdla-sharing-1.0/
Bitext - Retail Banking Tagged Training Dataset for LLM-based Virtual Assistants
Overview
This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail Banking] sector can be easily achieved using our two-step approach to LLM Fine-Tuning.… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for "heliosbrahma/mental_health_chatbot_dataset"
Dataset Description
Dataset Summary
This dataset contains conversational pairs of questions and answers, each in a single text, related to mental health. The dataset was curated from popular healthcare blogs such as WebMD, Mayo Clinic and Healthline, online FAQs, etc. All questions and answers have been anonymized to remove any PII and pre-processed to remove any unwanted characters.
Languages
The… See the full description on the dataset page: https://huggingface.co/datasets/heliosbrahma/mental_health_chatbot_dataset.
bot-remains/student-assistance-chatbot dataset hosted on Hugging Face and contributed by the HF Datasets community
License: https://creativecommons.org/publicdomain/zero/1.0/
Dataset created by the chatbot development Team at Omdena Lagos Nigeria for the project "interactive-chatbot-for-the-omdena-website"
https://omdena.com/chapter-challenges/developing-an-interactive-chatbot-for-the-omdena-website/
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
A dataset from an online study on a simulated social trading platform that used a chatbot to give participants advice on investment. 64 participants interacted with a chatbot across 4 conditions: human-like/not human-like, and with reply suggestion buttons/without reply suggestion buttons embedded in the user interface. They were shown 10 different portfolios to follow or unfollow at 5 separate month intervals, basing their decision on the advice of the chatbot or a separate news feed that would try to predict the next change in portfolio value. Participants were assigned an initial virtual balance of £1000. Image tagging was included as a distracting secondary task. All the messages exchanged to and from the chatbot are included, as well as the user actions and image tagging. Participant demographic data is included in a separate file.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset provides detailed, synthetic healthcare chatbot conversations with annotated intent labels, message sequencing, and extracted entities. Designed for training and evaluating conversational AI, it supports intent classification, dialogue modeling, and entity recognition in healthcare virtual assistants. The dataset enables robust analysis of user-bot interactions for improved patient engagement and automation.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for Dataset Name
Health Question and Answer Clean Dataset
Dataset Details
Dataset Description
This dataset provides a detailed overview of health question and answer pairs. It includes data on health problems and corresponding answers, making it suitable for various tasks such as healthcare chatbot training.
Language(s) (NLP): English License: Apache-2.0
Dataset Sources [optional]
Repository:… See the full description on the dataset page: https://huggingface.co/datasets/shaneperry0101/health-chatbot.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
[EN] French training dataset for chatbots dealing with usual requests on bank cards.
[FR] Jeu d'entraînement en français d'assistants conversationnels traitant des demandes courantes sur les cartes bancaires.
License: https://data.macgence.com/terms-and-conditions
High-quality chatbot dataset for AI/ML models in the e-commerce sector. Train NLP algorithms with diverse conversational data to enhance chatbot accuracy.
This dataset was created by Muhammad Saad Makhdoom