100+ datasets found

Bitext-travel-llm-chatbot-training-dataset
huggingface.co
Updated Jun 21, 2025
Cite
Bitext (2025). Bitext-travel-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 21, 2025
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Travel Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Travel] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset.
AI medical chatbot
kaggle.com
Updated Aug 15, 2024
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Explore at:
Croissant
Dataset updated
Aug 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yousef Saeedian
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.

Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.

Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.

Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

Potential Use Cases:

NLP Model Training: Train models to understand and generate medical dialogues.

Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.

Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.

Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.
Bitext-retail-ecommerce-llm-chatbot-training-dataset
huggingface.co
Updated Aug 6, 2024
Cite
Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
Explore at:
Croissant
Dataset updated
Aug 6, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.
Mental Health Conversational Data
kaggle.com
Updated Oct 31, 2022
Cite
elvis (2022). Mental Health Conversational Data [Dataset]. https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
Explore at:
Croissant
Dataset updated
Oct 31, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
elvis
Description
A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.

This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.

The dataset contains intents. An “intent” is the intention behind a user's message. For instance, if a user says “I am sad” to the chatbot, the intent would be “sad”. Each intent has a set of Patterns and Responses. Patterns are example user messages that align with the intent, while Responses are the replies the chatbot gives for that intent. Various intents are defined, and their patterns and responses are used as training data so that the model can identify a particular intent.
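The intent, pattern and response layout described above can be sketched as follows. This is a minimal illustration, not the dataset's actual file: the tags, texts and field names (`tag`, `patterns`, `responses`) are hypothetical, and a simple word-overlap score stands in for a trained intent classifier.

```python
import re

# Hypothetical intents in the pattern/response layout described above.
# Tags and texts are invented for illustration, not taken from the dataset.
INTENTS = [
    {
        "tag": "sad",
        "patterns": ["I am sad", "I feel down today"],
        "responses": ["I'm sorry you're feeling this way. Do you want to talk about it?"],
    },
    {
        "tag": "greeting",
        "patterns": ["Hello", "Hi there"],
        "responses": ["Hello! How are you feeling today?"],
    },
]


def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))


def classify(message, intents=INTENTS):
    """Return the tag of the intent whose patterns share the most words
    with the message, or None if no pattern shares any word."""
    words = tokenize(message)
    best_tag, best_score = None, 0
    for intent in intents:
        for pattern in intent["patterns"]:
            score = len(words & tokenize(pattern))
            if score > best_score:
                best_tag, best_score = intent["tag"], score
    return best_tag


def respond(message, intents=INTENTS):
    """Return the first response of the matched intent, or None."""
    tag = classify(message, intents)
    for intent in intents:
        if intent["tag"] == tag:
            return intent["responses"][0]
    return None
```

A real chatbot would replace `classify` with a model trained on the patterns; the lookup from intent to response stays the same.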
Bitext-events-ticketing-llm-chatbot-training-dataset
huggingface.co
Updated Aug 6, 2024
Cite
Bitext (2024). Bitext-events-ticketing-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset
Explore at:
Croissant
Dataset updated
Aug 6, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Events and Ticketing Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [events and ticketing] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset.
FAQ Datasets for Chatbot Training
kaggle.com
Updated Jun 30, 2020
Cite
Abhishek Srivastava (2020). FAQ Datasets for Chatbot Training [Dataset]. https://www.kaggle.com/datasets/abbbhishekkk/faq-datasets-for-chatbot-training/code
Explore at:
Croissant
Dataset updated
Jun 30, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abhishek Srivastava
Description
Dataset

This dataset was created by Abhishek Srivastava

Contents
Chat Bot Dataset for AI/ML models
data.macgence.com
mp3
Updated Aug 4, 2024
Cite
Macgence (2024). Chat Bot Dataset for AI/ML models [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
Explore at:
Available download formats: mp3
Dataset updated
Aug 4, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
Get a high-quality chat bot dataset for AI/ML models. Enhance NLP training with diverse conversational data for accurate, efficient machine learning applications.
dataset
data.mendeley.com
Updated Oct 4, 2023
Cite
Vignesh A (2023). dataset [Dataset]. http://doi.org/10.17632/cpp3bx8ghd.1
Explore at:
Unique identifier
https://doi.org/10.17632/cpp3bx8ghd.1
Dataset updated
Oct 4, 2023
Authors
Vignesh A
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains SQUAD and NarrativeQA dataset files
Training data for City of Helsinki chatbots
data.europa.eu
unknown
Updated Feb 20, 2024
Cite
Helsingin kaupunginkanslia (2024). Training data for City of Helsinki chatbots [Dataset]. https://data.europa.eu/data/datasets/df89ebc7-930c-439f-b073-da91dfa81d6d?locale=en
Explore at:
Available download formats: unknown
Dataset updated
Feb 20, 2024
Dataset authored and provided by
Helsingin kaupunginkanslia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Helsinki
Description
City of Helsinki chatbot training data. Data currently includes maternity and child care services’ chatbot NeRo, International House Helsinki chatbot Into, rental apartment search chatbot and outdoor bot Urho training data.

The service responds based on the trained rule-based discussion paths and the question-answer pairs determined by city experts. Knowledge bases consist of several different areas, from which open data is published on the topics of questions (intents), variable/synonymous libraries (entities) and answers (answers) related to the discussion.

The published data consists only of the above mentioned knowledge base areas, no customer discussions will be included for privacy reasons.

NeRo

Maternity and child care services’ chatbot NeRo answered questions about the growth or development of a child and problems related to pregnancy at the Helsinki maternity clinics. In addition, customers could also ask about topics related to dental care, speech development and nutrition. Today NeRo operates as part of Hester, a chatbot for the social services, health care and rescue services division, and continues to serve the maternity and child health services’ customers in an even more versatile way. The NeRo training data is no longer updated.

Into

International House Helsinki chatbot Into is a 24-hour customer service channel that provides a wide range of information on the official services offered by IHH and advice to support the settling of people who have moved to the Helsinki metropolitan area from abroad. With the help of the service, customers have faster access to International House Helsinki’s wide range of services for the city and the authorities. The service is provided in English and it is intended for all people who have recently moved to the capital region and for international people who are considering moving to the capital region.

The rental apartment search

The rental apartment search chatbot is a 24-hour customer service channel of the City of Helsinki housing services aimed at improving the accessibility of customer service and the customer experience as well as increasing the interactivity of the self-service. The service provides relevant information to each customer’s specific questions faster than by searching for the information on the website.

Urho

The outdoor bot Urho is a chatbot that provides assistance on outdoor and physical activity topics, serving citizens around the clock and, if necessary, directing the conversation to the Helsinki Info service advisors. The service improves the accessibility of customer service, the customer experience and the interactivity of self-service, as well as speeding up the process of finding relevant information for each customer compared to searching for information on a website.

The chatbot has been used on various city outdoor and sports websites, but at the moment it is not on any of them. The bot can be used to ask questions about outdoor and sports facilities and services, for example. The service is rule-based, built on question-answer pairs and discussion dialogues defined by advice and subject-matter experts. The service increases efficiency by automating answers to frequently asked questions.

The parking chatbot

The parking chatbot is a customer service channel of the city’s parking services. The service provides automated answers to the parking-related questions of city residents and visitors. The service is available on the City of Helsinki parking website.

Attributes

XLSX file; the different categories can be found on separate worksheet tabs.

Intents

XLSX file format: the first column contains an example question, the second column the intent ID. That is, each row first gives one way in which a particular thing can be expected to be asked, and then the intent ID by which the system connects the question to the intent and performs a defined action for it.

Entities

XLSX file format: the first column contains the entity ID, the following columns alternative forms of the entity.

The first column holds the thing for which synonyms or other associated words are to be given. Occasionally, inflected forms are also added if the AI does not recognize them reliably enough without them. The following columns hold synonyms and other words associated with the same thing. Note! The system from which the exports are taken splits the same entity over several lines in the exports for an unknown reason.
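Because the same entity can be split over several lines of the export, a consumer will probably want to merge those lines before use. The sketch below assumes the worksheet rows have already been read into tuples (for example with a spreadsheet library, which is not shown here); the function name and the sample values are hypothetical.

```python
def merge_entity_rows(rows):
    """Merge rows that share an entity ID (first column) into one list of
    alternative forms, keeping first-seen order and dropping duplicates."""
    merged = {}
    for row in rows:
        # Skip empty rows and rows without an entity ID.
        if not row or row[0] is None:
            continue
        entity_id = str(row[0]).strip()
        if not entity_id:
            continue
        forms = merged.setdefault(entity_id, [])
        for cell in row[1:]:
            if cell is None:
                continue
            form = str(cell).strip()
            if form and form not in forms:
                forms.append(form)
    return merged


# Invented sample rows: the "playground" entity is split over two lines.
SAMPLE_ROWS = [
    ("playground", "park", "play area"),
    ("playground", "play area", "leikkipuisto", None),
    ("parking", "car park"),
]
```

Calling `merge_entity_rows(SAMPLE_ROWS)` yields one entry per entity ID, with the alternative forms from both "playground" lines combined.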

Answers

key = an identifying name (ID) unique to that response in the system. This is referred to in dialogue definitions when assigning a response to a specific intent in a given situation.

value = the actual response text given to the client in the user interface. Occasionally includes so-called tags that provide clickable hyperlinks, selection buttons, livechat migration, or other functional elements. Texts separated by verti
Mental Health Chatbot Pairs
kaggle.com
Updated Nov 27, 2023
Cite
The Devastator (2023). Mental Health Chatbot Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-chatbot-pairs
Explore at:
Croissant
Dataset updated
Nov 27, 2023
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Mental Health Chatbot Pairs

AI-based Tailored Support for Mental Health Conversation

By Huggingface Hub [source]

About this dataset

This dataset contains a compilation of carefully-crafted Q&A pairs which are designed to provide AI-based tailored support for mental health. These carefully chosen questions and answers offer an avenue for those looking for help to gain the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand and respond appropriately to individual needs based on their input. This comprehensive dataset is crafted by experts in the mental health field, providing insightful content that will further research in this growing area. These data points will be invaluable for developing the next generation of personalized AI-based mental health chatbots capable of truly understanding what people need


How to use the dataset

This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:

Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.

Refine your language processing models: By studying the patterns in syntax, grammar, tone, voice, etc., within this conversational dataset, you can hone your natural language processing capabilities, such as keyword extraction or entity extraction, before implementing them in a larger bot system.

Test assumptions: Have an idea of what may work best with a particular audience or context? Check whether these assumptions hold by applying different variations of text to this dataset before rolling out changes across other channels or programs that use AI/chatbot services.

Research and analyze results: After testing different scenarios with real-world users, using various forms of Q&A from this chatbot pair dataset, analyze and record any results that help you understand user behavior after exposure, both passive and active, to tailored text conversations about mental health topics. The more information you collect, the closer you get to creating effective AI-powered conversations that bring the desired outcomes for your users.

Research Ideas

Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.

Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.

Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot.

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

License

License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: train.csv

| Column name | Description |
|:------------|:------------------------------------------------------------------------|
| text        | The text of the conversation between the user and the chatbot. (String) |

ChatBot Dataset for Transformers
gts.ai
json
Updated Jan 9, 2025
Cite
GTS (2025). ChatBot Dataset for Transformers [Dataset]. https://gts.ai/dataset-download/chatbot-dataset-for-transformers/
Explore at:
Available download formats: json
Dataset updated
Jan 9, 2025
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
Description
Train conversational AI with the ChatBot Dataset for Transformers. Featuring human-like dialogues, preprocessed inputs, and labels, it’s perfect for GPT, BERT, T5, and NLP projects.
Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li (2023). Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight Management: System Design and Finding.pdf [Dataset]. http://doi.org/10.3389/fnut.2022.870775.s004
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fnut.2022.870775.s004
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
As the obesity rate continues to increase persistently, there is an urgent need to develop an effective weight loss management strategy. Nowadays, the development of artificial intelligence (AI) and cognitive technologies coupled with the rapid spread of messaging platforms and mobile technology with easier access to internet technology offers professional dietitians an opportunity to provide extensive monitoring support to their clients through a chatbot with artificial empathy. This study aimed to design a chatbot with artificial empathic motivational support for weight loss called “SlimMe” and investigate how people react to a diet bot. The SlimMe infrastructure was built using Dialogflow as the natural language processing (NLP) platform and LINE mobile messenger as the messaging platform. We proposed a text-based emotion analysis to simulate artificial empathy responses to recognize the user's emotion. A preliminary evaluation was performed to investigate the early-stage user experience after a 7-day simulation trial. The result revealed that having an artificially empathic diet bot for weight loss management is a fun and exciting experience. The use of emoticons, stickers, and GIF images makes the chatbot response more interactive. Moreover, the motivational support and persuasive messaging features enable the bot to express more empathic and engaging responses to the user. In total, there were 1,007 bot responses from 892 user input messages. Of these, 67.38% (601/1,007) of the chatbot-generated responses were accurate to a relevant user request, 21.19% (189/1,007) inaccurate responses to a relevant request, and 10.31% (92/1,007) accurate responses to an irrelevant request. Only 1.12% (10/1,007) of the chatbot does not answer. We present the design of an artificially empathic diet bot as a friendly assistant to help users estimate their calorie intake and calories burned in a more interactive and engaging way. 
To our knowledge, this is the first chatbot designed with artificial empathy features, and it looks very promising in promoting long-term weight management. More user interactions and further data training and validation enhancement will improve the bot's in-built knowledge base and emotional intelligence base.
Mental Health Dialogue Training Dataset
opendatabay.com
.undefined
Updated Jul 3, 2025
Cite
Datasimple (2025). Mental Health Dialogue Training Dataset [Dataset]. https://www.opendatabay.com/data/healthcare/8ec5252f-d432-4d05-b55b-25ab4a45b61d
Explore at:
Available download formats: .undefined
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Mental Health & Wellness
Description
This dataset provides real-life conversations focused on mental health concerns, ideal for developing accurate and informative models to assist individuals seeking support for their mental well-being. It includes statements or questions forming the conversation context and expert responses from mental health counselors. The dataset serves as a valuable resource for generating insights and guidance across various aspects of mental health, facilitating the creation of AI-based tools and enhancing professional counselling techniques.

Columns

The dataset is provided as a CSV file, train.csv, and features two key columns:
* Context: This column contains the initial statements or questions that establish the overall context of the conversation, specifically addressing mental health issues.
* Response: This column holds the corresponding replies delivered by a trained mental health counsellor, designed to address and support individuals within the given context.

Distribution

The dataset is supplied in a CSV file format named train.csv. It is structured with two primary columns, "Context" and "Response". Specific numbers for rows or records are not detailed in the provided information, but the "Context" column contains 2480 unique values.
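The two-column layout described above can be checked with a short script. This is a minimal sketch using the Python standard library; the function name and the inline sample rows are hypothetical, and only the column names "Context" and "Response" come from the description.

```python
import csv
import io


def summarize_pairs(csv_file):
    """Count the rows and the distinct "Context" values in a CSV file object
    that has "Context" and "Response" columns."""
    reader = csv.DictReader(csv_file)
    rows = 0
    contexts = set()
    for record in reader:
        rows += 1
        contexts.add(record["Context"])
    return rows, len(contexts)


# Invented sample standing in for train.csv: one context has two responses.
SAMPLE = (
    "Context,Response\n"
    "I can't sleep.,Let's look at your evening routine.\n"
    "I can't sleep.,Have you noticed what keeps you awake?\n"
    "I feel anxious.,That sounds hard. When did it start?\n"
)

row_count, unique_contexts = summarize_pairs(io.StringIO(SAMPLE))
```

With the real file, passing `open("train.csv", newline="", encoding="utf-8")` instead of the `StringIO` object would report the actual row count alongside the number of unique contexts.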

Usage

This dataset is well-suited for a variety of applications:
* Chatbot Development: Utilise it as a training resource for building AI-based mental health chatbots capable of generating relevant responses.
* Sentiment Analysis: Apply sentiment analysis techniques to individually or comparatively analyse both the context and response columns.
* Topic Modelling: Extract hidden topics within conversations using Natural Language Processing (NLP) methods such as Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF).
* Machine Learning Applications: Classify conversations into different mental health concern categories or train models to generate appropriate responses based on given contexts using approaches like sequence-to-sequence models or transformers.
* Research: Analyse to gain insights into common questions, concerns, and themes related to mental health, aiding the understanding of individuals' needs.
* Improving Counselling Techniques: Mental health professionals can study successful counselling responses to enhance their skills or develop training programmes.

Coverage

The dataset is of a global region. It does not include specific dates or timeframes associated with the conversations, which helps ensure privacy and confidentiality for both the individuals and counsellors involved. It contains sensitive information related to mental health, so ethical considerations, including anonymisation, are vital when using this data for research or practical applications.

License

CC0

Who Can Use It

Professionals in the mental health field.

Researchers studying mental health conversations and interventions.

Developers creating AI-based mental health chatbots and virtual assistants.

Mental health professionals looking to enhance their counselling skills or develop training programmes.

Dataset Name Suggestions

Amod Mental Health Counselling Conversations

Mental Health Dialogue Training Data

Counselling Conversation Dataset

Mental Well-being Support Conversations

AI Mental Health Chatbot Training Data

Attributes

Original Data Source: Amod Mental Health Counseling Conversations
Chat Bot Image Dataset
data.macgence.com
mp3
Updated Jun 16, 2024
Cite
Macgence (2024). Chat Bot Image Dataset [Dataset]. https://data.macgence.com/dataset/chat-bot-image-dataset
Explore at:
Available download formats: mp3
Dataset updated
Jun 16, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
Access our chatbot image dataset designed for AI training. Ideal for boosting visual recognition, enhancing chatbot interfaces, and optimizing user experience.
A feedback system for a children’s helpline training-chatbot - Data from a...
data.4tu.nl
zip
Updated Dec 11, 2023
Cite
Ayrton Braam (2023). A feedback system for a children’s helpline training-chatbot - Data from a Survey [Dataset]. http://doi.org/10.4121/9c68a82e-ad6c-420b-88dd-2e86ec729ffb.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.4121/9c68a82e-ad6c-420b-88dd-2e86ec729ffb.v1
Dataset updated
Dec 11, 2023
Dataset provided by
4TU.ResearchData
Authors
Ayrton Braam
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The project is a within-subjects study design with between-subjects exploratory measures, comparing an immediate feedback system to an explanation sheet. The conditions are tested on a simulation of a virtual child, in order to help them navigate a conversational model.
General domain Human-Human conversation chats in Bahasa
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). General domain Human-Human conversation chats in Bahasa [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/bahasa-general-domain-conversation-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
This training dataset comprises more than 10,000 text conversations between two native Bahasa speakers in the general domain. The chats cover a variety of topics, services and issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet and movies, which makes the dataset diverse.
The chats contain language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers and local slang, in various formats, to make the text data unbiased.
These chat scripts have between 300 and 700 words and up to 50 turns. 150 people that are a part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.
This training dataset's licence belongs to FutureBeeAI!
Evaluation Dataset for Chatbot/Virtual Assistants
kaggle.com
Updated Mar 17, 2022
Cite
Bitext (2022). Evaluation Dataset for Chatbot/Virtual Assistants [Dataset]. https://www.kaggle.com/datasets/bitext/evaluation-dataset-chatbot-virtual-assistants/discussion
Dataset updated
Mar 17, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bitext
Description
Bitext Sample Pre-built Customer Service Evaluation Dataset for English

Overview

This evaluation dataset contains example utterances taken from the "change order" intent of Bitext's pre-built Customer Service domain (which itself covers common intents present across Bitext's 20 pre-built domains). The data can be used to evaluate intent recognition models and Natural Language Understanding (NLU) platforms.

Utterances

The dataset contains 10,000 utterances, extracted from a larger dataset of over 1,000,000 utterances, and includes language register variations such as politeness, colloquial language, swearing and indirect style. To select the utterances, we use stratified sampling to generate a dataset with a general user language register profile.
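
As a rough illustration of stratified sampling, the sketch below draws a sample in which each stratum keeps its share of the population. It is not Bitext's actual procedure: the function name `stratified_sample`, the proportional allocation rule and the toy population are all assumptions made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, sample_size, seed=0):
    """Draw roughly sample_size records, keeping each stratum's share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    total = len(records)
    sample = []
    for group in strata.values():
        # Proportional allocation; rounding can leave the final size slightly off.
        quota = round(sample_size * len(group) / total)
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample

# Hypothetical population: 80% polite and 20% colloquial utterances.
population = [("polite", i) for i in range(800)] + [("colloquial", i) for i in range(200)]
sample = stratified_sample(population, key=lambda r: r[0], sample_size=100)
```

With this toy population the sample keeps the 80/20 split between the two registers.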

The dataset also reflects commonly occurring linguistic phenomena of real-life chatbots, such as:
- spelling mistakes
- run-on words
- missing punctuation

Contents

Each entry in the dataset contains an example utterance along with its corresponding intent, category and additional linguistic information. Each line contains the following four fields:
- flags: the applicable linguistic flags
- utterance: an example user utterance
- category: the high-level intent category
- intent: the intent corresponding to the user utterance
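
The four fields could be read along the following lines. This is a minimal sketch that assumes the data is a comma-separated file with a header row naming the fields, and that the flags are stored as a run of single letters. Neither detail is confirmed by the description, and the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical sample in the assumed layout; not taken from the real dataset.
SAMPLE = """flags,utterance,category,intent
BL,i need to change my order,ORDER,change_order
BPZ,could you pls help me modify an ordr,ORDER,change_order
"""

def read_entries(handle):
    """Yield one dict per entry with the four documented fields."""
    for row in csv.DictReader(handle):
        yield {
            "flags": set(row["flags"]),  # assumed: one character per linguistic flag
            "utterance": row["utterance"],
            "category": row["category"],
            "intent": row["intent"],
        }

entries = list(read_entries(io.StringIO(SAMPLE)))
```

For a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle.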

Linguistic flags

The dataset contains annotations for linguistic phenomena, which can be used to adapt bot training to different user language profiles. These flags are:
B - Basic syntactic structure
L - Lexical variation (synonyms)
M - Morphological variation (plurals, tenses…)
C - Complex/Coordinated syntactic structure
E - Expanded abbreviations (I'm -> I am, I'd -> I would…)
I - Interrogative structure
K - Keyword only
P - Politeness variation
Q - Colloquial variation
W - Offensive language
Z - Noise (spelling, punctuation…)
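
One way to work with these flags is to map each letter to its description and to filter utterances by flag. The sketch below assumes that an entry's flags arrive as a string of concatenated letters such as "BPZ"; the helper names `describe_flags` and `has_flag` are made up for this example.

```python
# Flag letters and descriptions as listed above.
FLAG_NAMES = {
    "B": "Basic syntactic structure",
    "L": "Lexical variation",
    "M": "Morphological variation",
    "C": "Complex/Coordinated syntactic structure",
    "E": "Expanded abbreviations",
    "I": "Interrogative structure",
    "K": "Keyword only",
    "P": "Politeness variation",
    "Q": "Colloquial variation",
    "W": "Offensive language",
    "Z": "Noise",
}

def describe_flags(flags):
    """Return the description of each flag letter, in the order given."""
    unknown = [f for f in flags if f not in FLAG_NAMES]
    if unknown:
        raise ValueError(f"unknown flag letters: {unknown}")
    return [FLAG_NAMES[f] for f in flags]

def has_flag(flags, letter):
    """True if the entry carries the given flag letter."""
    return letter in flags

descriptions = describe_flags("BQ")
noisy = [f for f in ["BL", "BPZ", "BQZ"] if has_flag(f, "Z")]
```

Filtering on "Z" or "W", for instance, would separate noisy or offensive utterances when measuring how robust a model is to each user language profile.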

These phenomena make the training dataset more effective and make bots more accurate and robust.

Categories and Intents

The intent categories covered by the dataset are: ORDER

The intents covered by the dataset are: change_order

(c) Bitext Innovations, 2022
French trainset for chatbots dealing with usual requests on bank cards
data.niaid.nih.gov
zenodo.org
Updated Nov 14, 2023
Cite
Schild, Erwan (2023). French trainset for chatbots dealing with usual requests on bank cards [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4769949
Dataset updated
Nov 14, 2023
Dataset authored and provided by
Schild, Erwan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
French
Description
[EN] French training dataset for chatbots dealing with usual requests on bank cards.

Description: This dataset contains examples of common customer requests relating to bank card management. It can be used as a training set for a small chatbot intended to process these requests.

Content: The questions are asked in French. The dataset is divided into 10 intents of 100 questions each, for a total of 1,000 questions.

Intents scope: Intents are constructed so that all questions belonging to the same intent have the same response or action. The scope covers: loss or theft of a card; a swallowed card; ordering a card; consulting the bank balance; insurance provided by a card; unlocking a card; virtual card management; bank overdraft management; payment limit management; contactless mode management.

Origin: The intents scope is inspired by a chatbot currently in production, and the wording of the questions is inspired by typical customer requests.

[FR] French-language training set for conversational assistants handling common requests about bank cards.

Description: This dataset contains examples of typical customer requests concerning bank card management. It can be used as a training set for a conversational assistant intended to handle these common requests.

Content: The questions are written in French. The dataset is divided into 10 intents of 100 questions each, for a total of 1,000 questions.

Intents scope: The intents are constructed so that all questions arising from the same intent have the same response or action. The scope covers: loss or theft of cards; a swallowed card; ordering cards; consulting the bank balance; insurance provided by a card; unlocking the card; virtual card management; bank overdraft management; payment limit management; contactless mode management.

Origin: The intents scope is inspired by a chatbot currently in production, and the wording of the questions is inspired by common customer requests.
Chatbot Store Inventory
kaggle.com
Updated Feb 28, 2022
Cite
Steve Levesque (2022). Chatbot Store Inventory [Dataset]. https://www.kaggle.com/datasets/stevelevesque/chatbotstoreinventory/discussion?sort=undefined
Dataset updated
Feb 28, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Steve Levesque
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Used for

In a toy project chatbot: - https://github.com/steve-levesque/Portfolio-NLP-ChatbotStoreInventory

Acknowledgements

Based on the structure in this article: - https://chatbotsmagazine.com/contextual-chat-bots-with-tensorflow-4391749d0077
AI Question Answering Data
opendatabay.com
Updated Jul 5, 2025
Cite
Datasimple (2025). AI Question Answering Data [Dataset]. https://www.opendatabay.com/data/ai-ml/d3c37fed-f830-444b-a988-c893d3396fd7
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset provides essential information for entries related to question answering tasks using AI models. It is designed to offer valuable insights for researchers and practitioners, enabling them to effectively train and rigorously evaluate their machine learning models. The dataset serves as a valuable resource for building and assessing question-answering systems. It is available free of charge.

Columns

instruction: Contains the specific instructions given to a model to generate a response.

responses: Includes the responses generated by the model based on the given instructions.

next_response: Provides the subsequent response from the model, following a previous response, which facilitates a conversational interaction.

answer: Lists the correct answer for each question presented in the instruction, acting as a reference for assessing the model's accuracy.

is_human_response: A boolean column that indicates whether a particular response was created by a human or by a machine learning model, helping to differentiate between the two. Out of nearly 19,300 entries, 254 are human-generated responses, while 18,974 were generated by models.
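
The split between human and model responses could be tallied as in the sketch below. It assumes a CSV file with a header row and that the boolean column is serialised as the text "True" or "False"; the description does not confirm this encoding, and the sample rows are invented. For reference, the two counts quoted above sum to 19,228.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; only a subset of the documented columns is shown.
SAMPLE = """instruction,responses,is_human_response
What is 2+2?,4,True
Name a primary colour.,Red,False
What is the capital of France?,Paris,False
"""

def count_by_origin(handle):
    """Count responses labelled as human-written versus model-generated."""
    counts = Counter()
    for row in csv.DictReader(handle):
        # Assumed encoding: the strings "True" / "False", case-insensitive.
        is_human = row["is_human_response"].strip().lower() == "true"
        counts["human" if is_human else "model"] += 1
    return counts

counts = count_by_origin(io.StringIO(SAMPLE))
```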

Distribution

The data files are typically in CSV format, with a dedicated train.csv file for training data and a test.csv file for testing purposes. The training file contains a large number of examples. The dataset description gives no specific dates and no exact row or record counts for the individual files.

Usage

This dataset is ideal for a variety of applications and use cases:
* Training and Testing: Utilise train.csv to train question-answering models or algorithms, and test.csv to evaluate their performance on unseen questions.
* Machine Learning Model Creation: Develop machine learning models specifically for question-answering by leveraging the instructional components, including instructions, responses, next responses, and human-generated answers, along with their is_human_response labels.
* Model Performance Evaluation: Assess model performance by comparing predicted responses with actual human-generated answers from the test.csv file.
* Data Augmentation: Expand existing data by paraphrasing instructions or generating alternative responses within similar contexts.
* Conversational Agents: Build conversational agents or chatbots by utilising the instruction-response pairs for training.
* Language Understanding: Train models to understand language and generate responses based on instructions and previous responses.
* Educational Materials: Develop interactive quizzes or study guides, with models providing instant feedback to students.
* Information Retrieval Systems: Create systems that help users find specific answers from large datasets.
* Customer Support: Train customer support chatbots to provide quick and accurate responses to inquiries.
* Language Generation Research: Develop novel algorithms for generating coherent responses in question-answering scenarios.
* Automatic Summarisation Systems: Train systems to generate concise summaries by understanding main content through question answering.
* Dialogue Systems Evaluation: Use the instruction-response pairs as a benchmark for evaluating dialogue system performance.
* NLP Algorithm Benchmarking: Establish baselines against which other NLP tools and methods can be measured.
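
The "Model Performance Evaluation" use case above can be sketched as an exact-match comparison between predicted responses and reference answers. The metric, the normalisation rule and the function names are assumptions chosen for illustration; the dataset description does not prescribe any particular evaluation method.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference answer after normalisation."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical predictions compared against hypothetical reference answers.
accuracy = exact_match_accuracy(["Paris", " red "], ["paris", "Blue"])
```

Exact match is a strict measure; free-form answers would usually call for a softer comparison, such as token overlap.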

Coverage

The dataset's geographic scope is global. There is no specific time range or demographic scope noted within the available details, as specific dates are not included.

License

CC0

Who Can Use It

This dataset is highly suitable for:
* Researchers and Practitioners: To gain insights into question answering tasks using AI models.
* Developers: To train models, create chatbots, and build conversational agents.
* Students: For developing educational materials and enhancing their learning experience through interactive tools.
* Individuals and teams working on Natural Language Processing (NLP) projects.
* Those creating information retrieval systems or customer support solutions.
* Experts in natural language generation (NLG) and automatic summarisation systems.
* Anyone involved in the evaluation of dialogue systems and machine learning model training.

Dataset Name Suggestions

AI Question Answering Data

Conversational AI Training Data

NLP Question-Answering Dataset

Model Evaluation QA Data

Dialogue Response Dataset

Attributes

Original Data Source: Question-Answering Training and Testing Data
