100+ datasets found
  1. Bitext-travel-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jun 21, 2025
    + more versions
    Cite
    Bitext (2025). Bitext-travel-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Travel Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Travel] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset.

  2. AI medical chatbot

    • kaggle.com
    Updated Aug 15, 2024
    Cite
    Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yousef Saeedian
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Description:

    This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

    Key Features:

    • Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.
    • Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.
    • Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.
    • Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.

    Potential Use Cases:

    • NLP Model Training: Train models to understand and generate medical dialogues.
    • Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.
    • Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.
    • Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

    This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.

  3. Bitext-retail-ecommerce-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.

  4. Mental Health Conversational Data

    • kaggle.com
    Updated Oct 31, 2022
    Cite
    elvis (2022). Mental Health Conversational Data [Dataset]. https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    elvis
    Description

    A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.

    This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.

    The dataset contains intents. An “intent” is the intention behind a user's message. For instance, if I were to say “I am sad” to the chatbot, the intent would be “sad”. Each intent has a set of Patterns and Responses appropriate to it. Patterns are examples of user messages that align with the intent, while Responses are the replies that the chatbot provides in accordance with the intent. Various intents are defined, and their patterns and responses are used as the model’s training data to identify a particular intent.
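
    The intent structure described above can be sketched as follows. This is a minimal illustration, not the dataset's actual file: the field names `tag`, `patterns`, and `responses`, and every sample string, are assumptions made for the example.

```python
import json

# Hypothetical intents file content. The field names and all sample
# strings are invented for illustration; they are not taken from the dataset.
INTENTS_JSON = """
{
  "intents": [
    {"tag": "sad",
     "patterns": ["I am sad", "I feel down"],
     "responses": ["I'm sorry to hear that. Do you want to talk about it?"]},
    {"tag": "greeting",
     "patterns": ["Hello", "Hi there"],
     "responses": ["Hello! How are you feeling today?"]}
  ]
}
"""


def training_pairs(intents):
    """Flatten intents into (pattern, tag) pairs for an intent classifier."""
    return [(pattern, intent["tag"])
            for intent in intents["intents"]
            for pattern in intent["patterns"]]


def responses_for(intents, tag):
    """Return the candidate responses for a predicted intent tag."""
    for intent in intents["intents"]:
        if intent["tag"] == tag:
            return intent["responses"]
    return []


intents = json.loads(INTENTS_JSON)
pairs = training_pairs(intents)
```

    The (pattern, tag) pairs would feed an intent classifier, and `responses_for` looks up the replies once a tag has been predicted.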

  5. Bitext-events-ticketing-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    Cite
    Bitext (2024). Bitext-events-ticketing-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Events and Ticketing Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [events and ticketing] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset.

  6. FAQ Datasets for Chatbot Training

    • kaggle.com
    Updated Jun 30, 2020
    Cite
    Abhishek Srivastava (2020). FAQ Datasets for Chatbot Training [Dataset]. https://www.kaggle.com/datasets/abbbhishekkk/faq-datasets-for-chatbot-training/code
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhishek Srivastava
    Description

    Dataset

    This dataset was created by Abhishek Srivastava

    Contents

  7. Chat Bot Dataset for AI/ML models

    • data.macgence.com
    mp3
    Updated Aug 4, 2024
    Cite
    Macgence (2024). Chat Bot Dataset for AI/ML models [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
    Explore at:
    Available download formats: mp3
    Dataset updated
    Aug 4, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Get a high-quality chat bot dataset for AI/ML models. Enhance NLP training with diverse conversational data for accurate, efficient machine learning applications.

  8. dataset

    • data.mendeley.com
    Updated Oct 4, 2023
    Cite
    Vignesh A (2023). dataset [Dataset]. http://doi.org/10.17632/cpp3bx8ghd.1
    Explore at:
    Dataset updated
    Oct 4, 2023
    Authors
    Vignesh A
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains SQUAD and NarrativeQA dataset files

  9. Training data for City of Helsinki chatbots

    • data.europa.eu
    unknown
    Updated Feb 20, 2024
    Cite
    Helsingin kaupunginkanslia (2024). Training data for City of Helsinki chatbots [Dataset]. https://data.europa.eu/data/datasets/df89ebc7-930c-439f-b073-da91dfa81d6d?locale=en
    Explore at:
    Available download formats: unknown
    Dataset updated
    Feb 20, 2024
    Dataset authored and provided by
    Helsingin kaupunginkanslia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki
    Description

    City of Helsinki chatbot training data. Data currently includes maternity and child care services’ chatbot NeRo, International House Helsinki chatbot Into, rental apartment search chatbot and outdoor bot Urho training data.

    The service responds based on the trained rule-based discussion paths and the question-answer pairs determined by city experts. Knowledge bases consist of several different areas, from which open data is published on the topics of questions (intents), variable/synonymous libraries (entities) and answers (answers) related to the discussion.

    The published data consists only of the above mentioned knowledge base areas, no customer discussions will be included for privacy reasons.

    NeRo

    Maternity and child care services’ chatbot NeRo answered questions about the growth or development of a child and problems related to pregnancy at the Helsinki maternity clinics. In addition, customers were able to ask about topics related to dental care, speech development and nutrition. Today NeRo operates as part of Hester, a chatbot for the social services, health care and rescue services division, and continues to serve the maternity and child health services’ customers with even more versatile content. The NeRo training data is no longer updated.

    Into

    International House Helsinki chatbot Into is a 24-hour customer service channel that provides a wide range of information on the official services offered by IHH and advice to support the settling of people who have moved to the Helsinki metropolitan area from abroad. With the help of the service, customers have faster access to International House Helsinki’s wide range of services for the city and the authorities. The service is provided in English and it is intended for all people who have recently moved to the capital region and for international people who are considering moving to the capital region.

    The rental apartment search

    The rental apartment search chatbot is a 24-hour customer service channel of the City of Helsinki housing services aimed at improving the accessibility of customer service and the customer experience as well as increasing the interactivity of the self-service. The service provides relevant information to each customer’s specific questions faster than by searching for the information on the website.

    Urho

    The outdoor bot Urho is a chatbot that provides assistance on outdoor and physical activity topics, serving citizens around the clock and, if necessary, directing the conversation to the Helsinki Info service advisors. The service improves the accessibility of customer service, the customer experience and the interactivity of self-service, as well as speeding up the process of finding relevant information for each customer compared to searching for information on a website.

    The chatbot has been used on various city outdoor and sports websites, but at the moment it is not on any of them. The bot can be used to ask questions about outdoor and sports facilities and services, for example. The service is rule-based, built on question-answer pairs and discussion dialogues defined by advisors and subject-matter experts. The service increases efficiency by allowing the automation of frequently asked questions.

    The parking chatbot

    The parking chatbot is a customer service channel of the city’s parking services. The service provides automated answers to the parking-related questions of city residents and visitors. The service is available on the City of Helsinki parking website.

    Attributes

    XLSX file, the different categories can be found on the different worksheet tabs.

    Intents

    XLSX file format: the first column contains an example question, and the second column contains the intent ID. In other words, each row gives one way a particular thing can be expected to be asked, followed by the intent ID that the system uses to connect the question to the intent and perform the action defined for it.

    Entities

    XLSX file format: the first column contains the entity ID, and the following columns contain alternative forms of the entity.

    The first column holds the term for which synonyms, or other words that need to be associated with the entity, are defined. Inflected forms are occasionally added as well if the AI does not recognize them reliably enough without them. The following columns hold the synonyms and other words associated with the same thing. Note: the system from which the exports are taken splits the same entity over several lines in the export, for an unknown reason.
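
    Because the export may split one entity over several lines, a consumer would typically merge rows that share an entity ID. The sketch below assumes the worksheet has already been read into a list of rows (first cell is the entity ID, remaining cells are alternative forms); the function name and the sample rows are hypothetical.

```python
def merge_entity_rows(rows):
    """Merge rows sharing an entity ID into one de-duplicated synonym list."""
    merged = {}
    for row in rows:
        # Skip empty rows and rows without an entity ID.
        if not row or row[0] is None:
            continue
        entity_id = str(row[0]).strip()
        synonyms = merged.setdefault(entity_id, [])
        for cell in row[1:]:
            if cell is None:  # empty spreadsheet cell
                continue
            value = str(cell).strip()
            if value and value not in synonyms:
                synonyms.append(value)
    return merged


# Invented sample rows; "playground" is deliberately split over two lines.
rows = [
    ["playground", "play area", "playpark"],
    ["sauna", "public sauna", None],
    ["playground", "kids' park", "play area"],
]
entities = merge_entity_rows(rows)
```

    Keeping insertion order and dropping repeated synonyms means the merged result does not depend on how the export happened to split the entity.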

    Answers

    • key = an identifying name (ID) unique to that response in the system. This is referred to in dialogue definitions when assigning a response to a specific intent in a given situation
    • value = the actual response text given to the client in the user interface. Occasionally includes so-called tags that provide clickable hyperlinks, selection buttons, livechat migration, or other functional elements. Texts separated by verti
  10. Mental Health Chatbot Pairs

    • kaggle.com
    Updated Nov 27, 2023
    Cite
    The Devastator (2023). Mental Health Chatbot Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-chatbot-pairs
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mental Health Chatbot Pairs

    AI-based Tailored Support for Mental Health Conversation

    By Huggingface Hub [source]

    About this dataset

    This dataset contains a compilation of carefully crafted Q&A pairs designed to provide AI-based tailored support for mental health. These carefully chosen questions and answers offer an avenue for those looking for help to gain the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand and respond appropriately to individual needs based on their input. This comprehensive dataset is crafted by experts in the mental health field, providing insightful content that will further research in this growing area. These data points will be invaluable for developing the next generation of personalized AI-based mental health chatbots capable of truly understanding what people need.

    How to use the dataset

    This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:

    • Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.

    • Refine your language processing models: By studying the patterns in syntax, grammar, tone, voice, etc., within this conversational dataset, you can hone your natural language processing capabilities, such as keyword extraction or entity extraction, before integrating them into a larger bot system.

    • Test assumptions: Have an idea of what may work best with a particular audience or context? Check whether these assumptions hold by trying different variations of text against this dataset before rolling out changes across other channels or programs that use AI/chatbot services.

    • Research and analyze results: After testing different scenarios with real-world users, using various forms of Q&A drawn from this chatbot pair dataset, analyze and record any results that help explain user behavior after exposure to tailored text conversations about mental health topics, both passive and active. The more information you collect, the closer you get to effective AI-powered conversations that produce the desired outcomes for your customer base.

    Research Ideas

    • Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.
    • Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.
    • Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                                                             |
    |:------------|:------------------------------------------------------------------------|
    | text        | The text of the conversation between the user and the chatbot. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  11. ChatBot Dataset for Transformers

    • gts.ai
    json
    Updated Jan 9, 2025
    Cite
    GTS (2025). ChatBot Dataset for Transformers [Dataset]. https://gts.ai/dataset-download/chatbot-dataset-for-transformers/
    Explore at:
    Available download formats: json
    Dataset updated
    Jan 9, 2025
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    Description

    Train conversational AI with the ChatBot Dataset for Transformers. Featuring human-like dialogues, preprocessed inputs, and labels, it’s perfect for GPT, BERT, T5, and NLP projects.

  12. Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    + more versions
    Cite
    Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li (2023). Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight Management: System Design and Finding.pdf [Dataset]. http://doi.org/10.3389/fnut.2022.870775.s004
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As the obesity rate continues to increase persistently, there is an urgent need to develop an effective weight loss management strategy. Nowadays, the development of artificial intelligence (AI) and cognitive technologies coupled with the rapid spread of messaging platforms and mobile technology with easier access to internet technology offers professional dietitians an opportunity to provide extensive monitoring support to their clients through a chatbot with artificial empathy. This study aimed to design a chatbot with artificial empathic motivational support for weight loss called “SlimMe” and investigate how people react to a diet bot. The SlimMe infrastructure was built using Dialogflow as the natural language processing (NLP) platform and LINE mobile messenger as the messaging platform. We proposed a text-based emotion analysis to simulate artificial empathy responses to recognize the user's emotion. A preliminary evaluation was performed to investigate the early-stage user experience after a 7-day simulation trial. The result revealed that having an artificially empathic diet bot for weight loss management is a fun and exciting experience. The use of emoticons, stickers, and GIF images makes the chatbot response more interactive. Moreover, the motivational support and persuasive messaging features enable the bot to express more empathic and engaging responses to the user. In total, there were 1,007 bot responses from 892 user input messages. Of these, 67.38% (601/1,007) of the chatbot-generated responses were accurate to a relevant user request, 21.19% (189/1,007) inaccurate responses to a relevant request, and 10.31% (92/1,007) accurate responses to an irrelevant request. Only 1.12% (10/1,007) of the chatbot does not answer. We present the design of an artificially empathic diet bot as a friendly assistant to help users estimate their calorie intake and calories burned in a more interactive and engaging way. 
    To our knowledge, this is the first chatbot designed with artificial empathy features, and it looks very promising in promoting long-term weight management. More user interactions and further data training and validation enhancement will improve the bot's in-built knowledge base and emotional intelligence base.
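
    A quick arithmetic check of the response breakdown quoted in the description: the four counts sum to 892, and the quoted percentages match a denominator of 892 (the number of user input messages) rather than the 1,007 shown in the fractions. This is an observation from the numbers alone, not a statement by the authors; the category labels in the sketch are shortened paraphrases.

```python
# Counts as quoted in the description; labels are shortened paraphrases.
counts = {
    "accurate, relevant request": 601,
    "inaccurate, relevant request": 189,
    "accurate, irrelevant request": 92,
    "no answer": 10,
}

total = sum(counts.values())

# Percentage of each category relative to the summed counts.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Same category computed against the 1,007 bot responses, for comparison.
share_of_1007 = round(100 * counts["accurate, relevant request"] / 1007, 2)
```

    With 892 as the denominator the shares come out at 67.38, 21.19, 10.31 and 1.12 percent, matching the quoted figures; 601 out of 1,007 would be about 59.68 percent.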

  13. Mental Health Dialogue Training Dataset

    • opendatabay.com
    .undefined
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Mental Health Dialogue Training Dataset [Dataset]. https://www.opendatabay.com/data/healthcare/8ec5252f-d432-4d05-b55b-25ab4a45b61d
    Explore at:
    Available download formats: .undefined
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Mental Health & Wellness
    Description

    This dataset provides real-life conversations focused on mental health concerns, ideal for developing accurate and informative models to assist individuals seeking support for their mental well-being. It includes statements or questions forming the conversation context and expert responses from mental health counselors. The dataset serves as a valuable resource for generating insights and guidance across various aspects of mental health, facilitating the creation of AI-based tools and enhancing professional counselling techniques.

    Columns

    The dataset is provided as a CSV file, train.csv, and features two key columns:

    • Context: This column contains the initial statements or questions that establish the overall context of the conversation, specifically addressing mental health issues.
    • Response: This column holds the corresponding replies delivered by a trained mental health counsellor, designed to address and support individuals within the given context.

    Distribution

    The dataset is supplied in a CSV file format named train.csv. It is structured with two primary columns, "Context" and "Response". Specific numbers for rows or records are not detailed in the provided information, but the "Context" column contains 2480 unique values.
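
    Reading the two columns into context/response pairs could look like the sketch below. It assumes the header row uses exactly "Context" and "Response" as named in the description; the sample rows are invented and stand in for train.csv, and the helper name is hypothetical.

```python
import csv
import io

# Invented three-row sample standing in for train.csv.
SAMPLE_CSV = '''Context,Response
"I can't sleep at night.","Let's look at what happens before bedtime."
"I can't sleep at night.","Have you noticed any patterns in your sleep?"
"I feel anxious at work.","What situations at work trigger the anxiety?"
'''


def load_pairs(handle):
    """Read (context, response) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["Context"].strip(), row["Response"].strip())
            for row in reader]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))

# One context may have several responses, so count contexts separately.
unique_contexts = {context for context, _ in pairs}
```

    For the real file, `load_pairs(open("train.csv", newline="", encoding="utf-8"))` would be the equivalent call; the encoding is an assumption.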

    Usage

    This dataset is well-suited for a variety of applications:

    • Chatbot Development: Utilise it as a training resource for building AI-based mental health chatbots capable of generating relevant responses.
    • Sentiment Analysis: Apply sentiment analysis techniques to individually or comparatively analyse both the context and response columns.
    • Topic Modelling: Extract hidden topics within conversations using Natural Language Processing (NLP) methods such as Latent Dirichlet Allocation (LDA) or Non-Negative Matrix Factorization (NMF).
    • Machine Learning Applications: Classify conversations into different mental health concern categories or train models to generate appropriate responses based on given contexts using approaches like sequence-to-sequence models or transformers.
    • Research: Analyse to gain insights into common questions, concerns, and themes related to mental health, aiding the understanding of individuals' needs.
    • Improving Counselling Techniques: Mental health professionals can study successful counselling responses to enhance their skills or develop training programmes.

    Coverage

    The dataset has global coverage. It does not include specific dates or timeframes associated with the conversations, which helps ensure privacy and confidentiality for both the individuals and counsellors involved. It contains sensitive information related to mental health, so ethical considerations, including anonymisation, are vital when using this data for research or practical applications.

    License

    CC0

    Who Can Use It

    • Professionals in the mental health field.
    • Researchers studying mental health conversations and interventions.
    • Developers creating AI-based mental health chatbots and virtual assistants.
    • Mental health professionals looking to enhance their counselling skills or develop training programmes.

    Dataset Name Suggestions

    • Amod Mental Health Counselling Conversations
    • Mental Health Dialogue Training Data
    • Counselling Conversation Dataset
    • Mental Well-being Support Conversations
    • AI Mental Health Chatbot Training Data

    Attributes

    Original Data Source: Amod Mental Health Counseling Conversations

  14. Chat Bot Image Dataset

    • data.macgence.com
    mp3
    Updated Jun 16, 2024
    Cite
    Macgence (2024). Chat Bot Image Dataset [Dataset]. https://data.macgence.com/dataset/chat-bot-image-dataset
    Explore at:
    Available download formats: mp3
    Dataset updated
    Jun 16, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Access our chatbot image dataset designed for AI training. Ideal for boosting visual recognition, enhancing chatbot interfaces, and optimizing user experience.

  15. A feedback system for a children’s helpline training-chatbot - Data from a...

    • data.4tu.nl
    zip
    Updated Dec 11, 2023
    Cite
    Ayrton Braam (2023). A feedback system for a children’s helpline training-chatbot - Data from a Survey [Dataset]. http://doi.org/10.4121/9c68a82e-ad6c-420b-88dd-2e86ec729ffb.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 11, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Ayrton Braam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The project uses a within-subjects study design, with between-subjects exploratory measures, to compare an immediate feedback system with an explanation sheet. The conditions are tested on a simulation of a virtual child, in order to help them navigate a conversational model.

  16. General domain Human-Human conversation chats in Bahasa

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). General domain Human-Human conversation chats in Bahasa [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/bahasa-general-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    What’s Included

    This training dataset comprises more than 10,000 text conversations, each between two native Bahasa speakers, in the general domain. The chats cover a variety of topics, services, and issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet, and movies, which makes the dataset diverse.

    These chats contain language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers, and local slang, in various formats, to make the text data unbiased.

    Each chat script has between 300 and 700 words and up to 50 turns. 150 members of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.

    This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.

    This training dataset's licence belongs to FutureBeeAI!

  17. Evaluation Dataset for Chatbot/Virtual Assistants

    • kaggle.com
    Updated Mar 17, 2022
    Cite
    Bitext (2022). Evaluation Dataset for Chatbot/Virtual Assistants [Dataset]. https://www.kaggle.com/datasets/bitext/evaluation-dataset-chatbot-virtual-assistants/discussion
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bitext
    Description

    Bitext Sample Pre-built Customer Service Evaluation Dataset for English

    Overview

    This evaluation dataset contains example utterances taken from the "change order" intent from Bitext's pre-built Customer Service domain (which itself covers common intents present across Bitext's 20 pre-built domains). The data can be used to evaluate intent recognition models and Natural Language Understanding (NLU) platforms.

    Utterances

    The dataset contains 10,000 utterances, extracted from a larger dataset of over 1,000,000 utterances, including language register variations such as politeness, colloquial, swearing, indirect style... To select the utterances, we use stratified sampling to generate a dataset with a general user language register profile.
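
    As a rough illustration of the stratified sampling mentioned above, the sketch below draws the same fraction from each register group so that the sample keeps the pool's register profile. The register labels, the pool proportions, and the `stratified_sample` helper are all hypothetical; the actual procedure used to build the dataset is not described here.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for members in strata.values():
        # Per-stratum sample size, rounded to the nearest whole item.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical pool of (utterance, register) pairs with made-up proportions.
pool = (
    [(f"polite {i}", "politeness") for i in range(600)]
    + [(f"colloquial {i}", "colloquial") for i in range(300)]
    + [(f"offensive {i}", "swearing") for i in range(100)]
)

sample = stratified_sample(pool, key=lambda pair: pair[1], fraction=0.01)

# Count how many sampled utterances fall into each register.
counts = {}
for _, register in sample:
    counts[register] = counts.get(register, 0) + 1
```

    With these made-up proportions, the sample keeps the 6:3:1 ratio of the pool, which is the property stratified sampling is meant to preserve.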

    The dataset also reflects commonly occurring linguistic phenomena of real-life chatbots, such as:

    • spelling mistakes
    • run-on words
    • missing punctuation

    Contents

    Each entry in the dataset contains an example utterance along with its corresponding intent, category and additional linguistic information. Each line contains the following four fields:

    • flags: the applicable linguistic flags
    • utterance: an example user utterance
    • category: the high-level intent category
    • intent: the intent corresponding to the user utterance

    Linguistic flags

    The dataset contains annotations for linguistic phenomena, which can be used to adapt bot training to different user language profiles. These flags are:

    • B - Basic syntactic structure
    • L - Lexical variation (synonyms)
    • M - Morphological variation (plurals, tenses…)
    • C - Complex/Coordinated syntactic structure
    • E - Expanded abbreviations (I'm -> I am, I'd -> I would…)
    • I - Interrogative structure
    • K - Keyword only
    • P - Politeness variation
    • Q - Colloquial variation
    • W - Offensive language
    • Z - Noise (spelling, punctuation…)

    These phenomena make the training dataset more effective and make bots more accurate and robust.
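
    To make the four fields and the flag letters concrete, here is a minimal sketch that models an entry and filters utterances by flag. It assumes that the flags field is a string of concatenated single letters (for example "BQZ"); the sample rows, the `Entry` class, and the `has_flag` helper are invented for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    flags: str       # assumed encoding: concatenated flag letters, e.g. "BQZ"
    utterance: str
    category: str
    intent: str


def has_flag(entry, flag):
    """Return True if the single-letter flag is set on this entry."""
    return flag in entry.flags


# Hypothetical rows, not real dataset content.
entries = [
    Entry("B", "change my order", "ORDER", "change_order"),
    Entry("BQZ", "i wanna chnage my order", "ORDER", "change_order"),
    Entry("BIP", "could you please help me change my order?", "ORDER", "change_order"),
]

# Select the utterances tagged as noisy (Z) and as polite (P).
noisy = [e.utterance for e in entries if has_flag(e, "Z")]
polite = [e.utterance for e in entries if has_flag(e, "P")]
```

    Filtering like this is one way to evaluate a model separately on, say, noisy versus clean utterances.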

    Categories and Intents

    The intent categories covered by the dataset are: ORDER

    The intents covered by the dataset are: change_order

    (c) Bitext Innovations, 2022

  18. French trainset for chatbots dealing with usual requests on bank cards

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 14, 2023
    Cite
    Schild, Erwan (2023). French trainset for chatbots dealing with usual requests on bank cards [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4769949
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset authored and provided by
    Schild, Erwan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    French
    Description

    [EN] French training dataset for chatbots dealing with usual requests on bank cards.

    Description: This dataset contains examples of common customer requests relating to bank card management. It can be used as a training set for a small chatbot intended to process these usual requests.

    Content: The questions are asked in French. The dataset is divided into 10 intents of 100 questions each, for a total of 1,000 questions.

    Intents scope: Intents are constructed in such a way that all questions arising from the same intent have the same response or action. The scope covered concerns: loss or theft of cards; the swallowed card; the card order; consultation of the bank balance; insurance provided by a card; card unlocking; virtual card management; management of bank overdraft; management of payment limits; management of contactless mode.

    Origin: The intents scope is inspired by a chatbot currently in production, and the wording of the questions is inspired by typical customer requests.

    [FR] Jeu d'entraînement en français d'assistants conversationnels traitant des demandes courantes sur les cartes bancaires.

    Description : Cet ensemble de données représente des exemples de demandes usuelles des clients concernant la gestion des cartes bancaires. Il peut être utilisé comme jeu d'entraînement pour un assistant conversationnel destiné à traiter ces demandes courantes.

    Contenu : Les questions sont formulées en français. L'ensemble de données est divisé en 10 intentions de 100 questions chacune, pour un total de 1 000 questions.

    Périmètre des intentions : Les intentions sont construites de telle manière que toutes les questions issues d'une même intention ont la même réponse ou action. Le périmètre couvert concerne : la perte ou le vol de cartes ; la carte avalée ; la commande des cartes ; la consultation du solde bancaire ; l'assurance fournie par une carte ; le déverrouillage de la carte ; la gestion de cartes virtuelles ; la gestion du découvert bancaire ; la gestion des plafonds de paiement ; la gestion du mode sans contact.

    Origine : Le périmètre des intentions est inspiré par un chatbot actuellement en production, et la formulation des questions est inspirée de demandes courantes de clients.

  19. Chatbot Store Inventory

    • kaggle.com
    Updated Feb 28, 2022
    Cite
    Steve Levesque (2022). Chatbot Store Inventory [Dataset]. https://www.kaggle.com/datasets/stevelevesque/chatbotstoreinventory/discussion?sort=undefined
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 28, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Steve Levesque
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Used for

    In a toy project chatbot: - https://github.com/steve-levesque/Portfolio-NLP-ChatbotStoreInventory

    Acknowledgements

    Based on the structure in this article: - https://chatbotsmagazine.com/contextual-chat-bots-with-tensorflow-4391749d0077

  20. AI Question Answering Data

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). AI Question Answering Data [Dataset]. https://www.opendatabay.com/data/ai-ml/d3c37fed-f830-444b-a988-c893d3396fd7
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset provides essential information for entries related to question answering tasks using AI models. It is designed to offer valuable insights for researchers and practitioners, enabling them to effectively train and rigorously evaluate their machine learning models. The dataset serves as a valuable resource for building and assessing question-answering systems. It is available free of charge.

    Columns

    • instruction: Contains the specific instructions given to a model to generate a response.
    • responses: Includes the responses generated by the model based on the given instructions.
    • next_response: Provides the subsequent response from the model, following a previous response, which facilitates a conversational interaction.
    • answer: Lists the correct answer for each question presented in the instruction, acting as a reference for assessing the model's accuracy.
    • is_human_response: A boolean column that indicates whether a particular response was created by a human or by a machine learning model, helping to differentiate between the two. Out of nearly 19,300 entries, 254 are human-generated responses, while 18,974 were generated by models.

    Distribution

    The data files are typically in CSV format, with a dedicated train.csv file for training data and a test.csv file for testing purposes. The training file contains a large number of examples. Specific dates are not included within this dataset description, focusing solely on providing accurate and informative details about its content and purpose. Specific numbers for rows or records are not detailed in the available information.
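
    As a small sketch of how the columns listed above might be read, the example below parses CSV text with Python's standard csv module and counts human-written responses. The sample rows are invented, and it is an assumption that is_human_response is serialized as the text "True" or "False"; the real train.csv may encode it differently.

```python
import csv
import io

# Invented sample standing in for train.csv; the real file is not reproduced here.
SAMPLE_CSV = """instruction,responses,next_response,answer,is_human_response
What is 2+2?,4,Anything else?,4,False
Capital of France?,Paris,Need more?,Paris,True
Largest planet?,Saturn,Anything else?,Jupiter,False
"""


def load_rows(handle):
    """Read rows as dicts and convert is_human_response to a bool."""
    rows = []
    for row in csv.DictReader(handle):
        # Assumed serialization: the strings "True" / "False".
        row["is_human_response"] = row["is_human_response"].strip().lower() == "true"
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Number of human-written responses in the sample.
human = sum(r["is_human_response"] for r in rows)

# Naive exact-match comparison of each response against the reference answer.
exact_matches = sum(r["responses"] == r["answer"] for r in rows)
```

    To read an actual file, `load_rows(open("train.csv", newline="", encoding="utf-8"))` would follow the same pattern, assuming the file has this header row.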

    Usage

    This dataset is ideal for a variety of applications and use cases:

    • Training and Testing: Utilise train.csv to train question-answering models or algorithms, and test.csv to evaluate their performance on unseen questions.
    • Machine Learning Model Creation: Develop machine learning models specifically for question-answering by leveraging the instructional components, including instructions, responses, next responses, and human-generated answers, along with their is_human_response labels.
    • Model Performance Evaluation: Assess model performance by comparing predicted responses with actual human-generated answers from the test.csv file.
    • Data Augmentation: Expand existing data by paraphrasing instructions or generating alternative responses within similar contexts.
    • Conversational Agents: Build conversational agents or chatbots by utilising the instruction-response pairs for training.
    • Language Understanding: Train models to understand language and generate responses based on instructions and previous responses.
    • Educational Materials: Develop interactive quizzes or study guides, with models providing instant feedback to students.
    • Information Retrieval Systems: Create systems that help users find specific answers from large datasets.
    • Customer Support: Train customer support chatbots to provide quick and accurate responses to inquiries.
    • Language Generation Research: Develop novel algorithms for generating coherent responses in question-answering scenarios.
    • Automatic Summarisation Systems: Train systems to generate concise summaries by understanding main content through question answering.
    • Dialogue Systems Evaluation: Use the instruction-response pairs as a benchmark for evaluating dialogue system performance.
    • NLP Algorithm Benchmarking: Establish baselines against which other NLP tools and methods can be measured.

    Coverage

    The dataset's geographic scope is global. There is no specific time range or demographic scope noted within the available details, as specific dates are not included.

    License

    CC0

    Who Can Use It

    This dataset is highly suitable for:

    • Researchers and Practitioners: To gain insights into question answering tasks using AI models.
    • Developers: To train models, create chatbots, and build conversational agents.
    • Students: For developing educational materials and enhancing their learning experience through interactive tools.
    • Individuals and teams working on Natural Language Processing (NLP) projects.
    • Those creating information retrieval systems or customer support solutions.
    • Experts in natural language generation (NLG) and automatic summarisation systems.
    • Anyone involved in the evaluation of dialogue systems and machine learning model training.

    Dataset Name Suggestions

    • AI Question Answering Data
    • Conversational AI Training Data
    • NLP Question-Answering Dataset
    • Model Evaluation QA Data
    • Dialogue Response Dataset

    Attributes

    Original Data Source: Question-Answering Training and Testing Data
