100+ datasets found
  1. Bitext-travel-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jun 21, 2025
    + more versions
    Cite
    Bitext (2025). Bitext-travel-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jun 21, 2025
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Travel Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Travel] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An overview of… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-travel-llm-chatbot-training-dataset.

  2. Bitext Gen AI Chatbot Customer Support Dataset

    • kaggle.com
    zip
    Updated Mar 18, 2024
    Cite
    Bitext (2024). Bitext Gen AI Chatbot Customer Support Dataset [Dataset]. https://www.kaggle.com/datasets/bitext/bitext-gen-ai-chatbot-customer-support-dataset
    Explore at:
    Available download formats: zip (3007665 bytes)
    Dataset updated
    Mar 18, 2024
    Authors
    Bitext
    License

    https://cdla.io/sharing-1-0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This dataset can be used to train Large Language Models such as GPT, Llama2 and Falcon, both for Fine Tuning and Domain Adaptation.

    The dataset has the following specs:

    • Use Case: Intent Detection
    • Vertical: Customer Service
    • 27 intents assigned to 11 categories
    • 26872 question/answer pairs, around 1000 per intent
    • 30 entity/slot types
    • 12 different types of language generation tags

    The categories and intents have been selected from Bitext's collection of 20 vertical-specific datasets, covering the intents that are common across all 20 verticals. The verticals are:

    • Automotive, Retail Banking, Education, Events & Ticketing, Field Services, Healthcare, Hospitality, Insurance, Legal Services, Manufacturing, Media Streaming, Mortgages & Loans, Moving & Storage, Real Estate/Construction, Restaurant & Bar Chains, Retail/E-commerce, Telecommunications, Travel, Utilities, Wealth Management

    For a full list of verticals and their intents, see https://www.bitext.com/chatbot-verticals/.

    The question/answer pairs have been generated using a hybrid methodology that uses natural texts as source text, NLP technology to extract seeds from these texts, and NLG technology to expand the seed texts. All steps in the process are curated by computational linguists.

    Dataset Token Count

    The dataset contains an extensive amount of text data across its 'instruction' and 'response' columns. After processing and tokenizing the dataset, we've identified a total of 3.57 million tokens. This rich set of tokens is essential for training advanced LLMs for AI Conversational, AI Generative, and Question and Answering (Q&A) models.

    Fields of the Dataset

    Each entry in the dataset contains the following fields:

    • flags: tags (explained below in the Language Generation Tags section)
    • instruction: a user request from the Customer Service domain
    • category: the high-level semantic category for the intent
    • intent: the intent corresponding to the user instruction
    • response: an example expected response from the virtual assistant
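
    As a minimal sketch of working with these five fields, the snippet below parses CSV text and counts entries per intent. The CSV layout, the column order, and the two sample rows are assumptions made for illustration; they are not taken from the dataset itself.

```python
import csv
import io
from collections import Counter

# Two made-up rows in the five-field layout described above (hypothetical data).
SAMPLE_CSV = """flags,instruction,category,intent,response
B,I need to cancel order {{Order Number}},ORDER,cancel_order,Sure. I can help you cancel order {{Order Number}}.
BQ,wanna get my money back,REFUND,get_refund,I can help with your refund.
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)

# Count how many entries each intent has.
intent_counts = Counter(row["intent"] for row in rows)
print(intent_counts)
```

    With the real file, the same `load_rows` helper could be pointed at the file contents, provided the column names match.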

    Categories and Intents

    The categories and intents covered by the dataset are:

    • ACCOUNT: create_account, delete_account, edit_account, recover_password, registration_problems, switch_account
    • CANCELLATION_FEE: check_cancellation_fee
    • CONTACT: contact_customer_service, contact_human_agent
    • DELIVERY: delivery_options, delivery_period
    • FEEDBACK: complaint, review
    • INVOICE: check_invoice, get_invoice
    • ORDER: cancel_order, change_order, place_order, track_order
    • PAYMENT: check_payment_methods, payment_issue
    • REFUND: check_refund_policy, get_refund, track_refund
    • SHIPPING_ADDRESS: change_shipping_address, set_up_shipping_address
    • SUBSCRIPTION: newsletter_subscription
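
    The category list above can be written down as a plain mapping, which makes it easy to look up the category for an intent and to check the totals. This is only a sketch; the names `CATEGORY_INTENTS` and `INTENT_TO_CATEGORY` are illustrative and not part of the dataset.

```python
# Categories and intents copied from the list above.
CATEGORY_INTENTS = {
    "ACCOUNT": ["create_account", "delete_account", "edit_account",
                "recover_password", "registration_problems", "switch_account"],
    "CANCELLATION_FEE": ["check_cancellation_fee"],
    "CONTACT": ["contact_customer_service", "contact_human_agent"],
    "DELIVERY": ["delivery_options", "delivery_period"],
    "FEEDBACK": ["complaint", "review"],
    "INVOICE": ["check_invoice", "get_invoice"],
    "ORDER": ["cancel_order", "change_order", "place_order", "track_order"],
    "PAYMENT": ["check_payment_methods", "payment_issue"],
    "REFUND": ["check_refund_policy", "get_refund", "track_refund"],
    "SHIPPING_ADDRESS": ["change_shipping_address", "set_up_shipping_address"],
    "SUBSCRIPTION": ["newsletter_subscription"],
}

# Reverse lookup: intent name -> category name.
INTENT_TO_CATEGORY = {
    intent: category
    for category, intents in CATEGORY_INTENTS.items()
    for intent in intents
}

# Number of categories and number of distinct intents in the list.
print(len(CATEGORY_INTENTS), len(INTENT_TO_CATEGORY))
```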

    Entities

    The entities covered by the dataset are:

    • {{Order Number}}, typically present in:
      • Intents: cancel_order, change_order, change_shipping_address, check_invoice, check_refund_policy, complaint, delivery_options, delivery_period, get_invoice, get_refund, place_order, track_order, track_refund
    • {{Invoice Number}}, typically present in:
      • Intents: check_invoice, get_invoice
    • {{Online Order Interaction}}, typically present in:
      • Intents: cancel_order, change_order, check_refund_policy, delivery_period, get_refund, review, track_order, track_refund
    • {{Online Payment Interaction}}, typically present in:
      • Intents: cancel_order, check_payment_methods
    • {{Online Navigation Step}}, typically present in:
      • Intents: complaint, delivery_options
    • {{Online Customer Support Channel}}, typically present in:
      • Intents: check_refund_policy, complaint, contact_human_agent, delete_account, delivery_options, edit_account, get_refund, payment_issue, registration_problems, switch_account
    • {{Profile}}, typically present in:
      • Intent: switch_account
    • {{Profile Type}}, typically present in:
      • Intent: switch_account
    • {{Settings}}, typically present in:
      • Intents: cancel_order, change_order, change_shipping_address, check_cancellation_fee, check_invoice, check_payment_methods, contact_human_agent, delete_account, delivery_options, edit_account, get_invoice, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, set_up_shipping_address, switch_account, track_order, track_refund
    • {{Online Company Portal Info}}, typically present in:
      • Intents: cancel_order, edit_account
    • {{Date}}, typically present in:
      • Intents: check_invoice, check_refund_policy, get_refund, track_order, track_refund
    • {{Date Range}}, typically present in:
      • Intents: check_cancellation_fee, check_invoice, get_invoice
    • {{Shipping Cut-off Time}}, typically present in:
      • Intent: delivery_options
    • {{Delivery City}}, typically present in:
      • Inten...
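
    The double-brace entity names listed above, such as {{Order Number}}, read like template placeholders. Assuming they appear literally in the text fields, one way to fill them in before showing a response to a user is sketched below. The helper names and the sample sentence are hypothetical.

```python
import re

# Matches {{Some Name}} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def find_placeholders(text):
    """Return the placeholder names that occur in text, in order."""
    return [name.strip() for name in PLACEHOLDER.findall(text)]


def fill_placeholders(text, values):
    """Replace each {{Name}} with values[Name]; unknown names stay as-is."""
    def substitute(match):
        name = match.group(1).strip()
        return values.get(name, match.group(0))
    return PLACEHOLDER.sub(substitute, text)


template = "Your order {{Order Number}} will ship before {{Shipping Cut-off Time}}."
filled = fill_placeholders(template, {"Order Number": "A-1001"})
print(filled)
```

    Leaving unknown placeholders untouched makes missing values easy to spot in the output.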
  3. Simple chatbot dataset

    • kaggle.com
    zip
    Updated Jul 31, 2023
    Cite
    dame rajee (2023). Simple chatbot dataset [Dataset]. https://www.kaggle.com/datasets/damerajee/simple-chatbot-dataset
    Explore at:
    Available download formats: zip (3587 bytes)
    Dataset updated
    Jul 31, 2023
    Authors
    dame rajee
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This JSON file contains a collection of conversational AI intents designed to motivate and interact with users. The intents cover various topics, including greetings, weather inquiries, hobbies, music, movies, farewells, informal and formal questions, math operations and formulas, prime numbers, geometry concepts, math puzzles, and even a Shakespearean poem.

    The additional intents related to consolidating people and motivating them have been included to provide users with uplifting and encouraging responses. These intents aim to offer support during challenging times, foster teamwork, and provide words of motivation and inspiration to users seeking guidance and encouragement.

    The JSON structure is organized into individual intent objects, each containing a tag to identify the intent, a set of patterns representing user inputs, and corresponding responses provided by the AI model. This dataset can be used to train a conversational AI system to engage in positive interactions with users and offer motivational messages.
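
    A file with the structure described above could be read roughly as follows. This is a sketch under assumptions: the top-level "intents" key and the exact key names "tag", "patterns", and "responses" are guesses based on the description, and the two sample intents are made up.

```python
import json
import random

# Hypothetical sample in the tag/patterns/responses layout described above.
SAMPLE_JSON = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello there"],
     "responses": ["Hello!", "Hi, how can I help?"]},
    {"tag": "farewell",
     "patterns": ["Bye", "See you"],
     "responses": ["Goodbye!"]}
  ]
}
"""


def build_pattern_index(data):
    """Map each lower-cased pattern to the intent object it belongs to."""
    index = {}
    for intent in data["intents"]:
        for pattern in intent["patterns"]:
            index[pattern.lower()] = intent
    return index


def reply(index, user_input, rng=random):
    """Return a random response for an exact pattern match, else None."""
    intent = index.get(user_input.strip().lower())
    if intent is None:
        return None
    return rng.choice(intent["responses"])


index = build_pattern_index(json.loads(SAMPLE_JSON))
print(reply(index, "bye"))
```

    Exact matching is the simplest possible lookup; a trained classifier would normally replace it.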

  4. University Chatbot Dataset

    • gts.ai
    json
    Updated Jun 30, 2024
    Cite
    Globose Technology Solutions Private Limited (2024). University Chatbot Dataset [Dataset]. https://gts.ai/dataset-download/university-chatbot-dataset/
    Explore at:
    Available download formats: json
    Dataset updated
    Jun 30, 2024
    Dataset authored and provided by
    Globose Technology Solutions Private Limited
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The University Chatbot Dataset contains 38 intents covering general university-related inquiries, designed to train, fine-tune, and evaluate conversational AI models in the education sector.

  5. Bitext-retail-banking-llm-chatbot-training-dataset

    • huggingface.co
    Updated Jul 16, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-retail-banking-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail Banking Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail Banking] sector can be easily achieved using our two-step approach to LLM Fine-Tuning.… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-banking-llm-chatbot-training-dataset.

  6. LLM RAG Chatbot Training Dataset

    • kaggle.com
    zip
    Updated May 20, 2025
    Cite
    Life Bricks Global (2025). LLM RAG Chatbot Training Dataset [Dataset]. https://www.kaggle.com/datasets/lifebricksglobal/llm-rag-chatbot-training-dataset
    Explore at:
    Available download formats: zip (199960 bytes)
    Dataset updated
    May 20, 2025
    Authors
    Life Bricks Global
    Description

    We’ve developed another annotated dataset designed specifically for conversational AI and companion AI model training.

    Watch: How To Use The Dataset

    What you have here on Kaggle is our free sample - Think Salon Kitty meets AI

    The 'Time Waster Identification & Retreat Model Dataset' enables AI handler agents to detect when users are likely to churn, saving valuable tokens and preventing wasted compute cycles in conversational models.

    This batch has 167 entries annotated for sentiment, intent, user risk flagging (via behavioural tracking), and user Recovery Potential per statement, among others. This dataset is designed to be a niche micro dataset for a specific use case: Time Waster Identification and Retreat.

    👉 Buy the updated version: https://lifebricksglobal.gumroad.com/l/Time-WasterDetection-Dataset

    This dataset is perfect for:

    • Fine-tuning LLM routing logic
    • Building intelligent AI agents for customer engagement
    • Companion AI training + moderation modelling

    This is part of a broader series of human-agent interaction datasets we are releasing under our independent data licensing program.

    It is designed for AI researchers and developers building:

    • Conversational AI agents
    • Companion AI models
    • Human-agent interaction simulators
    • LLM routing optimization models

    Use case:

    • Conversational AI
    • Companion AI
    • Defence & Aerospace
    • Customer Support AI
    • Gaming / Virtual Worlds
    • LLM Safety Research
    • AI Orchestration Platforms

    👉 Good for teams working on conversational AI, companion AI, fraud detectors and those integrating routing logic for voice/chat agents

    Contact us on LinkedIn: Life Bricks Global.

    License:

    This dataset is provided under a custom license. By using the dataset, you agree to the following terms:

    Usage: You are allowed to use the dataset for non-commercial purposes, including research, development, and machine learning model training.

    Modification: You may modify the dataset for your own use.

    Redistribution: Redistribution of the dataset in its original or modified form is not allowed without permission.

    Attribution: Proper attribution must be given when using or referencing this dataset.

    No Warranty: The dataset is provided "as-is" without any warranties, express or implied, regarding its accuracy, completeness, or fitness for a particular purpose.

  7. Bitext-restaurants-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 16, 2024
    + more versions
    Cite
    Bitext (2024). Bitext-restaurants-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-restaurants-llm-chatbot-training-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 16, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Restaurants Tagged Training Dataset for LLM-based Virtual Assistants

    Overview

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [restaurants] sector can be easily achieved using our two-step approach to LLM Fine-Tuning. An… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-restaurants-llm-chatbot-training-dataset.

  8. Chatbot Training Dataset

    • kaggle.com
    zip
    Updated Aug 3, 2022
    Cite
    Saurabh Prajapat (2022). Chatbot Training Dataset [Dataset]. https://www.kaggle.com/datasets/saurabhprajapat/chatbot-training-dataset/discussion
    Explore at:
    Available download formats: zip (18260 bytes)
    Dataset updated
    Aug 3, 2022
    Authors
    Saurabh Prajapat
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Chatbots are used by almost every tech-based company and are trending these days. I decided to build a chatbot and found this dataset; it is well suited to getting hands-on experience with building one.

    Contribute to this dataset and enjoy Kaggling!

  9. Training Dataset for chatbots/Virtual Assistants

    • kaggle.com
    zip
    Updated Mar 17, 2022
    Cite
    Bitext (2022). Training Dataset for chatbots/Virtual Assistants [Dataset]. https://www.kaggle.com/datasets/bitext/training-dataset-for-chatbotsvirtual-assistants/code
    Explore at:
    Available download formats: zip (1214677 bytes)
    Dataset updated
    Mar 17, 2022
    Authors
    Bitext
    Description

    Bitext Sample Pre-built Customer Support Dataset for English

    Overview

    This dataset contains example utterances and their corresponding intents from the Customer Support domain. The data can be used to train intent recognition models on Natural Language Understanding (NLU) platforms.

    The dataset covers the "Customer Support" domain and includes 27 intents grouped in 11 categories. These intents have been selected from Bitext's collection of 20 domain-specific datasets (banking, retail, utilities...), keeping the intents that are common across domains. See below for a full list of categories and intents.

    Utterances

    The dataset contains over 20,000 utterances, with a varying number of utterances per intent. These utterances have been extracted from a larger dataset of 288,000 utterances (approx. 10,000 per intent), including language register variations such as politeness, colloquial, swearing, indirect style... To select the utterances, we use stratified sampling to generate a dataset with a general user language register profile.
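
    The description does not say how the stratified sampling was carried out. As a generic illustration only, the sketch below draws the same fraction from every stratum so that each group keeps its share of the whole. The register labels, the fraction, and the function name are all assumptions, not Bitext's procedure.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, key, fraction, seed=0):
    """Sample the same fraction from each stratum defined by key(item)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    sample = []
    for group_key in sorted(groups):
        members = groups[group_key]
        # Keep at least one item per stratum so no group disappears.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical utterances labelled with a language register.
utterances = [("polite", i) for i in range(100)] + [("colloquial", i) for i in range(50)]
sample = stratified_sample(utterances, key=lambda u: u[0], fraction=0.1, seed=42)
print(Counter(register for register, _ in sample))
```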

    The dataset also reflects commonly occurring linguistic phenomena of real-life chatbots, such as:

    • spelling mistakes
    • run-on words
    • missing punctuation

    Contents

    Each entry in the dataset contains an example utterance from the Customer Support domain, along with its corresponding intent, category and additional linguistic information. Each line contains the following four fields:

    • flags: the applicable linguistic flags
    • utterance: an example user utterance
    • category: the high-level intent category
    • intent: the intent corresponding to the user utterance

    Linguistic flags

    The dataset contains annotations for linguistic phenomena, which can be used to adapt bot training to different user language profiles. These flags are:

    • B - Basic syntactic structure
    • S - Syntactic structure
    • L - Lexical variation (synonyms)
    • M - Morphological variation (plurals, tenses…)
    • I - Interrogative structure
    • C - Complex/Coordinated syntactic structure
    • P - Politeness variation
    • Q - Colloquial variation
    • W - Offensive language
    • E - Expanded abbreviations (I'm -> I am, I'd -> I would…)
    • D - Indirect speech (ask an agent to…)
    • Z - Noise (spelling, punctuation…)

    These phenomena make the training dataset more effective and make bots more accurate and robust.
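
    The flag letters can be decoded with a small lookup table. The sketch below assumes the flags field is a string of concatenated letters such as "BLQ"; the description does not confirm that encoding, and the helper names are illustrative.

```python
# Flag letters and meanings copied from the description above.
FLAG_NAMES = {
    "B": "Basic syntactic structure",
    "S": "Syntactic structure",
    "L": "Lexical variation",
    "M": "Morphological variation",
    "I": "Interrogative structure",
    "C": "Complex/Coordinated syntactic structure",
    "P": "Politeness variation",
    "Q": "Colloquial variation",
    "W": "Offensive language",
    "E": "Expanded abbreviations",
    "D": "Indirect speech",
    "Z": "Noise",
}


def decode_flags(flags):
    """Expand a flag string such as "BLQ" into a list of descriptions."""
    unknown = [ch for ch in flags if ch not in FLAG_NAMES]
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    return [FLAG_NAMES[ch] for ch in flags]


def has_noise_or_offensive(flags):
    """True if the utterance is tagged as noisy (Z) or offensive (W)."""
    return any(ch in "WZ" for ch in flags)


print(decode_flags("BLQ"))
```

    A filter like `has_noise_or_offensive` could be used to build a cleaner training subset for a particular user profile.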

    Categories and Intents

    The intent categories covered by the dataset are: ACCOUNT, CANCELLATION_FEE, CONTACT, DELIVERY, FEEDBACK, INVOICES, NEWSLETTER, ORDER, PAYMENT, REFUNDS, SHIPPING

    The intents covered by the dataset are: cancel_order, complaint, contact_customer_service, contact_human_agent, create_account, change_order, change_shipping_address, check_cancellation_fee, check_invoices, check_payment_methods, check_refund_policy, delete_account, delivery_options, delivery_period, edit_account, get_invoice, get_refund, newsletter_subscription, payment_issue, place_order, recover_password, registration_problems, review, set_up_shipping_address, switch_account, track_order, track_refund

    (c) Bitext Innovations, 2020

  10. Chatbot-Based English Learning Dataset

    • kaggle.com
    zip
    Updated Jan 31, 2025
    Cite
    Ziya (2025). Chatbot-Based English Learning Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/chatbot-based-english-learning-dataset
    Explore at:
    Available download formats: zip (1635 bytes)
    Dataset updated
    Jan 31, 2025
    Authors
    Ziya
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📌 Overview

    This dataset is designed to support research in AI-driven language learning, specifically for chatbot-based English tutoring. It includes intent classification for chatbot interactions and grammatical error correction to assist users in improving their English proficiency.

    📊 Dataset Structure

    The dataset consists of 200 rows with the following columns:

    • Sentence → User queries for intent classification (e.g., "Can you check my grammar?")
    • Intent → Categorized chatbot responses (e.g., Grammar_Check, Vocabulary_Assistance)
    • Incorrect_Sentence → Common grammatical errors in English writing
    • Corrected_Sentence → AI-corrected versions of the incorrect sentences
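
    One way to inspect the Incorrect_Sentence and Corrected_Sentence pairs is to list the word-level changes between them. The sketch below uses Python's standard `difflib`; the sentence pair is a made-up example, not a row from the dataset.

```python
import difflib


def word_edits(incorrect, corrected):
    """Return (before, after) pairs for each word span that changed."""
    a, b = incorrect.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


# Hypothetical pair in the Incorrect_Sentence / Corrected_Sentence style.
edits = word_edits("She go to school yesterday", "She went to school yesterday")
print(edits)
```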

  11. AI Girlfriend Chatbot Dataset

    • kaggle.com
    zip
    Updated Jul 20, 2024
    Cite
    Sahib Nanda (2024). AI Girlfriend Chatbot Dataset [Dataset]. https://www.kaggle.com/datasets/imsahibnanda/ai-girlfriend-chatbot-dataset
    Explore at:
    Available download formats: zip (27445 bytes)
    Dataset updated
    Jul 20, 2024
    Authors
    Sahib Nanda
    License

    https://www.licenses.ai/ai-licenses

    Description

    A question-answer dataset for training LLMs or Text2Text models as an AI-based girlfriend; it can also be used for other analyses. Taken from multiple sources and combined. Please ensure that if you train a model, it is a responsible one.

  12. English Human-Human Chat Dataset for Conversational AI & NLP

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). English Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-general-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world English usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level English conversations covering a broad spectrum of everyday topics.

    Conversational Text Data

    This dataset includes over 15000 chat transcripts, each featuring free-flowing dialogue between two native English speakers. The conversations are spontaneous, context-rich, and mimic informal, real-life texting behavior.

    • Words per Chat: 300–700
    • Turns per Chat: Up to 50 dialogue turns
    • Contributors: 200 native English speakers from the FutureBeeAI Crowd Community
    • Format: TXT, DOCS, JSON or CSV (customizable)
    • Structure: Each record contains the full chat, topic tag, and metadata block

    Diversity and Domain Coverage

    Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:

    • Music, books, and movies
    • Health and wellness
    • Children and parenting
    • Family life and relationships
    • Food and cooking
    • Education and studying
    • Festivals and traditions
    • Environment and daily life
    • Internet and tech usage
    • Childhood memories and casual chatting

    This diversity ensures the dataset is useful across multiple NLP and language understanding applications.

    Linguistic Authenticity

    Chats reflect informal, native-level English usage with:

    • Colloquial expressions and local dialect influence
    • Domain-relevant terminology
    • Language-specific grammar, phrasing, and sentence flow
    • Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
    • Representation of different writing styles and input quirks to ensure training data realism

    Metadata

    Every chat instance is accompanied by structured metadata, which includes:

    • Participant Age
    • Gender
    • Country/Region
    • Chat Domain
    • Chat Topic
    • Dialect

    This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.

    Data Quality Assurance

    All chat records pass through a rigorous QA process to maintain consistency and accuracy:

    • Manual review for content completeness
    • Format checks for chat turns and metadata
    • Linguistic verification by native speakers
    • Removal of inappropriate or unusable samples

    This ensures a clean, reliable dataset ready for high-performance AI model training.

    Applications

    This dataset is ideal for training and evaluating a wide range of text-based AI systems:

    • Conversational AI / Chatbots
    • Smart assistants and voicebots

  13. Conversations dataset for chatbot

    • kaggle.com
    zip
    Updated Oct 24, 2023
    Cite
    Kanika Malhotra1307 (2023). Conversations dataset for chatbot [Dataset]. https://www.kaggle.com/datasets/kanikamalhotra1307/conversations-dataset-for-chatbot
    Explore at:
    Available download formats: zip (10123 bytes)
    Dataset updated
    Oct 24, 2023
    Authors
    Kanika Malhotra1307
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a dataset of conversations in JSON format. You can use these intents to train your chatbot for different types of conversation. You can also modify it to train your chatbot for new conversations.

  14. Healthcare Chatbot Intent Dataset

    • gomask.ai
    csv, json
    Updated Nov 8, 2025
    Cite
    GoMask.ai (2025). Healthcare Chatbot Intent Dataset [Dataset]. https://gomask.ai/marketplace/datasets/healthcare-chatbot-intent-dataset
    Explore at:
    Available download formats: json, csv (10 MB)
    Dataset updated
    Nov 8, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    user_id, timestamp, message_id, sender_type, intent_label, message_text, message_order, transcript_id, confidence_score, conversation_topic, and 1 more
    Description

    This dataset provides detailed, synthetic healthcare chatbot conversations with annotated intent labels, message sequencing, and extracted entities. Designed for training and evaluating conversational AI, it supports intent classification, dialogue modeling, and entity recognition in healthcare virtual assistants. The dataset enables robust analysis of user-bot interactions for improved patient engagement and automation.
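
    Given the variables listed for this dataset, per-message records could be regrouped into ordered conversations roughly as follows. The field names come from the "Variables measured" list; the record layout (one dict per message) and all sample values are assumptions for illustration.

```python
from collections import defaultdict

# Made-up messages using field names from the "Variables measured" list.
messages = [
    {"transcript_id": "t1", "message_order": 2, "sender_type": "bot",
     "intent_label": "provide_hours", "message_text": "We open at 9am."},
    {"transcript_id": "t1", "message_order": 1, "sender_type": "user",
     "intent_label": "ask_hours", "message_text": "When do you open?"},
    {"transcript_id": "t2", "message_order": 1, "sender_type": "user",
     "intent_label": "book_appointment", "message_text": "I need an appointment."},
]


def group_transcripts(records):
    """Group messages by transcript_id and sort each group by message_order."""
    transcripts = defaultdict(list)
    for record in records:
        transcripts[record["transcript_id"]].append(record)
    for group in transcripts.values():
        group.sort(key=lambda m: m["message_order"])
    return dict(transcripts)


transcripts = group_transcripts(messages)
for msg in transcripts["t1"]:
    print(msg["sender_type"], "-", msg["message_text"])
```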

  15. Chat Bot Dataset for AI/ML models in Hospitality Sector

    • data.macgence.com
    mp3
    Updated Aug 4, 2024
    Cite
    Macgence (2024). Chat Bot Dataset for AI/ML models in Hospitality Sector [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
    Explore at:
    Available download formats: mp3
    Dataset updated
    Aug 4, 2024
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Get a high-quality chatbot dataset for AI/ML models in Hospitality Sector. Ideal for NLP training, improving chatbot responses, and enhancing conversational AI.

  16. Chatbot Dataset for AI/ML models in BFSI Sector

    • data.macgence.com
    mp3
    Updated May 8, 2025
    Cite
    Macgence (2025). Chatbot Dataset for AI/ML models in BFSI Sector [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
    Explore at:
    Available download formats: mp3
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    Get a high-quality chatbot dataset for AI/ML models in BFSI Sector. Train with diverse conversational data for accurate, efficient machine learning applications.

  17. Chat Bot Dataset for AI/ML models in Ecommerce Sector

    • data.macgence.com
    mp3
    Updated May 8, 2025
    Cite
    Macgence (2025). Chat Bot Dataset for AI/ML models in Ecommerce Sector [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
    Explore at:
    Available download formats: mp3
    Dataset updated
    May 8, 2025
    Dataset authored and provided by
    Macgence
    License

    https://data.macgence.com/terms-and-conditions

    Time period covered
    2025
    Area covered
    Worldwide
    Variables measured
    Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
    Description

    High-quality chatbot dataset for AI/ML models in Ecommerce Sector. Train NLP algorithms with diverse conversational data to enhance chatbot accuracy.

  18. FAQ Datasets for Chatbot Training

    • kaggle.com
    zip
    Updated Jun 30, 2020
    Cite
    Abhishek Srivastava (2020). FAQ Datasets for Chatbot Training [Dataset]. https://www.kaggle.com/abbbhishekkk/faq-datasets-for-chatbot-training
    Explore at:
    Available download formats: zip (269846 bytes)
    Dataset updated
    Jun 30, 2020
    Authors
    Abhishek Srivastava
    Description

    Dataset

    This dataset was created by Abhishek Srivastava

    Contents

  19. Bitext-customer-support-llm-chatbot-training-dataset-4k-seed42

    • huggingface.co
    Updated Oct 11, 2024
    + more versions
    Cite
    Victor Oluwadare (2024). Bitext-customer-support-llm-chatbot-training-dataset-4k-seed42 [Dataset]. https://huggingface.co/datasets/Victorano/Bitext-customer-support-llm-chatbot-training-dataset-4k-seed42
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 11, 2024
    Authors
    Victor Oluwadare
    Description

    Victorano/Bitext-customer-support-llm-chatbot-training-dataset-4k-seed42 dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Vietnamese Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Vietnamese Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/vietnamese-healthcare-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Vietnamese Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Vietnamese-speaking regions.

    Participant & Chat Overview

    • Participants: 150+ native Vietnamese speakers from the FutureBeeAI Crowd Community
    • Conversation Length: 300–700 words per chat
    • Turns per Chat: 50–150 dialogue turns across both participants
    • Chat Types: Inbound and outbound
    • Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    • Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    • Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Vietnamese healthcare communication and includes:

    • Authentic Naming Patterns: Vietnamese personal names, clinic names, and brands
    • Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Vietnamese formats
    • Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Vietnamese-speaking regions
    • Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    •General inquiries
    •Detailed problem-solving
    •Routine status updates
    •Treatment recommendations
    •Support and feedback interactions

    Each conversation typically includes these structural components:

    •Greetings and verification
    •Information gathering
    •Problem definition
    •Solution delivery
    •Closing messages
    •Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    •Full message history with clear speaker labels
    •Participant identifiers
    •Metadata (e.g., topic tags, region, sentiment)
    •Compatibility with common NLP and ML pipelines
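
    As a rough illustration of the record layout described above, the following Python sketch parses one JSON conversation and computes a few basic statistics. Every field name here (conversation_id, metadata, participants, messages, speaker, text) is a hypothetical placeholder chosen for this example, not the provider's published schema, and the sample messages are invented English stand-ins for the Vietnamese text.

```python
import json
from collections import Counter

# Hypothetical record: all field names and values are illustrative
# assumptions, not the dataset's actual schema or content.
SAMPLE_RECORD = """
{
  "conversation_id": "conv-0001",
  "metadata": {"topic": "appointment_scheduling", "region": "VN",
               "sentiment": "positive", "chat_type": "inbound"},
  "participants": [{"id": "cust-17", "role": "customer"},
                   {"id": "agent-01", "role": "agent"}],
  "messages": [
    {"speaker": "customer", "text": "Hello, I would like to book an appointment."},
    {"speaker": "agent", "text": "Certainly. May I have your full name?"},
    {"speaker": "customer", "text": "My name is Lan."},
    {"speaker": "agent", "text": "Thank you, your appointment is confirmed."}
  ]
}
"""


def summarize(record: dict) -> dict:
    """Return turn and word counts for one conversation record."""
    messages = record["messages"]
    # Count how many turns each speaker label contributes.
    turns_by_speaker = Counter(m["speaker"] for m in messages)
    # Whitespace tokenization is a crude word count, used only for illustration.
    words = sum(len(m["text"].split()) for m in messages)
    return {
        "conversation_id": record["conversation_id"],
        "topic": record["metadata"]["topic"],
        "turns": len(messages),
        "turns_by_speaker": dict(turns_by_speaker),
        "words": words,
    }


summary = summarize(json.loads(SAMPLE_RECORD))
print(summary)
```

    The same summarize function could be applied to each record in a file to check the stated ranges for conversation length and turns per chat, once the key names are adjusted to the delivered schema.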

