100+ datasets found

Bitext-customer-support-llm-chatbot-training-dataset
huggingface.co
opendatalab.com
Updated Jul 16, 2024
Cite
Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 16, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
Bitext-retail-ecommerce-llm-chatbot-training-dataset
huggingface.co
Updated Aug 6, 2024
Cite
Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
Explore at:
Croissant
Dataset updated
Aug 6, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.
Mental Health Conversational Data
kaggle.com
Updated Oct 31, 2022
Cite
elvis (2022). Mental Health Conversational Data [Dataset]. https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
Explore at:
Croissant
Dataset updated
Oct 31, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
elvis
Description
A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.

This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.

The dataset contains intents. An “intent” is the intention behind a user's message. For instance, if I were to say “I am sad” to the chatbot, the intent would be “sad”. Each intent has a set of Patterns and Responses. Patterns are examples of user messages that align with the intent, while Responses are the replies that the chatbot provides for that intent. Various intents are defined, and their patterns and responses are used as the model’s training data to identify a particular intent.
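The intent, pattern and response relationship described above can be sketched as follows. This is a minimal illustration, not the dataset's own code: the `INTENTS` contents, the function names and the word-overlap matching rule are all assumptions made for the example, and a real chatbot would normally train a classifier on the patterns instead.

```python
import random
from typing import Optional

# Hypothetical intents in the shape the description suggests: each intent has
# example user messages (patterns) and candidate replies (responses).
INTENTS = {
    "sad": {
        "patterns": ["I am sad", "I feel down today"],
        "responses": ["I'm sorry you feel that way. Do you want to talk about it?"],
    },
    "greeting": {
        "patterns": ["Hello", "Hi there"],
        "responses": ["Hello! How are you feeling today?"],
    },
}


def normalize(text: str) -> set:
    """Lowercase a message and split it into a set of words."""
    return set(text.lower().split())


def detect_intent(message: str) -> Optional[str]:
    """Return the intent whose patterns share the most words with the message."""
    words = normalize(message)
    best_intent, best_overlap = None, 0
    for name, intent in INTENTS.items():
        for pattern in intent["patterns"]:
            overlap = len(words & normalize(pattern))
            if overlap > best_overlap:
                best_intent, best_overlap = name, overlap
    return best_intent


def reply(message: str) -> str:
    """Pick one of the responses defined for the detected intent."""
    intent = detect_intent(message)
    if intent is None:
        return "Could you tell me more?"  # fallback when nothing matches
    return random.choice(INTENTS[intent]["responses"])
```

Calling `reply("I am sad")` selects the "sad" intent and returns one of its responses; a message that shares no words with any pattern falls through to the fallback reply.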
Training data for City of Helsinki chatbots
data.europa.eu
unknown
Updated Feb 20, 2024
Cite
Helsingin kaupunginkanslia (2024). Training data for City of Helsinki chatbots [Dataset]. https://data.europa.eu/data/datasets/df89ebc7-930c-439f-b073-da91dfa81d6d?locale=en
Explore at:
Available download formats: unknown
Dataset updated
Feb 20, 2024
Dataset authored and provided by
Helsingin kaupunginkanslia
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Helsinki
Description
City of Helsinki chatbot training data. The data currently includes training data for the maternity and child care services’ chatbot NeRo, the International House Helsinki chatbot Into, the rental apartment search chatbot and the outdoor bot Urho.

The service responds based on the trained rule-based discussion paths and the question-answer pairs determined by city experts. The knowledge bases consist of several different areas, from which open data is published on the topics of questions (intents), variable/synonym libraries (entities) and answers (answers) related to the discussion.

The published data consists only of the above mentioned knowledge base areas, no customer discussions will be included for privacy reasons.

NeRo

Maternity and child care services’ chatbot NeRo answered questions about the growth or development of a child and problems related to pregnancy at the Helsinki maternity clinics. In addition, customers were also able to ask about topics related to dental care, speech development and nutrition. Today NeRo operates as part of Hester, a chatbot for the social services, health care and rescue services division, and continues to serve the maternity and child health services’ customers in an even more versatile way. The NeRo training data is no longer updated.

Into

International House Helsinki chatbot Into is a 24-hour customer service channel that provides a wide range of information on the official services offered by IHH and advice to support the settling of people who have moved to the Helsinki metropolitan area from abroad. With the help of the service, customers have faster access to International House Helsinki’s wide range of services for the city and the authorities. The service is provided in English and it is intended for all people who have recently moved to the capital region and for international people who are considering moving to the capital region.

The rental apartment search

The rental apartment search chatbot is a 24-hour customer service channel of the City of Helsinki housing services aimed at improving the accessibility of customer service and the customer experience as well as increasing the interactivity of the self-service. The service provides relevant information to each customer’s specific questions faster than by searching for the information on the website.

Urho

The outdoor bot Urho is a chatbot that provides assistance on outdoor and physical activity topics, serving citizens around the clock and, if necessary, directing the conversation to the Helsinki Info service advisors. The service improves the accessibility of customer service, the customer experience and the interactivity of self-service, as well as speeding up the process of finding relevant information for each customer compared to searching for information on a website.

The chatbot has been used on various city outdoor and sports websites, but at the moment it is not on any of them. The bot can be used to ask questions about outdoor and sports facilities and services, for example. The service is rule-based, built on question-answer pairs and discussion dialogues defined by advice and subject-matter experts. The service increases efficiency by allowing the automation of frequently asked questions.

The parking chatbot

The parking chatbot is a customer service channel of the city’s parking services. The service provides automated answers to the parking-related questions of city residents and visitors. The service is available on the City of Helsinki parking website.

Attributes

XLSX file; the different categories can be found on separate worksheet tabs.

Intents

XLSX file format: the first column contains an example question, the second column the ID of the intent. That is, first comes a way in which a particular thing can be expected to be asked, and then the intent ID by which the system connects the question to the intent and performs a defined action for it.

Entities

XLSX file format: the first column contains the entity ID, the following columns alternative forms of the entity.

The first column holds the thing for which synonyms or other associated terms are given. Occasionally, inflected forms are also added if the AI does not recognize them reliably enough without them. The following columns hold synonyms or other words associated with the same thing. Note! The system from which the exports are taken splits the same entity over several lines in the exports for an unknown reason.
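Because the same entity can appear on several lines of the export, a consumer will probably want to merge those lines before use. The sketch below shows one way to do that; the sample rows, the entity names and the function name `merge_entity_rows` are hypothetical, and reading the actual XLSX worksheet (for example with a library such as openpyxl) is left out.

```python
# Hypothetical rows as they might be read from the Entities worksheet:
# first cell = entity ID, remaining cells = synonyms (empty cells are None).
rows = [
    ("daycare", "kindergarten", "nursery", None),
    ("dentist", "dental care", None, None),
    ("daycare", "day care centre", None, None),  # same entity on a later line
]


def merge_entity_rows(rows):
    """Combine rows sharing an entity ID, keeping synonyms in first-seen order."""
    merged = {}
    for entity_id, *synonyms in rows:
        bucket = merged.setdefault(entity_id, [])
        for synonym in synonyms:
            # Skip empty cells and synonyms already collected for this entity.
            if synonym and synonym not in bucket:
                bucket.append(synonym)
    return merged


entities = merge_entity_rows(rows)
```

After merging, each entity ID maps to a single list of synonyms, regardless of how many export lines it was spread over.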

Answers

key = an identifying name (ID) unique to that response in the system. This is referred to in dialogue definitions when assigning a response to a specific intent in a given situation

value = the actual response text given to the client in the user interface. Occasionally includes so-called tags that provide clickable hyperlinks, selection buttons, livechat migration, or other functional elements. Texts separated by verti
AI medical chatbot
kaggle.com
Updated Aug 15, 2024
Cite
Yousef Saeedian (2024). AI medical chatbot [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/ai-medical-chatbot
Explore at:
Croissant
Dataset updated
Aug 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yousef Saeedian
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Description:

This dataset comprises transcriptions of conversations between doctors and patients, providing valuable insights into the dynamics of medical consultations. It includes a wide range of interactions, covering various medical conditions, patient concerns, and treatment discussions. The data is structured to capture both the questions and concerns raised by patients, as well as the medical advice, diagnoses, and explanations provided by doctors.

Key Features:

Doctor and Patient Roles: Each conversation is annotated with the role of the speaker (doctor or patient), making it easy to analyze communication patterns.

Medical Context: The dataset includes diverse scenarios, from routine check-ups to more complex medical discussions, offering a broad spectrum of healthcare dialogues.

Natural Language: The conversations are presented in natural language, allowing for the development and testing of NLP models focused on healthcare communication.

Applications: This dataset can be used for various applications, such as building dialogue systems, analyzing communication efficacy, developing medical NLP models, and enhancing patient care through better understanding of doctor-patient interactions.
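Role annotations of the kind listed under Key Features make it straightforward to turn a transcript into question-answer pairs for model training. The following is only a sketch: the `role` and `text` field names, the sample turns and the `to_pairs` helper are assumptions, and the real file may use different column names.

```python
# Hypothetical role-annotated transcript; the real column names may differ.
conversation = [
    {"role": "patient", "text": "I have had a headache for three days."},
    {"role": "doctor", "text": "Is the pain constant, or does it come and go?"},
    {"role": "patient", "text": "It comes and goes, mostly in the evening."},
    {"role": "doctor", "text": "Try to keep a diary of when it occurs."},
]


def to_pairs(turns):
    """Pair each patient turn with the doctor turn that immediately follows."""
    pairs = []
    for current, following in zip(turns, turns[1:]):
        if current["role"] == "patient" and following["role"] == "doctor":
            pairs.append((current["text"], following["text"]))
    return pairs


pairs = to_pairs(conversation)
```

Each resulting tuple holds a patient message and the doctor's direct reply, which is a common input shape for training dialogue models.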

Potential Use Cases:

NLP Model Training: Train models to understand and generate medical dialogues.

Healthcare Communication Studies: Analyze communication strategies between doctors and patients to improve healthcare delivery.

Medical Chatbots: Develop intelligent medical chatbots that can simulate doctor-patient conversations.

Patient Experience Enhancement: Identify common patient concerns and doctor responses to enhance patient care strategies.

This dataset is a valuable resource for researchers, data scientists, and healthcare professionals interested in the intersection of technology and medicine, aiming to improve healthcare communication through data-driven approaches.
Chat Bot Dataset for AI/ML models
data.macgence.com
mp3
Updated Aug 4, 2024
Cite
Macgence (2024). Chat Bot Dataset for AI/ML models [Dataset]. https://data.macgence.com/dataset/chat-bot-dataset-for-aiml-models
Explore at:
Available download formats: mp3
Dataset updated
Aug 4, 2024
Dataset authored and provided by
Macgence
License
https://data.macgence.com/terms-and-conditions
Time period covered
2025
Area covered
Worldwide
Variables measured
Outcome, Call Type, Transcriptions, Audio Recordings, Speaker Metadata, Conversation Topics
Description
Get a high-quality chat bot dataset for AI/ML models. Enhance NLP training with diverse conversational data for accurate, efficient machine learning applications.
FAQ Datasets for Chatbot Training
kaggle.com
Updated Jun 30, 2020
Cite
Abhishek Srivastava (2020). FAQ Datasets for Chatbot Training [Dataset]. https://www.kaggle.com/datasets/abbbhishekkk/faq-datasets-for-chatbot-training/code
Explore at:
Croissant
Dataset updated
Jun 30, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abhishek Srivastava
Description
Dataset

This dataset was created by Abhishek Srivastava

Contents
Data from: Towards bridging the gap between Knowledge Graphs and Chatbots
figshare.com
zip
Updated Mar 26, 2022
Cite
Annemarie Wittig; Aleksandr Perevalov; Andreas Both (2022). Towards bridging the gap between Knowledge Graphs and Chatbots [Dataset]. http://doi.org/10.6084/m9.figshare.19425524.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.19425524.v1
Dataset updated
Mar 26, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Annemarie Wittig; Aleksandr Perevalov; Andreas Both
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Supplementary data for the paper "Towards bridging the gap between Knowledge Graphs and Chatbots" at the 22nd International Conference on Web Engineering (ICWE 2022). Chatbots are nowadays applied widely in different life domains. One major reason for this trend is the mature development process that is supported by large companies and sophisticated conversational platforms. However, the required development steps are mostly done manually while transforming existing knowledge bases into interaction configurations, such that algorithms integrated into the conversational platforms are enabled to learn the intended interaction patterns. However, existing domain knowledge may be lost while transforming a structured knowledge base into a "flat" text representation without back-references. In this paper, we aim for an automatic process dedicated to generating interaction configurations for a conversational platform (Google Dialogflow) from an existing domain-specific knowledge base. Our ultimate goal is to generate chatbot configurations automatically, such that quality and efficiency are increased.
General domain Human-Human conversation chats in Tamil
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). General domain Human-Human conversation chats in Tamil [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/tamil-general-domain-conversation-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
This training dataset comprises more than 10,000 text conversations between pairs of native Tamil speakers in the general domain. The chats cover a variety of topics, services and issues of daily life, such as music, books, festivals, health, kids, family, environment, study, childhood, cuisine, internet and movies, which makes the dataset diverse.
These chats contain language-specific words and phrases and follow the native way of talking, which makes them more information-rich for your NLP model. Besides being specific to its topic, each chat contains attributes such as people's names, addresses, contact information, email addresses, times, dates, local currency, telephone numbers and local slang in various formats, to make the text data unbiased.
These chat scripts have between 300 and 700 words and up to 50 turns. 150 people who are part of the FutureBeeAI crowd community contributed to this dataset. You will also receive chat metadata, such as participant age, gender, and country information, along with the chats. Dataset applications include conversational AI, natural language processing (NLP), smart assistants, text recognition, text analytics, and text prediction.
This dataset is being expanded with new chats all the time. We are able to produce text data in a variety of languages to meet your unique requirements. Check out the FutureBeeAI community for a custom collection.
This training dataset's licence belongs to FutureBeeAI!
Data from: Japanese FAQ dataset for e-learning system
zenodo.org
csv, html, tsv
Updated Jan 24, 2020
Cite
Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai (2020). Japanese FAQ dataset for e-learning system [Dataset]. http://doi.org/10.5281/zenodo.2650549
Explore at:
Available download formats: tsv, csv, html
Unique identifier
https://doi.org/10.5281/zenodo.2650549
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Yasunobu Sumikawa; Masaaki Fujiyoshi; Hisashi Hatakeyama; Masahiro Nagai
Description
This dataset includes FAQ data and their categories for training a chatbot specialized for the e-learning system used at Tokyo Metropolitan University. We report the accuracy of the chatbot in the following paper.

Yasunobu Sumikawa, Masaaki Fujiyoshi, Hisashi Hatakeyama, and Masahiro Nagai "Supporting Creation of FAQ Dataset for E-learning Chatbot", Intelligent Decision Technologies, Smart Innovation, IDT'19, Springer, 2019, to appear.

This dataset is based on real Q&A data about how to use the e-learning system, asked by students and teachers who use it in practical classes. The Q&A data were collected from April 2015 to July 2018.

We attach an English version of the dataset, translated from the Japanese dataset, to make its contents easier to understand. Note that we did not perform any evaluation on the English version; there are no results on how accurately chatbots respond to its questions.

File contents:

FAQ data (*.csv)

Answer2Category.csv: Categories of answers.

Answer2Tag.csv: Titles of answers.

Answers.csv: IDs for answers and texts of answers.

Categories.csv: Names of categories for answers.

Questions.csv: Texts of questions and their corresponding answer IDs.

Answers_english.csv: IDs for answers and texts of answers written in English.

Categories_english.csv: Names of categories for answers and their corresponding English names.

Questions_english.csv: Texts of questions and their corresponding answer IDs written in English.
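Since the questions file refers to answers by ID, the two files have to be joined before they can serve as question-answer training pairs. The sketch below does this with in-memory stand-ins for the English files; the column names (`question`, `answer_id`, `answer`) and the sample rows are assumptions, so check the real headers before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for Questions_english.csv and Answers_english.csv.
# Column names and contents are assumptions; check the real headers first.
questions_csv = io.StringIO(
    "question,answer_id\n"
    "How do I reset my password?,1\n"
    "I forgot my password,1\n"
    "Where can I submit a report?,2\n"
)
answers_csv = io.StringIO(
    "answer_id,answer\n"
    "1,Use the password reset page.\n"
    "2,Open the course page and choose the assignment.\n"
)

# Map each answer ID to its text.
answers = {row["answer_id"]: row["answer"] for row in csv.DictReader(answers_csv)}

# Attach the answer text to every question that points at it.
faq_pairs = [
    (row["question"], answers[row["answer_id"]])
    for row in csv.DictReader(questions_csv)
]
```

Several questions can map to the same answer, which matches the description's view of each answer as a cluster of questions.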

Statistics (*.tsv)
Results of statistical analyses for the dataset. We used the Calinski-Harabasz method, mutual information, the Jaccard index, TF-IDF+KL divergence, and TF-IDF+JS divergence to measure the quality of the dataset. In the analyses, we regard each answer as a cluster of questions. We also perform the same analyses for categories by regarding them as clusters of answers.
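For readers unfamiliar with some of the measures named above, here is a minimal sketch of the textbook definitions of the Jaccard index and of the Kullback-Leibler and Jensen-Shannon divergences. It is not the authors' analysis code; the function names are illustrative, and the assumption that the inputs are word sets or normalized weight vectors (for example TF-IDF weights scaled to sum to 1) is ours.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike the KL divergence, the JS divergence is symmetric and stays finite even when one distribution assigns zero weight to a term the other uses, which makes it convenient for sparse term vectors.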

Grants: JSPS KAKENHI Grant Number 18H01057
Chatbot dataset
kaggle.com
Updated Feb 19, 2023
Cite
Nirali vaghani (2023). Chatbot dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5024271
Explore at:
Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5024271
Dataset updated
Feb 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nirali vaghani
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset includes a JSON file made for a university chatbot, so it contains information about university inquiries for ordinary purposes. The file contains a list of intents with tags, patterns, responses and a context set. The file includes 38 intents, also called tags. This dataset can be used for training and evaluating chatbot models.

To add a tag, write one important word that appears in every question or pattern a user might ask, so that the chatbot can use the tag to give an appropriate answer. For instance, if you want to add questions about fees, the tag name must be fees, and for questions about how many hours your college is open or your university's opening times, the tag name should be hours. The file contains many tags, such as greetings, fees, numbers, hours, events, floors, canteens, hod, admission and many more. The patterns are the questions that you want to include and that you think users might ask during their inquiry. The responses field holds the reply that you want to give to users when they ask such a query. Lastly, the context_set field is left empty in this case, but it could be used to specify a particular context in which a given intent should be used.
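The file layout described above can be illustrated with a small loader. This is a sketch under assumptions: the key names (`intents`, `tag`, `patterns`, `responses`, `context_set`) follow the description but have not been checked against the actual file, and the two sample intents and their texts are invented for the example.

```python
import json

# Hypothetical excerpt in the layout the description suggests.
raw = """
{
  "intents": [
    {"tag": "fees",
     "patterns": ["What are the fees?", "How much is the fee per semester?"],
     "responses": ["Fee details are available from the admissions office."],
     "context_set": ""},
    {"tag": "hours",
     "patterns": ["When does the college open?", "What are the university hours?"],
     "responses": ["The campus is open on weekdays."],
     "context_set": ""}
  ]
}
"""

data = json.loads(raw)

# Flatten to (pattern, tag) pairs, a common input shape for an intent classifier.
training_pairs = [
    (pattern, intent["tag"])
    for intent in data["intents"]
    for pattern in intent["patterns"]
]

# Look up the candidate responses for each tag.
responses_by_tag = {intent["tag"]: intent["responses"] for intent in data["intents"]}
```

Adding a new tag then amounts to appending one more object with its own patterns and responses to the `intents` list.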

This data was collected or edited in October 2022 by manually adding questions and responses.

Usage: these are just a few examples of the many ways that chatbots can be used:

Education: Chatbots can be used in education to provide students with personalized learning experiences, answer questions about coursework, and provide feedback on assignments.

Customer Service: Chatbots can be used to provide customer service support 24/7. They can answer frequently asked questions and provide personalized assistance to customers.

Healthcare: Chatbots can be used to provide medical advice, schedule appointments, and help patients manage their health.

Banking: Chatbots can be used in the banking industry to help customers with their accounts, answer questions about transactions, and provide information about bank products.

Travel: Chatbots can be used in the travel industry to help customers with booking flights, hotels, and rental cars, as well as answer questions about travel destinations.

Human Resources: Chatbots can be used in human resources to help employees with their benefits, answer questions about company policies, and provide information about job openings.

E-commerce: Chatbots can help customers with product recommendations, track orders, and process payments. They can also provide product information and answer questions.

As technology continues to advance, the potential applications for chatbots will continue to expand.
dataset
data.mendeley.com
Updated Oct 4, 2023
Cite
Vignesh A (2023). dataset [Dataset]. http://doi.org/10.17632/cpp3bx8ghd.1
Unique identifier
https://doi.org/10.17632/cpp3bx8ghd.1
Dataset updated
Oct 4, 2023
Authors
Vignesh A
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the SQuAD and NarrativeQA dataset files.
Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight...
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Cite
Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li (2023). Data_Sheet_4_SlimMe, a Chatbot With Artificial Empathy for Personal Weight Management: System Design and Finding.pdf [Dataset]. http://doi.org/10.3389/fnut.2022.870775.s004
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fnut.2022.870775.s004
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Annisa Ristya Rahmanti; Hsuan-Chia Yang; Bagas Suryo Bintoro; Aldilas Achmad Nursetyo; Muhammad Solihuddin Muhtar; Shabbir Syed-Abdul; Yu-Chuan Jack Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
As the obesity rate continues to increase persistently, there is an urgent need to develop an effective weight loss management strategy. Nowadays, the development of artificial intelligence (AI) and cognitive technologies coupled with the rapid spread of messaging platforms and mobile technology with easier access to internet technology offers professional dietitians an opportunity to provide extensive monitoring support to their clients through a chatbot with artificial empathy. This study aimed to design a chatbot with artificial empathic motivational support for weight loss called “SlimMe” and investigate how people react to a diet bot. The SlimMe infrastructure was built using Dialogflow as the natural language processing (NLP) platform and LINE mobile messenger as the messaging platform. We proposed a text-based emotion analysis to simulate artificial empathy responses to recognize the user's emotion. A preliminary evaluation was performed to investigate the early-stage user experience after a 7-day simulation trial. The result revealed that having an artificially empathic diet bot for weight loss management is a fun and exciting experience. The use of emoticons, stickers, and GIF images makes the chatbot response more interactive. Moreover, the motivational support and persuasive messaging features enable the bot to express more empathic and engaging responses to the user. In total, there were 1,007 bot responses from 892 user input messages. Of these, 67.38% (601/1,007) of the chatbot-generated responses were accurate responses to a relevant user request, 21.19% (189/1,007) were inaccurate responses to a relevant request, and 10.31% (92/1,007) were accurate responses to an irrelevant request. In only 1.12% (10/1,007) of cases did the chatbot not answer. We present the design of an artificially empathic diet bot as a friendly assistant to help users estimate their calorie intake and calories burned in a more interactive and engaging way.
To our knowledge, this is the first chatbot designed with artificial empathy features, and it looks very promising in promoting long-term weight management. More user interactions and further data training and validation enhancement will improve the bot's in-built knowledge base and emotional intelligence base.
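The response statistics quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names and category labels are our own. It suggests that the four counts sum to 892, the number of user messages, and that the quoted percentages correspond to that denominator rather than to the 1,007 bot responses printed in the fractions.

```python
# Counts as quoted in the abstract.
counts = {
    "accurate response, relevant request": 601,
    "inaccurate response, relevant request": 189,
    "accurate response, irrelevant request": 92,
    "no answer": 10,
}
user_messages = 892
bot_responses = 1007

total = sum(counts.values())


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in the abstract."""
    return round(100 * part / whole, 2)


# Shares computed against each candidate denominator.
shares_of_messages = {label: pct(n, user_messages) for label, n in counts.items()}
shares_of_responses = {label: pct(n, bot_responses) for label, n in counts.items()}
```

Comparing `shares_of_messages` with `shares_of_responses` shows which denominator reproduces the published figures.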
ChatBot Dataset for Transformers
gts.ai
json
Updated Jan 9, 2025
Cite
GTS (2025). ChatBot Dataset for Transformers [Dataset]. https://gts.ai/dataset-download/chatbot-dataset-for-transformers/
Explore at:
Available download formats: json
Dataset updated
Jan 9, 2025
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
Description
Train conversational AI with the ChatBot Dataset for Transformers. Featuring human-like dialogues, preprocessed inputs, and labels, it’s perfect for GPT, BERT, T5, and NLP projects.
Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning...
datarade.ai
.json, .csv
Cite
Xverum, Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training [Dataset]. https://datarade.ai/data-products/xverum-company-data-b2b-data-belgium-netherlands-denm-xverum
Explore at:
Available download formats: .json, .csv
Dataset provided by
Xverum LLC
Authors
Xverum
Area covered
Dominican Republic, Jordan, United Kingdom, Barbados, Oman, India, Norway, Western Sahara, Sint Maarten (Dutch part), Cook Islands
Description
Xverum’s AI & ML Training Data provides one of the most extensive datasets available for AI and machine learning applications, featuring 800M B2B profiles with 100+ attributes. This dataset is designed to enable AI developers, data scientists, and businesses to train robust and accurate ML models. From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries and use cases with unparalleled scale, depth, and quality.

What Makes Our Data Unique?

Scale and Coverage:
- A global dataset encompassing 800M B2B profiles from a wide array of industries and geographies.
- Includes coverage across the Americas, Europe, Asia, and other key markets, ensuring worldwide representation.

Rich Attributes for Training Models:
- Over 100 fields of detailed information, including company details, job roles, geographic data, industry categories, past experiences, and behavioral insights.
- Tailored for training models in NLP, recommendation systems, and predictive algorithms.

Compliance and Quality:
- Fully GDPR and CCPA compliant, providing secure and ethically sourced data.
- Extensive data cleaning and validation processes ensure reliability and accuracy.

Annotation-Ready:
- Pre-structured and formatted datasets that are easily ingestible into AI workflows.
- Ideal for supervised learning with tagging options such as entities, sentiment, or categories.

How Is the Data Sourced?
- Publicly available information gathered through advanced, GDPR-compliant web aggregation techniques.
- Proprietary enrichment pipelines that validate, clean, and structure raw data into high-quality datasets.
This approach ensures we deliver comprehensive, up-to-date, and actionable data for machine learning training.

Primary Use Cases and Verticals

Natural Language Processing (NLP): Train models for named entity recognition (NER), text classification, sentiment analysis, and conversational AI. Ideal for chatbots, language models, and content categorization.

Predictive Analytics and Recommendation Systems: Enable personalized marketing campaigns by predicting buyer behavior. Build smarter recommendation engines for ecommerce and content platforms.

B2B Lead Generation and Market Insights: Create models that identify high-value leads using enriched company and contact information. Develop AI systems that track trends and provide strategic insights for businesses.

HR and Talent Acquisition AI: Optimize talent-matching algorithms using structured job descriptions and candidate profiles. Build AI-powered platforms for recruitment analytics.

How This Product Fits Into Xverum’s Broader Data Offering

Xverum is a leading provider of structured, high-quality web datasets. While we specialize in B2B profiles and company data, we also offer complementary datasets tailored for specific verticals, including ecommerce product data, job listings, and customer reviews. The AI Training Data is a natural extension of our core capabilities, bridging the gap between structured data and machine learning workflows. By providing annotation-ready datasets, real-time API access, and customization options, we ensure our clients can seamlessly integrate our data into their AI development processes.

Why Choose Xverum?
- Experience and Expertise: A trusted name in structured web data with a proven track record.
- Flexibility: Datasets can be tailored for any AI/ML application.
- Scalability: With 800M profiles and more being added, you’ll always have access to fresh, up-to-date data.
- Compliance: We prioritize data ethics and security, ensuring all data adheres to GDPR and other legal frameworks.

Ready to supercharge your AI and ML projects? Explore Xverum’s AI Training Data to unlock the potential of 800M global B2B profiles. Whether you’re building a chatbot, predictive algorithm, or next-gen AI application, our data is here to help.

Contact us for sample datasets or to discuss your specific needs.
Punjabi Human-Human Chat Dataset for Conversational AI & NLP
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Punjabi Human-Human Chat Dataset for Conversational AI & NLP [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/punjabi-general-domain-conversation-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Punjabi General Domain Chat Dataset is a high-quality, text-based dataset designed to train and evaluate conversational AI, NLP models, and smart assistants in real-world Punjabi usage. Collected through FutureBeeAI’s trusted crowd community, this dataset reflects natural, native-level Punjabi conversations covering a broad spectrum of everyday topics.
Conversational Text Data
This dataset includes over 10,000 chat transcripts, each featuring free-flowing dialogue between two native Punjabi speakers. The conversations are spontaneous and context-rich, and they mimic informal, real-life texting behavior.
•Words per Chat: 300–700
•Turns per Chat: Up to 50 dialogue turns
•Contributors: 150 native Punjabi speakers from the FutureBeeAI Crowd Community
•Format: TXT, DOCS, JSON or CSV (customizable)
•Structure: Each record contains the full chat, topic tag, and metadata block
Diversity and Domain Coverage
Conversations span a wide variety of general-domain topics to ensure comprehensive model exposure:
•Music, books, and movies
•Health and wellness
•Children and parenting
•Family life and relationships
•Food and cooking
•Education and studying
•Festivals and traditions
•Environment and daily life
•Internet and tech usage
•Childhood memories and casual chatting
This diversity ensures the dataset is useful across multiple NLP and language understanding applications.
Linguistic Authenticity
Chats reflect informal, native-level Punjabi usage with:
•Colloquial expressions and local dialect influence
•Domain-relevant terminology
•Language-specific grammar, phrasing, and sentence flow
•Inclusion of realistic details such as names, phone numbers, email addresses, locations, dates, times, local currencies, and culturally grounded references
•Representation of different writing styles and input quirks to ensure training data realism
Metadata
Every chat instance is accompanied by structured metadata, which includes:
•Participant Age
•Gender
•Country/Region
•Chat Domain
•Chat Topic
•Dialect
This metadata supports model filtering, demographic-specific evaluation, and more controlled fine-tuning workflows.
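The filtering workflow described above can be sketched as follows. This is a minimal illustration only: the listing names the metadata categories but not the actual keys or file layout, so the `ChatRecord` fields, the `filter_records` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ChatRecord:
    # Hypothetical field names mirroring the metadata categories in the listing.
    chat: str      # full chat transcript
    topic: str     # Chat Topic
    domain: str    # Chat Domain
    age: int       # Participant Age
    gender: str    # Gender
    region: str    # Country/Region
    dialect: str   # Dialect


def filter_records(
    records: Iterable[ChatRecord],
    *,
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    gender: Optional[str] = None,
    dialect: Optional[str] = None,
    topic: Optional[str] = None,
) -> List[ChatRecord]:
    """Keep records that satisfy every criterion that was supplied."""
    selected = []
    for record in records:
        if min_age is not None and record.age < min_age:
            continue
        if max_age is not None and record.age > max_age:
            continue
        if gender is not None and record.gender != gender:
            continue
        if dialect is not None and record.dialect != dialect:
            continue
        if topic is not None and record.topic != topic:
            continue
        selected.append(record)
    return selected


# Invented sample records; the transcripts are placeholders, not dataset content.
sample = [
    ChatRecord("...", "Food and cooking", "General", 24, "female", "India", "Majhi"),
    ChatRecord("...", "Festivals and traditions", "General", 41, "male", "India", "Malwai"),
    ChatRecord("...", "Food and cooking", "General", 35, "male", "Canada", "Majhi"),
]

majhi_food = filter_records(sample, dialect="Majhi", topic="Food and cooking")
print(len(majhi_food))
```

The same helper could build a demographic-specific evaluation split, for example by passing `min_age` and `max_age` to isolate one age band.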
Data Quality Assurance
All chat records pass through a rigorous QA process to maintain consistency and accuracy:
•Manual review for content completeness
•Format checks for chat turns and metadata
•Linguistic verification by native speakers
•Removal of inappropriate or unusable samples
This ensures a clean, reliable dataset ready for high-performance AI model training.
Applications
This dataset is ideal for training and evaluating a wide range of text-based AI systems:
•Conversational AI / Chatbots
•Smart assistants and voicebots
Chatbot Market Analysis, Size, and Forecast 2025-2029: North America (US and...
technavio.com
Updated Feb 15, 2025
Cite
Technavio (2025). Chatbot Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), Middle East and Africa (Egypt, KSA, Oman, and UAE), APAC (China, India, and Japan), South America (Argentina and Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/chatbot-market-industry-analysis
Dataset updated
Feb 15, 2025
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global
Description
Chatbot Market Size 2025-2029

The chatbot market size is forecast to increase by USD 9.63 billion, at a CAGR of 42.9% between 2024 and 2029.

The market is witnessing significant growth, driven by the integration of chatbots with various communication channels such as social media, websites, and messaging apps. This integration enables businesses to engage with customers in real-time, providing instant responses and enhancing customer experience. However, the market faces challenges, including the lack of awareness and standardization of chatbot services. Despite these obstacles, the potential benefits of chatbots, including cost savings, increased efficiency, and improved customer engagement, make it an attractive investment for businesses seeking to enhance their digital presence and streamline operations. Companies looking to capitalize on this market opportunity should focus on developing chatbot solutions that offer customizable features, seamless integration with existing systems, and natural language processing capabilities to deliver human-like interactions. Navigating the challenges of awareness and standardization will require targeted marketing efforts and collaborations with industry partners to establish best practices and industry standards.

What will be the Size of the Chatbot Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, with shifting dynamics shaping its growth and applications across various sectors. Conversational AI, a key component of chatbots, is advancing with the integration of sentiment analysis, emotional intelligence, and METEOR score to enhance user experience. Pre-trained models and language understanding are being utilized to improve performance metrics, while neural networks and contextual awareness enable more accurate intent recognition. Deployment strategies, including policy learning and cloud platforms, are evolving to support cross-platform compatibility and multi-lingual support. Performance metrics, such as F1-score and response time, are crucial in evaluating model effectiveness. Reinforcement learning and knowledge base integration are essential for chatbot development and lead generation. Error rate and character error rate are critical in speech recognition, while API integration and dialogue state tracking facilitate seamless conversational experiences. Technical support and customer engagement are primary applications of chatbots, with sales conversion and automated responses optimizing business operations. Deep learning architectures and transfer learning are driving advancements in question answering and natural language processing. Contextualized word embeddings and dialogue management are essential for effective user interaction. Overall, the market is an ever-evolving landscape, with continuous innovation and integration of advanced technologies shaping its future.

How is this Chatbot Industry segmented?

The chatbot industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

End-user: Retail, BFSI, Government, Travel and hospitality, Others
Product: Solutions, Services
Deployment: Cloud-Based, On-Premise, Hybrid
Application: Customer Service, Sales and Marketing, Healthcare Support, E-Commerce Assistance
Geography: North America (US, Canada); Europe (France, Germany, Italy, UK); Middle East and Africa (Egypt, KSA, Oman, UAE); APAC (China, India, Japan); South America (Argentina, Brazil); Rest of World (ROW)

By End-user Insights

The retail segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth, particularly in the retail sector. E-commerce giants like Amazon, Flipkart, Alibaba, and Snapdeal are leading this trend, integrating chatbots to improve customer experience during online product searches. These AI-powered bots facilitate quick and effective resolution of payment-related queries, enhancing the shopping experience. However, retailers face challenges in ensuring a seamless user experience, as consumers increasingly prefer mobile shopping. Deep learning architectures and natural language processing (NLP) are crucial components of chatbot development. NLP enables intent recognition, sentiment analysis, and entity extraction, while deep learning models provide contextual awareness and dialogue management. Speech recognition and dialogue state tracking further enhance the user experience. Cross-platform compatibility and multi-lingual support are essential features for chatbots, catering to diverse user bases. Pre-trained models and transfer learning enable faster development and deployment. Reinforcement learning and policy learning optimize bot
LLM prompts in the context of machine learning
kaggle.com
Updated Jul 1, 2024
Cite
Jordan Nelson (2024). LLM prompts in the context of machine learning [Dataset]. https://www.kaggle.com/datasets/jordanln/llm-prompts-in-the-context-of-machine-learning
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 1, 2024
Dataset provided by
Kaggle
Authors
Jordan Nelson
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is an extension of my previous work on creating a dataset for natural language processing tasks. It leverages binary representation to characterise various machine learning models. The attributes in the dataset are derived from a dictionary, which was constructed from a corpus of prompts typically provided to a large language model (LLM). These prompts reference specific machine learning algorithms and their implementations. For instance, consider a user asking an LLM or a generative AI to create a Multi-Layer Perceptron (MLP) model for a particular application. By applying this concept to multiple machine learning models, we constructed our corpus. This corpus was then transformed into the current dataset using a bag-of-words approach. In this dataset, each attribute corresponds to a word from our dictionary, represented as a binary value: 1 indicates the presence of the word in a given prompt, and 0 indicates its absence. At the end of each entry, there is a label. Each entry in the dataset pertains to a single class, where each class represents a distinct machine learning model or algorithm. This dataset is intended for multi-class classification tasks, not multi-label classification, as each entry is associated with only one label and does not belong to multiple labels simultaneously. This dataset has been utilised with a Convolutional Neural Network (CNN) using the Keras Automodel API, achieving impressive training and testing accuracy rates exceeding 97%. Post-training, the model's predictive performance was rigorously evaluated in a production environment, where it continued to demonstrate exceptional accuracy. For this evaluation, we employed a series of questions, which are listed below. These questions were intentionally designed to be similar to ensure that the model can effectively distinguish between different machine learning models, even when the prompts are closely related.
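The binary bag-of-words encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions: the author's actual dictionary and tokenisation are not given, so the `tokenize`, `build_vocabulary`, and `encode` helpers and the lowercase alphanumeric tokeniser are hypothetical. The two prompts are taken from the evaluation questions in this description.

```python
import re


def tokenize(text):
    # Assumed tokenisation: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(prompts):
    # The dictionary is the sorted set of every word seen in the corpus.
    return sorted({token for prompt in prompts for token in tokenize(prompt)})


def encode(prompt, vocabulary):
    # 1 if the dictionary word occurs in the prompt, 0 otherwise.
    present = set(tokenize(prompt))
    return [1 if word in present else 0 for word in vocabulary]


corpus = [
    ("Can you create a KNN model for me that could be used in malware classification?", "KNN"),
    ("Can you create a decision tree classifier for me?", "Decision Tree"),
]

vocabulary = build_vocabulary(prompt for prompt, _ in corpus)

# Each row holds one binary attribute per dictionary word, with the class label last.
rows = [encode(prompt, vocabulary) + [label] for prompt, label in corpus]

for row in rows:
    print(row)
```

Because every row carries exactly one label, the output is suited to multi-class classification, as the description states.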

KNN
- How would you create a KNN model to classify emails as spam or not spam based on their content and metadata?
- How could you implement a KNN model to classify handwritten digits using the MNIST dataset?
- How would you use a KNN approach to build a recommendation system for suggesting movies to users based on their ratings and preferences?
- How could you employ a KNN algorithm to predict the price of a house based on features such as its location, size, and number of bedrooms etc?
- Can you create a KNN model for classifying different species of flowers based on their petal length, petal width, sepal length, and sepal width?
- How would you utilise a KNN model to predict the sentiment (positive, negative, or neutral) of text reviews or comments?
- Can you create a KNN model for me that could be used in malware classification?
- Can you make me a KNN model that can detect a network intrusion when looking at encrypted network traffic?
- Can you make a KNN model that would predict the stock price of a given stock for the next week?
- Can you create a KNN model that could be used to detect malware when using a dataset relating to certain permissions a piece of software may have access to?

Decision Tree
- Can you describe the steps involved in building a decision tree model to classify medical images as malignant or benign for cancer diagnosis and return a model for me?
- How can you utilise a decision tree approach to develop a model for classifying news articles into different categories (e.g., politics, sports, entertainment) based on their textual content?
- What approach would you take to create a decision tree model for recommending personalised university courses to students based on their academic strengths and weaknesses?
- Can you describe how to create a decision tree model for identifying potential fraud in financial transactions based on transaction history, user behaviour, and other relevant data?
- In what ways might you apply a decision tree model to classify customer complaints into different categories determining the severity of language used?
- Can you create a decision tree classifier for me?
- Can you make me a decision tree model that will help me determine the best course of action across a given set of strategies?
- Can you create a decision tree model for me that can recommend certain cars to customers based on their preferences and budget?
- How can you make a decision tree model that will predict the movement of star constellations in the sky based on data provided by the NASA website?
- How do I create a decision tree for time-series forecasting?

Random Forest
- Can you describe the steps involved in building a random forest model to classify different types of anomalies in network traffic data for cybersecurity purposes and return the code for me?
- In what ways could you implement a random forest model to predict the severity of traffic congestion in urban areas based on historical traffic patterns, weather...
Airport Digital Twin Chatbot Training Market Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Jul 15, 2025
Cite
Growth Market Reports (2025). Airport Digital Twin Chatbot Training Market Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/airport-digital-twin-chatbot-training-market-market
Explore at:
Available download formats: pptx, pdf, csv
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Airport Digital Twin Chatbot Training Market Outlook

According to our latest research, the global Airport Digital Twin Chatbot Training market size in 2024 stands at USD 1.13 billion, reflecting the rapid adoption of advanced digital solutions in the aviation sector. The market is expected to witness a robust growth trajectory, registering a CAGR of 18.7% from 2025 to 2033. By 2033, the market is projected to reach USD 5.86 billion, driven by increasing investments in airport modernization, the proliferation of artificial intelligence (AI) technologies, and the pressing need for enhanced passenger experience and operational efficiency.
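Figures like those above are related by the standard compound-growth formula. The sketch below is a hedged illustration, not the report's method: the `project` and `implied_cagr` helpers are hypothetical, yearly compounding is assumed, and the number of compounding periods is an assumption because the report does not state which base year its projection uses. The result therefore need not match the report's figure.

```python
def project(start_value, rate, periods):
    """Value after compounding `rate` once per period for `periods` periods."""
    return start_value * (1 + rate) ** periods


def implied_cagr(start_value, end_value, periods):
    """Constant per-period growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / periods) - 1


# Inputs taken from the text: USD 1.13 billion (2024) and a CAGR of 18.7%.
# Assumption: nine yearly periods, counting 2024 as the base year and 2033 as the end.
projected_2033 = project(1.13, 0.187, 9)
print(f"Projected 2033 size: USD {projected_2033:.2f} billion")

# The inverse check: the growth rate implied by the stated start and end values.
rate = implied_cagr(1.13, 5.86, 9)
print(f"Implied CAGR: {rate:.1%}")
```

Changing `periods` shows how sensitive the projection is to the choice of base year.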

The key growth factor propelling the Airport Digital Twin Chatbot Training market is the escalating demand for real-time data-driven decision-making in airport operations. As airports grapple with growing passenger volumes and heightened security requirements, the integration of digital twin technology with AI-powered chatbots enables seamless simulation, monitoring, and management of complex airport environments. This convergence empowers stakeholders to predict potential bottlenecks, optimize resource allocation, and proactively address operational disruptions. Furthermore, the ability of digital twin chatbots to learn and adapt through continuous training ensures that airports remain agile and responsive to evolving operational challenges, thereby fostering a culture of innovation and continuous improvement.

Another significant driver is the imperative to elevate the passenger experience amid intensifying competition among airports globally. Digital twin chatbots, trained on vast datasets encompassing passenger behavior, flight schedules, and facility management, can deliver personalized assistance, streamline check-in processes, and provide real-time updates, thereby reducing wait times and enhancing overall satisfaction. The adoption of these technologies not only improves passenger engagement but also contributes to brand differentiation for airports and airlines. As customer expectations for seamless, contactless, and efficient services continue to rise, the deployment of intelligent chatbot solutions is becoming a strategic priority for airport operators aiming to secure a competitive edge.

The market’s expansion is further fueled by regulatory mandates and industry initiatives aimed at strengthening airport security and sustainability. Digital twin chatbots play a pivotal role in simulating security scenarios, monitoring compliance, and facilitating rapid response to incidents. Additionally, they support predictive maintenance and energy management, aligning with global efforts to reduce the carbon footprint of aviation infrastructure. The synergy between regulatory compliance, operational resilience, and environmental stewardship is accelerating the adoption of digital twin chatbot training solutions across airports of varying scales and complexities.

From a regional perspective, North America currently leads the market, underpinned by substantial investments in airport infrastructure, a mature digital ecosystem, and the presence of leading technology providers. However, Asia Pacific is poised for the fastest growth, driven by the surge in air travel, large-scale airport development projects, and government initiatives promoting smart airport technologies. Europe remains a significant contributor, with a focus on sustainability and passenger-centric innovations. Meanwhile, the Middle East & Africa and Latin America are emerging as promising markets, supported by strategic investments in aviation and digital transformation efforts.

Component Analysis

The Component segment of the Airport Digital Twin Chatbot Training market is bifurcated into Software and Services. The software sub-segment encompasses the core digital twin platforms, AI-powered chatbot engines, and integrated analytics tools that form the backbone of intelligent airport operations. These solutions are designed t
Bot Platforms Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jun 16, 2025
Cite
Data Insights Market (2025). Bot Platforms Software Report [Dataset]. https://www.datainsightsmarket.com/reports/bot-platforms-software-1448698
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Jun 16, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Bot Platforms Software market, currently valued at $956 million in 2025, is projected to experience robust growth, driven by the increasing adoption of AI-powered chatbots across diverse industries. This growth is fueled by the need for enhanced customer service, automation of routine tasks, and the rising demand for personalized user experiences. Key market drivers include the decreasing cost of cloud computing resources, advancements in natural language processing (NLP) and machine learning (ML) technologies, and the growing integration of bots across various platforms like messaging apps, websites, and social media. The market is segmented by deployment (cloud, on-premise), application (customer service, marketing, sales), and organization size (small, medium, large). Leading players like Amazon, Google, Microsoft, and IBM are actively shaping the market landscape through continuous innovation and strategic partnerships, while smaller, specialized players focus on niche applications. The competitive landscape is dynamic, with mergers and acquisitions expected to further consolidate the market. The forecasted Compound Annual Growth Rate (CAGR) of 10.4% from 2025 to 2033 signifies a considerable expansion in market size. This consistent growth trajectory reflects the ongoing digital transformation across sectors and the increasing reliance on automation to optimize processes and improve operational efficiency. The market faces challenges such as data security concerns, integration complexities, and the need for robust training data to ensure accurate chatbot performance. However, these challenges are likely to be mitigated through technological advancements and the development of more sophisticated and secure bot platform solutions. The market's future is promising, with significant opportunities for growth in emerging markets and expansion into new application areas, solidifying bot platforms as an essential component of the modern digital ecosystem.

Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset

6 scholarly articles cite this dataset (View in Google Scholar)