100+ datasets found
  1. h

    chatbot_arena_conversations

    • huggingface.co
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Large Model Systems Organization (2023). chatbot_arena_conversations [Dataset]. https://huggingface.co/datasets/lmsys/chatbot_arena_conversations
    Explore at:
    Dataset updated
    Jul 18, 2023
    Dataset authored and provided by
    Large Model Systems Organization
    License

    https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/

    Description

    Chatbot Arena Conversations Dataset

    This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.

  2. h

    lmsys-chat-1m

    • huggingface.co
    Updated Sep 26, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    1-800-SHARED-TASKS (2024). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/1-800-SHARED-TASKS/lmsys-chat-1m
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 26, 2024
    Dataset authored and provided by
    1-800-SHARED-TASKS
    Description

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/1-800-SHARED-TASKS/lmsys-chat-1m.

  3. h

    Bitext-customer-support-llm-chatbot-training-dataset

    • huggingface.co
    • opendatalab.com
    Updated Jul 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

  4. Mental Health Chatbot Pairs

    • kaggle.com
    Updated Nov 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Mental Health Chatbot Pairs [Dataset]. https://www.kaggle.com/datasets/thedevastator/mental-health-chatbot-pairs
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 27, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Mental Health Chatbot Pairs

    AI-based Tailored Support for Mental Health Conversation

    By Huggingface Hub [source]

    About this dataset

    This dataset contains a compilation of carefully-crafted Q&A pairs which are designed to provide AI-based tailored support for mental health. These carefully chosen questions and answers offer an avenue for those looking for help to gain the assistance they need. With these pre-processed conversations, Artificial Intelligence (AI) solutions can be developed and deployed to better understand and respond appropriately to individual needs based on their input. This comprehensive dataset is crafted by experts in the mental health field, providing insightful content that will further research in this growing area. These data points will be invaluable for developing the next generation of personalized AI-based mental health chatbots capable of truly understanding what people need

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains pre-processed Q&A pairs for AI-based tailored support for mental health. As such, it represents an excellent starting point in building a conversational model which can handle conversations about mental health issues. Here are some tips on how to use this dataset to its fullest potential:

    • Understand your data: Spend time getting to know the text of the conversation between the user and the chatbot and familiarize yourself with what type of questions and answers are included in this specific dataset. This will help you better formulate queries for your own conversational model or develop new ones you can add yourself.

    • Refine your language processing models: By studying the patterns in syntax, grammar, tone, voice, etc., within this conversational data set you can hone your natural language processing capabilities - such as keyword extractions or entity extraction – prior to implementing them into a larger bot system .

    • Test assumptions: Have an idea of what you think may work best with a particular audience or context? See if these assumptions pan out by applying different variations of text to this dataset to see if it works before rolling out changes across other channels or programs that utilize AI/chatbot services

    • Research & Analyze Results : After testing out different scenarios on real-world users by using various forms of q&a within this chatbot pair data set , analyze & record any relevant results pertaining towards understanding user behavior better through further analysis after being exposed to tailored texted conversations about Mental Health topics both passively & actively . The more information you collect here , leads us closer towards creating effective AI powered conversations that bring our desired outcomes from our customer base .

    Research Ideas

    • Developing a chatbot for personalized mental health advice and guidance tailored to individuals' unique needs, experiences, and struggles.
    • Creating an AI-driven diagnostic system that can interpret mental health conversations and provide targeted recommendations for interventions or treatments based on clinical expertise.
    • Designing an AI-powered recommendation engine to suggest relevant content such as articles, videos, or podcasts based on users’ questions or topics of discussion during their conversation with the chatbot

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv | Column name | Description | |:--------------|:------------------------------------------------------------------------| | text | The text of the conversation between the user and the chatbot. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.

  5. h

    ai-medical-chatbot

    • huggingface.co
    Updated Feb 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ruslan Magana Vsevolodovna (2024). ai-medical-chatbot [Dataset]. https://huggingface.co/datasets/ruslanmv/ai-medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2024
    Authors
    Ruslan Magana Vsevolodovna
    Description

    AI Medical Chatbot Dataset

    This is an experimental Dataset designed to run a Medical Chatbot It contains at least 250k dialogues between a Patient and a Doctor.

      Playground ChatBot
    

    ruslanmv/AI-Medical-Chatbot For furter information visit the project here: https://github.com/ruslanmv/ai-medical-chatbot

  6. Glaive Function Calling V2

    • kaggle.com
    • huggingface.co
    Updated Nov 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Glaive Function Calling V2 [Dataset]. https://www.kaggle.com/datasets/thedevastator/ai-chatbot-conversational-data/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 24, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    AI Chatbot Conversational Data

    A Knowledge Base for Trainable Natural Language Processing

    By Huggingface Hub [source]

    About this dataset

    This dataset contains valuable records of conversations between humans and AI-driven chatbots in real-world scenarios. This is a great opportunity to explore the nuances and intricacies of conversations between humans and machines, opening the door to interesting research directions for machine learning, artificial intelligence, natural language processing (NLP), and beyond. With this data, researchers can determine how well machines are able to simulate real conversation behavior such as nonverbal exchanges, intonations, humorous insights or even sarcasm. The data also provides an avenue for comparative studies between human behavior and AI capabilities in carrying out meaningful dialogues with humans. This knowledge base is invaluable for those who aim to create more astounding AI systems that can closely imitate comprehensible speech patterns through their trained technology models

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    How to Use this Dataset

    This dataset contains conversations between humans and AI-driven chatbots in real-world scenarios. With this dataset, you will be able to use the data to build an AI system that can respond intelligently in natural language conversations. For example, you can build a system with the ability to further engage users by replying with meaningful responses as the conversation progresses.

    In order to get started, first familiarize yourself with the columns included in this dataset: 'chat' and 'system'. The column 'chat' contains conversations between humans and chatbot systems while the column 'system' contains responses from AI-driven chatbots.

    Once you understand what is included in the data set, it's time for you to start building your AI system! Depending on how complex or advanced your goal is, there are several different approaches that could be used when working with this data set such as supervised learning models like seq2seq network or unsupervised methods like autoencoders etc. To get more detailed information regarding those methods refer to external materials available online.

    After having trained your model, now it's time for testing out its performance! Enter some sample text into your model using either a web form or command line interface – then observe how it responds against what’s already stored within training datasets column ‘System’ which indicates expected chatsbot response (see above). You should find that once trained correctly; potential outcomes of such tests explores very closely resembling instances from learning sources (the training dataset) leading evidence of advanced Artificial intelligence applications are possible with sufficient analysis inputs! As always if extra accuracy is needed afterwards tweak any parameters until desired results are achieved - Congratulations!

    Research Ideas

    • AI-driven natural language generation: Using this dataset, developers can train AI systems to automatically generate natural conversations between humans and machines.
    • Automatic response selection: The data in the dataset could be used to train AI algorithms which select the most appropriate response in any given conversation.
    • Evaluating human-machine interaction: Researchers can use this data to identify areas of improvement in conversational interactions between humans and machines, as well as evaluate various techniques for creating effective dialogue systems

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv | Column name | Description | |:--------------|:--------------------------------------------------------| | chat | Contains dialogues uttered by the human. (String) | | system | Contains responses from the AI-driven chatbot. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.

  7. bot-fight-data

    • dl.aifasthub.com
    • huggingface.co
    • +1more
    Updated Feb 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Huggingface Projects (2025). bot-fight-data [Dataset]. https://dl.aifasthub.com/datasets/huggingface-projects/bot-fight-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 9, 2025
    Dataset provided by
    Hugging Facehttps://huggingface.co/
    Authors
    Huggingface Projects
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    huggingface-projects/bot-fight-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. h

    chatbot_instruction_prompts

    • huggingface.co
    • opendatalab.com
    Updated Mar 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alessandro Palla (2023). chatbot_instruction_prompts [Dataset]. https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 18, 2023
    Authors
    Alessandro Palla
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Chatbot Instruction Prompts Datasets

      Dataset Summary
    

    This dataset has been generated from the following ones:

    tatsu-lab/alpaca Dahoas/instruct-human-assistant-prompt allenai/prosocial-dialog

    The datasets has been cleaned up of spurious entries and artifacts. It contains ~500k of prompt and expected resposne. This DB is intended to train an instruct-type model

  9. h

    toxic-chat

    • huggingface.co
    Updated Jan 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Large Model Systems Organization (2024). toxic-chat [Dataset]. https://huggingface.co/datasets/lmsys/toxic-chat
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 25, 2024
    Dataset authored and provided by
    Large Model Systems Organization
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Update

    [01/31/2024] We update the OpenAI Moderation API results for ToxicChat (0124) based on their updated moderation model on on Jan 25, 2024.[01/28/2024] We release an official T5-Large model trained on ToxicChat (toxicchat0124). Go and check it for you baseline comparision![01/19/2024] We have a new version of ToxicChat (toxicchat0124)!

      Content
    

    This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna online demo. We utilize a human-AI… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/toxic-chat.

  10. h

    SPML_Chatbot_Prompt_Injection

    • huggingface.co
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Reshabh K Sharma (2024). SPML_Chatbot_Prompt_Injection [Dataset]. https://huggingface.co/datasets/reshabhs/SPML_Chatbot_Prompt_Injection
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 11, 2024
    Authors
    Reshabh K Sharma
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    SPML Chatbot Prompt Injection Dataset

    Arxiv Paper Introducing the SPML Chatbot Prompt Injection Dataset: a robust collection of system prompts designed to create realistic chatbot interactions, coupled with a diverse array of annotated user prompts that attempt to carry out prompt injection attacks. While other datasets in this domain have centered on less practical chatbot scenarios or have limited themselves to "jailbreaking" – just one aspect of prompt injection – our dataset… See the full description on the dataset page: https://huggingface.co/datasets/reshabhs/SPML_Chatbot_Prompt_Injection.

  11. h

    student-assistance-chatbot

    • huggingface.co
    Updated Sep 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Harsh Patel (2024). student-assistance-chatbot [Dataset]. https://huggingface.co/datasets/bot-remains/student-assistance-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 8, 2024
    Authors
    Harsh Patel
    Description

    bot-remains/student-assistance-chatbot dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. h

    Bitext-retail-ecommerce-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext-retail-ecommerce-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Retail (eCommerce) Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [Retail (eCommerce)] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-retail-ecommerce-llm-chatbot-training-dataset.

  13. h

    mental_health_conversational_dataset

    • huggingface.co
    Updated Aug 10, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zahrizhal Ali (2023). mental_health_conversational_dataset [Dataset]. https://huggingface.co/datasets/ZahrizhalAli/mental_health_conversational_dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 10, 2023
    Authors
    Zahrizhal Ali
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CREDIT: Dataset Card for "heliosbrahma/mental_health_chatbot_dataset"

      Dataset Description
    
    
    
    
    
      Dataset Summary
    

    This dataset contains conversational pair of questions and answers in a single text related to Mental Health. Dataset was curated from popular healthcare blogs like WebMD, Mayo Clinic and HeatlhLine, online FAQs etc. All questions and answers have been anonymized to remove any PII data and pre-processed to remove any unwanted characters.

      Languages… See the full description on the dataset page: https://huggingface.co/datasets/ZahrizhalAli/mental_health_conversational_dataset.
    
  14. h

    Chatbot-Dataset

    • huggingface.co
    Updated Aug 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Manav (2024). Chatbot-Dataset [Dataset]. https://huggingface.co/datasets/Manav5461/Chatbot-Dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 21, 2024
    Authors
    Manav
    Description

    Manav5461/Chatbot-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. h

    chatbot-dataset

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Llamas, chatbot-dataset [Dataset]. https://huggingface.co/datasets/myLlama03/chatbot-dataset
    Explore at:
    Dataset authored and provided by
    Llamas
    Description

    myLlama03/chatbot-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. h

    medical-chatbot

    • huggingface.co
    Updated Oct 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tejas Taneja (2024). medical-chatbot [Dataset]. https://huggingface.co/datasets/tejas1206/medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 21, 2024
    Authors
    Tejas Taneja
    Description

    tejas1206/medical-chatbot dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. h

    chatbot-arena-elo

    • huggingface.co
    Updated Mar 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mathew Huerta-Enochian (2025). chatbot-arena-elo [Dataset]. https://huggingface.co/datasets/mathewhe/chatbot-arena-elo
    Explore at:
    Dataset updated
    Mar 26, 2025
    Authors
    Mathew Huerta-Enochian
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LMSYS Chatbot Arena ELO Scores

    This dataset is a datasets-friendly version of Chatbot Arena ELO scores, updated daily from the leaderboard API at https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard. Updated: 20250717

      Loading Data
    

    from datasets import load_dataset

    dataset = load_dataset("mathewhe/chatbot-arena-elo", split="train")

    The main branch of this dataset will always be updated to the latest ELO and leaderboard version. If you need a fixed dataset… See the full description on the dataset page: https://huggingface.co/datasets/mathewhe/chatbot-arena-elo.

  18. h

    chatbot

    • huggingface.co
    Updated Apr 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Silvia (2025). chatbot [Dataset]. https://huggingface.co/datasets/silviaiaia/chatbot
    Explore at:
    Dataset updated
    Apr 3, 2025
    Authors
    Silvia
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    silviaiaia/chatbot dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. h

    ai-medical-chatbot

    • huggingface.co
    Updated Jun 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Benjamin Gross (2024). ai-medical-chatbot [Dataset]. https://huggingface.co/datasets/DrBenjamin/ai-medical-chatbot
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 14, 2024
    Authors
    Benjamin Gross
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    DrBenjamin/ai-medical-chatbot dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. h

    Bitext-events-ticketing-llm-chatbot-training-dataset

    • huggingface.co
    Updated Aug 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bitext (2024). Bitext-events-ticketing-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 6, 2024
    Dataset authored and provided by
    Bitext
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Bitext - Events and Ticketing Tagged Training Dataset for LLM-based Virtual Assistants

      Overview
    

    This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the [events and ticketing] sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-events-ticketing-llm-chatbot-training-dataset.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Large Model Systems Organization (2023). chatbot_arena_conversations [Dataset]. https://huggingface.co/datasets/lmsys/chatbot_arena_conversations

chatbot_arena_conversations

lmsys/chatbot_arena_conversations

Explore at:
24 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jul 18, 2023
Dataset authored and provided by
Large Model Systems Organization
License

https://choosealicense.com/licenses/cc/https://choosealicense.com/licenses/cc/

Description

Chatbot Arena Conversations Dataset

This dataset contains 33K cleaned conversations with pairwise human preferences. It is collected from 13K unique IP addresses on the Chatbot Arena from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp. To ensure the safe release… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/chatbot_arena_conversations.

Search
Clear search
Close search
Google apps
Main menu