3 datasets found

h
Bitext-customer-support-llm-chatbot-training-dataset
huggingface.co
opendatalab.com
Updated Jul 16, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 16, 2024
Dataset authored and provided by
Bitext
License
https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.
h
llm-dataset
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Training Data, llm-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/llm-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
LLM Dataset - Prompts and Generated Texts

The dataset contains prompts and texts generated by the Large Language Models (LLMs) in 32 different languages. The prompts are short sentences or phrases for the model to generate text. The texts generated by the LLM are responses to these prompts and can vary in length and complexity. Researchers and developers can use this dataset to train and fine-tune their own language models for multilingual applications. The dataset provides a rich… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/llm-dataset.
h
llama-instruct
huggingface.co
Updated Aug 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Together (2023). llama-instruct [Dataset]. https://huggingface.co/datasets/togethercomputer/llama-instruct
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 18, 2023
Dataset authored and provided by
Together
License
https://choosealicense.com/licenses/llama2/https://choosealicense.com/licenses/llama2/
Description
llama-instruct

This dataset was used to finetune Llama-2-7B-32K-Instruct. We follow the distillation paradigm that is used by Alpaca, Vicuna, WizardLM, Orca — producing instructions by querying a powerful LLM, which in our case, is the Llama-2-70B-Chat model released by Meta. To build Llama-2-7B-32K-Instruct, we collect instructions from 19K human inputs extracted from ShareGPT-90K (only using human inputs, not ChatGPT outputs). The actual script handles multi-turn conversations… See the full description on the dataset page: https://huggingface.co/datasets/togethercomputer/llama-instruct.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Bitext (2024). Bitext-customer-support-llm-chatbot-training-dataset [Dataset]. https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset

Bitext-customer-support-llm-chatbot-training-dataset

bitext/Bitext-customer-support-llm-chatbot-training-dataset

Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Explore at:

6 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jul 16, 2024

Dataset authored and provided by

Bitext

License

https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

Description

Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

  Overview

This hybrid synthetic dataset is designed to be used to fine-tune Large Language Models such as GPT, Mistral and OpenELM, and has been generated using our NLP/NLG technology and our automated Data Labeling (DAL) tools. The goal is to demonstrate how Verticalization/Domain Adaptation for the Customer Support sector can be easily achieved using our two-step approach to LLM… See the full description on the dataset page: https://huggingface.co/datasets/bitext/Bitext-customer-support-llm-chatbot-training-dataset.

Clear search

Close search

Google apps

Main menu

Bitext-customer-support-llm-chatbot-training-dataset

llm-dataset

llama-instruct

Bitext-customer-support-llm-chatbot-training-datasetSee More Versions

bitext/Bitext-customer-support-llm-chatbot-training-dataset

Bitext - Customer Service Tagged Training Dataset for LLM-based Virtual Assistants

Bitext-customer-support-llm-chatbot-training-dataset