Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is timdettmers/openassistant-guanaco converted to what I believe
to be the Llama 2 prompt format (based on this Reddit post).
It is otherwise unchanged.
The format is like this:
<s>[INST] <<SYS>>
{system prompt}
<</SYS>>

{user prompt} [/INST] {model response} </s>
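As a rough sketch of how such a conversion can be done (the "text" column name, the "### Human:/### Assistant:" turn markers, and the decision to omit the optional <<SYS>> block are assumptions based on the original guanaco layout, not something this card specifies):

import re
from datasets import load_dataset

# Sketch: map guanaco-style "### Human:/### Assistant:" turns onto Llama 2's
# [INST] ... [/INST] blocks. The system block is omitted for brevity.
def to_llama2(example):
    text = example["text"]
    # Split into alternating (role, content) pieces.
    turns = re.split(r"### (Human|Assistant): ", text)[1:]
    pairs = [(turns[i], turns[i + 1].strip()) for i in range(0, len(turns) - 1, 2)]
    out = ""
    for role, content in pairs:
        if role == "Human":
            out += f"<s>[INST] {content} [/INST] "
        else:
            out += f"{content} </s>"
    return {"text": out}

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.map(to_llama2)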
Standardized format from: https://huggingface.co/datasets/timdettmers/openassistant-guanaco?row=0
This dataset is a subset of the Open Assistant dataset, which you can find here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main
This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples. This dataset was used to train Guanaco with QLoRA. For further information, please see the original dataset.
License: Apache 2.0
Guanaco: Lazy Llama 2 Formatting
This is the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 model in a Google Colab.
Dataset Card for "guanaco-ai-filtered"
This dataset is a subset of TimDettmers/openassistant-guanaco useful for training generalist English-language chatbots. It has been filtered to a) remove conversations in languages other than English using a fasttext classifier, and b) remove conversations where Open Assistant is mentioned, as people training their own chatbots likely do not want their chatbot to think it is named OpenAssistant.
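A minimal sketch of that kind of filtering, assuming the pre-trained fasttext language-ID model lid.176.bin, a "text" column, and an arbitrary 0.5 confidence threshold (none of which are specified by the card):

import fasttext
from datasets import load_dataset

# Sketch: keep only English conversations and drop rows mentioning OpenAssistant.
lang_model = fasttext.load_model("lid.176.bin")  # pre-trained language-ID model

def keep(example):
    text = example["text"].replace("\n", " ")  # fasttext predict() rejects newlines
    labels, probs = lang_model.predict(text, k=1)
    is_english = labels[0] == "__label__en" and probs[0] > 0.5  # threshold is an assumption
    mentions_oa = "open assistant" in text.lower() or "openassistant" in text.lower()
    return is_english and not mentions_oa

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.filter(keep)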
Guanaco-1k: Lazy Llama 2 Formatting
This is a subset (1000 samples) of the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. It was created using the following colab notebook. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 (chat) model in a Google Colab.
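A sketch of how such a 1,000-sample subset can be drawn with the datasets library (the shuffle seed and the target repository name are placeholders; the card's own colab notebook may do this differently):

from datasets import load_dataset

# Sketch: draw a random 1,000-sample subset and push it to the Hub.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
subset = dataset.shuffle(seed=42).select(range(1000))
subset.push_to_hub("your-username/guanaco-llama2-1k")  # hypothetical repo name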
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Summary
Fine-tune Dataset: Chinese dataset. This dataset is the Chinese version of timdettmers/openassistant-guanaco, translated directly by machine without human-checked grammar. For a description of timdettmers/openassistant-guanaco, see its dataset card. License: Apache 2.0
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for "guanaco-spanish-dataset"
CLEANING AND CURATION OF THE DATASET HAS BEEN PERFORMED. IT IS NOW FULLY IN SPANISH (date: 12/01/2024). This dataset is a subset of the original timdettmers/openassistant-guanaco, which is itself a subset of the Open Assistant dataset, available here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main/ This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 2,369 samples, translated… See the full description on the dataset page: https://huggingface.co/datasets/hlhdatscience/guanaco-spanish-dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
OpenAssistant Conversations Dataset (OASST1)
Dataset Summary
In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort… See the full description on the dataset page: https://huggingface.co/datasets/OpenAssistant/oasst1.
This is a derived collection of 3,000 samples from the well-known timdettmers/openassistant-guanaco dataset, reformatted to match the prompt structure required by Llama 2.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card
This is a 2,000-example extract of https://huggingface.co/datasets/timdettmers/openassistant-guanaco
Open Assistant Guanaco translated to Portuguese
Prompts originally in Portuguese were not processed. Translated with the ChatGPT 3.5-turbo API. Identified low-quality prompts were removed from the dataset.
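A sketch of what that translation step could look like with the OpenAI Python SDK (the prompt wording, the system message, and the helper name are assumptions; the card only states that gpt-3.5-turbo was used):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical helper: translate one guanaco example to Portuguese.
def translate_to_portuguese(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Translate the conversation to Portuguese, keeping the ### Human:/### Assistant: markers."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content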
About the Original Guanaco
Guanaco (https://huggingface.co/datasets/timdettmers/openassistant-guanaco) is a dataset that forms part of the Open Assistant dataset, which can be found here:… See the full description on the dataset page: https://huggingface.co/datasets/ocordeiro/guanaco-pt.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Reference: the Hugging Face dataset timdettmers/openassistant-guanaco (https://huggingface.co/datasets/timdettmers/openassistant-guanaco)
This is the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 model in a Google Colab.
This is a subset (1000 samples) of timdettmers/openassistant-guanaco dataset, processed to match Mistral-7B-instruct-v0.2's prompt format as described in this article. It was created using the colab notebook. Inspired by Maxime Labonne's llm-course repo.
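As an illustration of that target format, one option is the tokenizer's built-in chat template in transformers (using apply_chat_template here is my assumption; the referenced article and notebook may format the text with their own script):

from transformers import AutoTokenizer

# Sketch: render one exchange with Mistral-7B-Instruct-v0.2's chat template.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
messages = [
    {"role": "user", "content": "What is a guanaco?"},
    {"role": "assistant", "content": "A South American camelid closely related to the llama."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
# roughly: <s>[INST] What is a guanaco? [/INST] A South American camelid ...</s>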
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for "Arabic_guanaco_oasst1"
This dataset is the openassistant-guanaco dataset, a subset of the Open Assistant dataset, translated to Arabic. You can find the original dataset here: https://huggingface.co/datasets/timdettmers/openassistant-guanaco Or the main dataset here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples. For further… See the full description on the dataset page: https://huggingface.co/datasets/alielfilali01/Arabic_guanaco_oasst1.
I used a notebook exercise to perform the dataset conversion. To train llama-2, no data conversion is needed; timdettmers/openassistant-guanaco can be used directly. For the llama-2-chat version, the data format does need to be converted; for the conversion process, refer to the notebook. Llama2-chat-dataset conversion
iruca-1k: Lazy Llama 2 Formatting
This is a subset (1000 samples) of the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. It was created using the following colab notebook. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 (chat) model in a Google Colab.
Format from xlsx file to CSV
pip install openpyxl pandas… See the full description on the dataset page: https://huggingface.co/datasets/xinqiyang/iruca_llama2_japanese_demo.
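A minimal sketch of that xlsx-to-CSV step with pandas (file and sheet names are placeholders; the dataset page's own script may differ):

import pandas as pd

# openpyxl is the engine pandas uses to read .xlsx files.
df = pd.read_excel("iruca_demo.xlsx", sheet_name=0, engine="openpyxl")
df.to_csv("iruca_demo.csv", index=False, encoding="utf-8")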