17 datasets found
  1. openassistant-guanaco-llama2-format

    • huggingface.co
    Updated Sep 4, 2024
    Cite
    Giles Thomas (2024). openassistant-guanaco-llama2-format [Dataset]. https://huggingface.co/datasets/gpjt/openassistant-guanaco-llama2-format
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 4, 2024
    Authors
    Giles Thomas
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset is timdettmers/openassistant-guanaco converted to what I believe to be the Llama 2 prompt format (based on this Reddit post). It is otherwise unchanged.
    The format is like this: [INST] <

  2. openassistant-guanaco-reformatted

    • huggingface.co
    Updated Jul 15, 2024
    Cite
    Ai2 (2024). openassistant-guanaco-reformatted [Dataset]. https://huggingface.co/datasets/allenai/openassistant-guanaco-reformatted
    Dataset updated
    Jul 15, 2024
    Dataset authored and provided by
    Ai2
    Description

    Standardized format from: https://huggingface.co/datasets/timdettmers/openassistant-guanaco?row=0
    This dataset is a subset of the Open Assistant dataset, which you can find here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main
    This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples. This dataset was used to train Guanaco with QLoRA. For further information, please see the original dataset.
    License: Apache 2.0

  3. hellobot

    • huggingface.co
    Updated Apr 17, 2024
    + more versions
    Cite
    Siddharth Sharma (2024). hellobot [Dataset]. https://huggingface.co/datasets/sidd21sharma/hellobot
    Dataset updated
    Apr 17, 2024
    Authors
    Siddharth Sharma
    Description

    Guanaco: Lazy Llama 2 Formatting

    This is the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 model in a Google Colab.

  4. guanaco-ai-filtered

    • huggingface.co
    Updated Jul 15, 2023
    Cite
    Benjamin Anderson (2023). guanaco-ai-filtered [Dataset]. https://huggingface.co/datasets/andersonbcdefg/guanaco-ai-filtered
    Dataset updated
    Jul 15, 2023
    Authors
    Benjamin Anderson
    Description

    Dataset Card for "guanaco-ai-filtered"

    This dataset is a subset of TimDettmers/openassistant-guanaco useful for training generalist English-language chatbots. It has been filtered to a) remove conversations in languages other than English using a fasttext classifier, and b) remove conversations where Open Assistant is mentioned, as people training their own chatbots likely do not want their chatbot to think it is named OpenAssistant.

  5. guanaco-llama2-1k

    • huggingface.co
    • opendatalab.com
    Updated Mar 4, 2024
    + more versions
    Cite
    Maxime Labonne (2024). guanaco-llama2-1k [Dataset]. https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
    Dataset updated
    Mar 4, 2024
    Authors
    Maxime Labonne
    Description

    Guanaco-1k: Lazy Llama 2 Formatting

    This is a subset (1000 samples) of the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. It was created using the following colab notebook. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 (chat) model in a Google Colab.

  6. openassistant-guanaco-chinese

    • huggingface.co
    Updated Jul 28, 2023
    Cite
    Jiang Elliot (2023). openassistant-guanaco-chinese [Dataset]. https://huggingface.co/datasets/Elliot4AI/openassistant-guanaco-chinese
    Dataset updated
    Jul 28, 2023
    Authors
    Jiang Elliot
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Summary

    Fine-tune dataset: Chinese dataset. This dataset is the Chinese version of timdettmers/openassistant-guanaco. It was translated directly, without a human check of the grammar. For a description of timdettmers/openassistant-guanaco, see its dataset card. License: Apache 2.0

  7. guanaco-spanish-dataset

    • huggingface.co
    Updated Oct 21, 2023
    Cite
    Héctor López Hidalgo (2023). guanaco-spanish-dataset [Dataset]. https://huggingface.co/datasets/hlhdatscience/guanaco-spanish-dataset
    Dataset updated
    Oct 21, 2023
    Authors
    Héctor López Hidalgo
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for "guanaco-spanish-dataset"

    Cleaning and curation of the dataset have been performed; it is now fully in Spanish (date: 12/01/2024). This dataset is a subset of the original timdettmers/openassistant-guanaco, which is itself a subset of the Open Assistant dataset. You can find it here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main/ This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 2,369 samples, translated… See the full description on the dataset page: https://huggingface.co/datasets/hlhdatscience/guanaco-spanish-dataset.

  8. oasst1

    • huggingface.co
    • paperswithcode.com
    Updated Apr 12, 2023
    + more versions
    Cite
    OpenAssistant (2023). oasst1 [Dataset]. https://huggingface.co/datasets/OpenAssistant/oasst1
    Dataset updated
    Apr 12, 2023
    Dataset authored and provided by
    OpenAssistant
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    OpenAssistant Conversations Dataset (OASST1)

      Dataset Summary
    

    In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations (OASST1), a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort… See the full description on the dataset page: https://huggingface.co/datasets/OpenAssistant/oasst1.

  9. guanaco-llama2-3k

    • huggingface.co
    Updated Feb 8, 2024
    Cite
    Mohammad Othman (2024). guanaco-llama2-3k [Dataset]. https://huggingface.co/datasets/MohammadOthman/guanaco-llama2-3k
    Dataset updated
    Feb 8, 2024
    Authors
    Mohammad Othman
    Description

    This is a derived collection of 3000 samples from the recognized timdettmers/openassistant-guanaco dataset, tailored to align with the prompt structure required by Llama 2.

  10. guanaco-llama2-2k

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Lionel Cheng (2023). guanaco-llama2-2k [Dataset]. https://huggingface.co/datasets/lionelchg/guanaco-llama2-2k
    Dataset updated
    Aug 8, 2023
    Authors
    Lionel Cheng
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card

    This is a 2,000-example extract of https://huggingface.co/datasets/timdettmers/openassistant-guanaco

  11. guanaco-pt

    • huggingface.co
    Updated Jun 5, 2023
    Cite
    Alan (2023). guanaco-pt [Dataset]. http://doi.org/10.57967/hf/0734
    Dataset updated
    Jun 5, 2023
    Authors
    Alan
    Description

    Open Assistant Guanaco translated into Portuguese

    Prompts originally in Portuguese were not processed. Translated with the ChatGPT 3.5-turbo API. Prompts identified as low quality were removed from the dataset.

      About the Original Guanaco

    Guanaco (https://huggingface.co/datasets/timdettmers/openassistant-guanaco) is a dataset that is part of the Open Assistant dataset, which can be found here:… See the full description on the dataset page: https://huggingface.co/datasets/ocordeiro/guanaco-pt.

  12. LLM_Test_dataset

    • huggingface.co
    Cite
    tc lin, LLM_Test_dataset [Dataset]. https://huggingface.co/datasets/stuser2023/LLM_Test_dataset
    Authors
    tc lin
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Reference: timdettmers/openassistant-guanaco on Hugging Face, https://huggingface.co/datasets/timdettmers/openassistant-guanaco

  13. guanaco-llama2

    • huggingface.co
    Updated Apr 20, 2024
    Cite
    Padma Madhukar Dhakappa (2024). guanaco-llama2 [Dataset]. https://huggingface.co/datasets/PadmaDhakappa/guanaco-llama2
    Dataset updated
    Apr 20, 2024
    Authors
    Padma Madhukar Dhakappa
    Description

    This is the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 model in a Google Colab.

  14. guanaco-llama2-1k

    • huggingface.co
    Updated Feb 1, 2024
    + more versions
    Cite
    Wenqi Glantz (2024). guanaco-llama2-1k [Dataset]. https://huggingface.co/datasets/wenqiglantz/guanaco-llama2-1k
    Dataset updated
    Feb 1, 2024
    Authors
    Wenqi Glantz
    Description

    This is a subset (1000 samples) of the timdettmers/openassistant-guanaco dataset, processed to match Mistral-7B-instruct-v0.2's prompt format as described in this article. It was created using the colab notebook. Inspired by Maxime Labonne's llm-course repo.

  15. Arabic_guanaco_oasst1

    • huggingface.co
    Updated Jun 13, 2023
    Cite
    Ali El Filali (2023). Arabic_guanaco_oasst1 [Dataset]. https://huggingface.co/datasets/alielfilali01/Arabic_guanaco_oasst1
    Dataset updated
    Jun 13, 2023
    Authors
    Ali El Filali
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for "Arabic_guanaco_oasst1"

    This dataset is the openassistant-guanaco dataset, a subset of the Open Assistant dataset, translated to Arabic. You can find the original dataset here: https://huggingface.co/datasets/timdettmers/openassistant-guanaco Or the main dataset here: https://huggingface.co/datasets/OpenAssistant/oasst1/tree/main This subset of the data only contains the highest-rated paths in the conversation tree, with a total of 9,846 samples. For further… See the full description on the dataset page: https://huggingface.co/datasets/alielfilali01/Arabic_guanaco_oasst1.

  16. guanaco-llama2-10k

    • huggingface.co
    Updated Mar 12, 2024
    Cite
    chenshake (2024). guanaco-llama2-10k [Dataset]. https://huggingface.co/datasets/chenshake/guanaco-llama2-10k
    Dataset updated
    Mar 12, 2024
    Authors
    chenshake
    Description

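    I used a notebook to practice converting the dataset. If you are training llama-2, no data conversion is needed; you can use timdettmers/openassistant-guanaco directly. For the llama-2-chat version, the data format needs to be converted. For the conversion process, see the notebook. Llama2-chat-dataset conversion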

  17. iruca_llama2_japanese_demo

    • huggingface.co
    Cite
    XinqiYang, iruca_llama2_japanese_demo [Dataset]. https://huggingface.co/datasets/xinqiyang/iruca_llama2_japanese_demo
    Authors
    XinqiYang
    Description

    iruca-1k: Lazy Llama 2 Formatting

    This is a subset (1000 samples) of the excellent timdettmers/openassistant-guanaco dataset, processed to match Llama 2's prompt format as described in this article. It was created using the following colab notebook. Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for this article about fine-tuning a Llama 2 (chat) model in a Google Colab.

      Format from xlsx file to CSV
    

    pip install openpyxl pandas… See the full description on the dataset page: https://huggingface.co/datasets/xinqiyang/iruca_llama2_japanese_demo.
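
Several of the entries above describe reformatting timdettmers/openassistant-guanaco to a Llama 2 prompt format "using a script". The following is a minimal sketch of what such a script might look like, not the method any of the listed authors actually used. It assumes each source row is a single string with turns marked `### Human: ` and `### Assistant: `, and it assumes a `<s>[INST] ... [/INST] ... </s>` target template with no system prompt; the listed datasets may use a different template. The function names `split_turns` and `to_llama2` are hypothetical.

```python
import re

# Assumption: one conversation per string, with turns marked
# "### Human: " and "### Assistant: ".
TURN_MARKER = re.compile(r"### (Human|Assistant): ")


def split_turns(text):
    """Split a Guanaco-style string into (role, message) pairs."""
    parts = TURN_MARKER.split(text)
    # re.split with one capture group yields [prefix, role, msg, role, msg, ...].
    return [(role, msg.strip()) for role, msg in zip(parts[1::2], parts[2::2])]


def to_llama2(text):
    """Wrap each Human/Assistant exchange in an assumed Llama 2 chat template."""
    turns = split_turns(text)
    chunks = []
    i = 0
    while i < len(turns):
        role, msg = turns[i]
        if role == "Human":
            has_reply = i + 1 < len(turns) and turns[i + 1][0] == "Assistant"
            if has_reply:
                chunks.append(f"<s>[INST] {msg} [/INST] {turns[i + 1][1]} </s>")
                i += 1  # the reply has been consumed
            else:
                # Trailing Human turn with no reply: leave the prompt open.
                chunks.append(f"<s>[INST] {msg} [/INST]")
        # Assistant turns without a preceding Human turn are skipped.
        i += 1
    return "".join(chunks)


if __name__ == "__main__":
    print(to_llama2("### Human: Hi### Assistant: Hello"))
```

Pairing turns explicitly, rather than doing a plain string replacement of the markers, keeps multi-turn conversations well-formed and makes the handling of an unanswered final prompt visible.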
