44 datasets found
  1. alpaca

    • huggingface.co
    • opendatalab.com
    Updated Mar 14, 2023
    Cite
    Tatsu Lab (2023). alpaca [Dataset]. https://huggingface.co/datasets/tatsu-lab/alpaca
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 14, 2023
    Dataset authored and provided by
    Tatsu Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca

      Dataset Summary
    

    Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to instruction-tune language models so that they follow instructions better. The authors built on the data generation pipeline from the Self-Instruct framework and made the following modifications:

    The text-davinci-003 engine to generate the instruction data instead… See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.

  2. VIS

    • huggingface.co
    Updated Aug 25, 2025
    Cite
    shiweiwei (2025). VIS [Dataset]. https://huggingface.co/datasets/weiwei888/VIS
    Explore at:
    Croissant
    Dataset updated
    Aug 25, 2025
    Authors
    shiweiwei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an answer.

    "instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/weiwei888/VIS.

  3. stanford-alpaca-cleaned-turkish-translated

    • huggingface.co
    Updated Sep 4, 2023
    Cite
    Davut Emre TASAR (2023). stanford-alpaca-cleaned-turkish-translated [Dataset]. https://huggingface.co/datasets/emre/stanford-alpaca-cleaned-turkish-translated
    Dataset updated
    Sep 4, 2023
    Authors
    Davut Emre TASAR
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    09/04/2023 Update: New instructions added from: https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM
    Original Version: https://github.com/tatsu-lab/stanford_alpaca#data-release
    AI-based translation results of Stanford Alpaca, English to Turkish. For academic use only; please cite before you use it.
    Taşar, D. E. T. (2023). stanford-alpaca-cleaned-turkish-translated [Dataset]. In Stanford Alpaca TR (1.0.1.a). https://huggingface.co/datasets/emre/stanford-alpaca-cleaned-turkish-translated… See the full description on the dataset page: https://huggingface.co/datasets/emre/stanford-alpaca-cleaned-turkish-translated.

  4. Alpaca-COT

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jan 1, 2023
    Cite
    Stanford University (2023). Alpaca-COT [Dataset]. https://opendatalab.com/OpenDataLab/Alpaca-COT
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    Stanford University
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LLaMA is a great work that demonstrates strong zero-shot and few-shot abilities. It significantly reduces the cost of training, finetuning, and using competitive large language models; for example, LLaMA-13B outperforms GPT-3 (175B), and LLaMA-65B is competitive with PaLM-540B. Recently, to boost the instruction-following ability of LLaMA, Stanford Alpaca finetuned LLaMA-7B on 52K instruction-following examples generated with the Self-Instruct technique. However, at present, the LLM research community still faces three challenges: 1. Even LLaMA-7B still has high requirements for computing resources; 2. There are few open-source datasets for instruction finetuning; and 3. There is a lack of empirical study on the impact of various types of instructions on model abilities, such as the ability to respond to Chinese instructions and CoT reasoning.

  5. thai_alpaca

    • huggingface.co
    Updated Jun 20, 2024
    Cite
    SEACrowd (2024). thai_alpaca [Dataset]. https://huggingface.co/datasets/SEACrowd/thai_alpaca
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    SEACrowd
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is a Thai 🇹🇭 instruction dataset, translated with Google Cloud Translation from the cleaned version of the original Alpaca Dataset released by Stanford. It contains 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to instruction-tune language models so that they follow instructions better.

  6. roman-urdu-alpaca-qa-mix

    • huggingface.co
    Updated Oct 23, 2023
    Cite
    Muhammad Salaar (2023). roman-urdu-alpaca-qa-mix [Dataset]. https://huggingface.co/datasets/Redgerd/roman-urdu-alpaca-qa-mix
    Dataset updated
    Oct 23, 2023
    Authors
    Muhammad Salaar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Roman Urdu + Alpaca QA Mix

    This dataset is intended to support fine-tuning and evaluation of language models that understand and respond to Roman Urdu and English instructions. It consists of 1,022 records in total:

    • 500 examples in Roman Urdu generated from high-quality Urdu sources and transliterated using the ChatGPT API.
    • 500 examples in English randomly sampled from the Stanford Alpaca dataset.

    The dataset follows the same format as Alpaca-style instruction… See the full description on the dataset page: https://huggingface.co/datasets/Redgerd/roman-urdu-alpaca-qa-mix.

  7. vi-alpaca-input-output-format

    • huggingface.co
    Updated Apr 28, 2025
    Cite
    BKAI-HUST Foundation Models Lab (2025). vi-alpaca-input-output-format [Dataset]. https://huggingface.co/datasets/bkai-foundation-models/vi-alpaca-input-output-format
    Explore at:
    Croissant
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    BKAI-HUST Foundation Models Lab
    Description

    🇻🇳 Vietnamese modified Alpaca Dataset

    This dataset is designed specifically for Vietnamese, based on ideas from Stanford Alpaca, the Self-Instruct paper, and Chinese LLaMA. The motivation behind its creation is the hope of contributing a high-quality dataset to the Vietnamese community for training language models. To construct this dataset, we follow a two-step process:

    Step 1: Manually create Vietnamese seed tasks. We employ the methodology outlined in the Self-Instruct… See the full description on the dataset page: https://huggingface.co/datasets/bkai-foundation-models/vi-alpaca-input-output-format.

  8. InstructWild

    • opendatalab.com
    zip
    Updated Aug 6, 2024
    Cite
    Stanford University (2024). InstructWild [Dataset]. https://opendatalab.com/OpenDataLab/InstructWild
    Explore at:
    Available download formats: zip (15302878 bytes)
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    Stanford University
    License

    https://github.com/XueFuzhao/InstructionWild/blob/main/LICENSE

    Description

    Instruction tuning is a key component of ChatGPT. OpenAI uses its own user-based instruction dataset, but unfortunately, this dataset is not open source. Self-Instruct released a small instruction dataset consisting of 175 human-written instructions. The Stanford Alpaca team used the text-davinci-003 model to generate 52K instructions from those 175 seed instructions.

    The project's goal is a larger and more diverse instruction dataset. To this end, we collected 429 descriptions from ChatGPT usage screenshots and released Chinese and English versions. We found that these instructions are very diverse, even if the scale is still small. We follow Alpaca to generate 52K commands and their responses. All data can be found in the directory data.

    NOTE: This is an ongoing project. We are still collecting and improving our data. We release this dataset early to accelerate our LLM research. We will also publish a white paper soon.

  9. alpaca_hu_2k

    • huggingface.co
    Updated Jul 12, 2024
    Cite
    Hungarian Research Centre for Linguistics (2024). alpaca_hu_2k [Dataset]. https://huggingface.co/datasets/NYTK/alpaca_hu_2k
    Explore at:
    Croissant
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    Hungarian Research Centre for Linguistics
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Hu-2k

    This is the dataset card for the Hungarian translation of a subset of the Stanford Alpaca prompts.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    The dataset is the first Hungarian language instruction-following corpus created for fine-tuning large language models, specifically developed by translating and localizing a portion of the Stanford Alpaca corpus. It contains 2000 translated and 100 localized prompts, designed to train… See the full description on the dataset page: https://huggingface.co/datasets/NYTK/alpaca_hu_2k.

  10. mix-instruct

    • huggingface.co
    • opendatalab.com
    Updated Nov 13, 2024
    Cite
    LLM Blender (2024). mix-instruct [Dataset]. https://huggingface.co/datasets/llm-blender/mix-instruct
    Explore at:
    Croissant
    Dataset updated
    Nov 13, 2024
    Dataset authored and provided by
    LLM Blender
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MixInstruct

      Introduction
    

    This is the official release of the MixInstruct dataset for the LLM-Blender project. The dataset contains 11 responses from currently popular instruction-following LLMs, including:

    Stanford Alpaca, FastChat Vicuna, Dolly V2, StableLM, Open Assistant, Koala, Baize, Flan-T5, ChatGLM, MOSS, Mosaic MPT

    We evaluate each response with automatic metrics including BLEU, ROUGE, BERTScore, and BARTScore, and provide pairwise comparison results by prompting ChatGPT for the… See the full description on the dataset page: https://huggingface.co/datasets/llm-blender/mix-instruct.

  11. Alpaca-pubmed-summarization

    • huggingface.co
    Updated Oct 16, 2023
    Cite
    Zhongsheng Wang (2023). Alpaca-pubmed-summarization [Dataset]. https://huggingface.co/datasets/ZhongshengWang/Alpaca-pubmed-summarization
    Explore at:
    Croissant
    Dataset updated
    Oct 16, 2023
    Authors
    Zhongsheng Wang
    License

    https://choosealicense.com/licenses/openrail/

    Description

    This data set is a lightweight fine-tuned data format version of the Llama2 large language model for Stanford Alpaca.
    Cite original code:
    @inproceedings{cohan-etal-2018-discourse, title = "A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents", author = "Cohan, Arman and Dernoncourt, Franck and Kim, Doo Soon and Bui, Trung and Kim, Seokhwan and Chang, Walter and Goharian, Nazli", booktitle = "Proceedings… See the full description on the dataset page: https://huggingface.co/datasets/ZhongshengWang/Alpaca-pubmed-summarization.

  12. ChatAlpaca-20K

    • huggingface.co
    Updated Jan 6, 2024
    Cite
    Robin Smits (2024). ChatAlpaca-20K [Dataset]. https://huggingface.co/datasets/robinsmits/ChatAlpaca-20K
    Explore at:
    Croissant
    Dataset updated
    Jan 6, 2024
    Authors
    Robin Smits
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for ChatAlpaca 20K

      ChatAlpaca: A Multi-Turn Dialogue Corpus based on Alpaca Instructions
    
    
    
    
    
      Dataset Description
    

    ChatAlpaca is a chat dataset that aims to help researchers develop models for instruction-following in multi-turn conversations. The dataset is an extension of the Stanford Alpaca data, which contains multi-turn instructions and their corresponding responses. ChatAlpaca is developed by Chinese Information Processing Laboratory at the… See the full description on the dataset page: https://huggingface.co/datasets/robinsmits/ChatAlpaca-20K.

  13. ru_turbo_alpaca

    • huggingface.co
    • opendatalab.com
    Updated Oct 28, 2024
    Cite
    Ilya Gusev (2024). ru_turbo_alpaca [Dataset]. https://huggingface.co/datasets/IlyaGusev/ru_turbo_alpaca
    Dataset updated
    Oct 28, 2024
    Authors
    Ilya Gusev
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RuTurboAlpaca

    Dataset of ChatGPT-generated instructions in Russian.

    Code: rulm/self_instruct
    Code is based on Stanford Alpaca and self-instruct.
    29822 examples

    Preliminary evaluation by an expert based on 400 samples:

    • 83% of samples contain correct instructions
    • 63% of samples have correct instructions and outputs

    Crowdsourcing-based evaluation on 3500 samples:

    • 90% of samples contain correct instructions
    • 68% of samples have correct instructions and outputs

    Prompt template:… See the full description on the dataset page: https://huggingface.co/datasets/IlyaGusev/ru_turbo_alpaca.

  14. mavis

    • huggingface.co
    Cite
    mike, mavis [Dataset]. https://huggingface.co/datasets/mikemoe/mavis
    Authors
    mike
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an answer.

    "instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/mikemoe/mavis.

  15. alpaca-id-cleaned

    • huggingface.co
    Updated Apr 9, 2023
    Cite
    Cahya Wirawan (2023). alpaca-id-cleaned [Dataset]. https://huggingface.co/datasets/cahya/alpaca-id-cleaned
    Explore at:
    Croissant
    Dataset updated
    Apr 9, 2023
    Authors
    Cahya Wirawan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Indonesian Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is the Indonesian translated version of the cleaned original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an… See the full description on the dataset page: https://huggingface.co/datasets/cahya/alpaca-id-cleaned.

  16. alpaca-data-pt-br

    • huggingface.co
    Updated Apr 3, 2023
    Cite
    Maicon Domingues (2023). alpaca-data-pt-br [Dataset]. https://huggingface.co/datasets/dominguesm/alpaca-data-pt-br
    Explore at:
    Croissant
    Dataset updated
    Apr 3, 2023
    Authors
    Maicon Domingues
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    NOTE: This is a machine translated version of the yahma/alpaca-cleaned dataset.

      Dataset Card for Alpaca-Cleaned
    

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet… See the full description on the dataset page: https://huggingface.co/datasets/dominguesm/alpaca-data-pt-br.

  17. alpaca-spanish

    • huggingface.co
    Updated Apr 8, 2023
    Cite
    BERTIN Project (2023). alpaca-spanish [Dataset]. https://huggingface.co/datasets/bertin-project/alpaca-spanish
    Explore at:
    Croissant
    Dataset updated
    Apr 8, 2023
    Dataset authored and provided by
    BERTIN Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BERTIN Alpaca Spanish

    This dataset is a translation to Spanish of alpaca_data_cleaned.json, a clean version of the Alpaca dataset made at Stanford. An earlier version used Facebook's NLLB 1.3B model, but the current version uses OpenAI's gpt-3.5-turbo, hence this dataset cannot be used to create models that compete in any way against OpenAI.

  18. prodigy-cleaned

    • huggingface.co
    Cite
    Aadhithya, prodigy-cleaned [Dataset]. https://huggingface.co/datasets/Apex-X/prodigy-cleaned
    Authors
    Aadhithya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an answer.

    "instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/Apex-X/prodigy-cleaned.

  19. alpaca-cleaned-uz

    • huggingface.co
    Cite
    Behbudiy Labs, alpaca-cleaned-uz [Dataset]. https://huggingface.co/datasets/behbudiy/alpaca-cleaned-uz
    Explore at:
    Croissant
    Dataset authored and provided by
    Behbudiy Labs
    Description

    Dataset Summary

    This dataset is a translation of the yahma/alpaca-cleaned dataset into Uzbek, leveraging the Google Translate API. The original dataset is a cleaned version of the Stanford Alpaca dataset, which contains instruction-following data for fine-tuning large language models. The cleaned version improves upon the original Alpaca dataset by removing low-quality data and inconsistencies in formatting, which helps enhance the quality and robustness of models trained on it.

  20. kyrgyz-alpaca

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    The Cramer Project (2024). kyrgyz-alpaca [Dataset]. https://huggingface.co/datasets/the-cramer-project/kyrgyz-alpaca
    Explore at:
    Croissant
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    The Cramer Project
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Kyrgyz Alpaca

    This repo is made for research use only, i.e., cannot be used for commercial purposes or entertainment.

      References
    

    All of our achievements were made possible by the robust AI community in Kyrgyzstan and the contributions of individuals within the AkylAI project (by TheCramer.com). We also express our gratitude to Stanford for their outstanding efforts, and we extend the accessibility of this dataset to a global audience.

      Dataset
    

    Kyrgyz… See the full description on the dataset page: https://huggingface.co/datasets/the-cramer-project/kyrgyz-alpaca.

Cite
Tatsu Lab (2023). alpaca [Dataset]. https://huggingface.co/datasets/tatsu-lab/alpaca

alpaca

Alpaca

tatsu-lab/alpaca

Explore at:
74 scholarly articles cite this dataset (View in Google Scholar)