100+ datasets found
  1. h

    alpaca

    • huggingface.co
    • opendatalab.com
    Updated Mar 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tatsu Lab (2023). alpaca [Dataset]. https://huggingface.co/datasets/tatsu-lab/alpaca
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 14, 2023
    Dataset authored and provided by
    Tatsu Lab
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca

      Dataset Summary
    

    Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instruction better. The authors built on the data generation pipeline from Self-Instruct framework and made the following modifications:

    The text-davinci-003 engine to generate the instruction data insteadโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.

  2. h

    synth-ehr-icd10-alpaca-format

    • huggingface.co
    Updated Jun 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Generative Technologies, Inc (2024). synth-ehr-icd10-alpaca-format [Dataset]. https://huggingface.co/datasets/generative-technologies/synth-ehr-icd10-alpaca-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 26, 2024
    Dataset authored and provided by
    Generative Technologies, Inc
    Description

    generative-technologies/synth-ehr-icd10-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. h

    sft-ready-Text-Generation-Augmented-Data-Alpaca-Format

    • huggingface.co
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ali Janati (2024). sft-ready-Text-Generation-Augmented-Data-Alpaca-Format [Dataset]. https://huggingface.co/datasets/Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 11, 2024
    Authors
    Ali Janati
    Description

    Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. h

    alpaca-cleaned

    • huggingface.co
    Updated Apr 9, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gene Ruebsamen (2023). alpaca-cleaned [Dataset]. https://huggingface.co/datasets/yahma/alpaca-cleaned
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 9, 2023
    Authors
    Gene Ruebsamen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Alpaca-Cleaned

    Repository: https://github.com/gururise/AlpacaDataCleaned

      Dataset Description
    

    This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

    Hallucinations: Many instructions in the original dataset had instructions referencing data on the internet, which just caused GPT3 to hallucinate an answer.

    "instruction":"Summarize theโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/yahma/alpaca-cleaned.

  5. h

    vi-alpaca-input-output-format

    • huggingface.co
    Updated Apr 28, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    BKAI-HUST Foundation Models Lab (2025). vi-alpaca-input-output-format [Dataset]. https://huggingface.co/datasets/bkai-foundation-models/vi-alpaca-input-output-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    BKAI-HUST Foundation Models Lab
    Description

    ๐Ÿ‡ป๐Ÿ‡ณ Vietnamese modified Alpaca Dataset

    This dataset is especially designed for Vietnamese based on the idea from Stanford Alpaca, Self-Instruct paper and Chinese LLaMA. The motivation behind the creation of this dataset stems from the hope to contribute high-quality dataset to Vietnamese commnunity to train language models. To construct this dataset, we follow a two-step process:

    Step 1: Manually create Vietnamese seed tasks We employ the methodology outlined in the Self-Instructโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/bkai-foundation-models/vi-alpaca-input-output-format.

  6. h

    limo-trial7-alpaca-format

    • huggingface.co
    Updated Feb 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Language & AGI Lab (2025). limo-trial7-alpaca-format [Dataset]. https://huggingface.co/datasets/LangAGI-Lab/limo-trial7-alpaca-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 14, 2025
    Dataset authored and provided by
    Language & AGI Lab
    Description

    LangAGI-Lab/limo-trial7-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. h

    ffmperative-alpaca-format-50k

    • huggingface.co
    Updated Mar 1, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Remyx AI (2024). ffmperative-alpaca-format-50k [Dataset]. https://huggingface.co/datasets/remyxai/ffmperative-alpaca-format-50k
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    Remyx AI
    Description

    remyxai/ffmperative-alpaca-format-50k dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. h

    mental-alpaca-format

    • huggingface.co
    Updated Apr 18, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    boris (2019). mental-alpaca-format [Dataset]. https://huggingface.co/datasets/usham/mental-alpaca-format
    Explore at:
    Dataset updated
    Apr 18, 2019
    Authors
    boris
    Description

    usham/mental-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. h

    alpaca-qa-data

    • huggingface.co
    Updated May 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    amitk17 (2025). alpaca-qa-data [Dataset]. https://huggingface.co/datasets/sweatSmile/alpaca-qa-data
    Explore at:
    Dataset updated
    May 21, 2025
    Authors
    amitk17
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Alpaca-style Question and Answer Dataset

    This dataset contains question-answer pairs formatted in the Alpaca instruction style, suitable for instruction fine-tuning of language models.

      Format
    

    Each example contains:

    instruction: The question input: Empty string (can be used for context in other applications) output: The answer text: The formatted text using the Alpaca template

      Template
    

    Below is an instruction that describes a task, paired with an input thatโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/sweatSmile/alpaca-qa-data.

  10. h

    limo-small-alpaca-format

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Language & AGI Lab, limo-small-alpaca-format [Dataset]. https://huggingface.co/datasets/LangAGI-Lab/limo-small-alpaca-format
    Explore at:
    Dataset authored and provided by
    Language & AGI Lab
    Description

    LangAGI-Lab/limo-small-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. h

    instructions-dataset-adapted-from-stanford-alpaca-for-gpt-j

    • huggingface.co
    Updated Mar 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    NLP Cloud (2023). instructions-dataset-adapted-from-stanford-alpaca-for-gpt-j [Dataset]. https://huggingface.co/datasets/nlpcloud/instructions-dataset-adapted-from-stanford-alpaca-for-gpt-j
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 16, 2023
    Dataset authored and provided by
    NLP Cloud
    License

    https://choosealicense.com/licenses/gpl-3.0/https://choosealicense.com/licenses/gpl-3.0/

    Description

    This dataset is an adaptation of the Stanford Alpaca dataset in order to turn a text generation model like GPT-J into an "instruct" model. The initial dataset was slightly reworked in order to match the GPT-J fine-tuning format with Mesh Transformer Jax on TPUs.

  12. h

    retail-alpaca-format

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    S Aditya, retail-alpaca-format [Dataset]. https://huggingface.co/datasets/aditya3w3733/retail-alpaca-format
    Explore at:
    Authors
    S Aditya
    Description

    aditya3w3733/retail-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. h

    qwen-7b-instruct-8k-rft-alpaca-format

    • huggingface.co
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Language & AGI Lab, qwen-7b-instruct-8k-rft-alpaca-format [Dataset]. https://huggingface.co/datasets/LangAGI-Lab/qwen-7b-instruct-8k-rft-alpaca-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Language & AGI Lab
    Description

    LangAGI-Lab/qwen-7b-instruct-8k-rft-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. h

    100K-ner-indexes-multiple-organizations-locations-alpaca-format-json-response-all-cases...

    • huggingface.co
    Updated Feb 9, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Adam (2024). 100K-ner-indexes-multiple-organizations-locations-alpaca-format-json-response-all-cases [Dataset]. https://huggingface.co/datasets/adambuttrick/100K-ner-indexes-multiple-organizations-locations-alpaca-format-json-response-all-cases
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 9, 2024
    Authors
    Adam
    License

    https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/

    Description

    adambuttrick/100K-ner-indexes-multiple-organizations-locations-alpaca-format-json-response-all-cases dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. h

    MMLU-Alpaca

    • huggingface.co
    Updated Dec 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    RS (2024). MMLU-Alpaca [Dataset]. https://huggingface.co/datasets/HappyAIUser/MMLU-Alpaca
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 15, 2024
    Authors
    RS
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for MMLU-Alpaca

    This dataset contains instruction-input-output pairs converted to ShareGPT format, designed for instruction tuning and text generation tasks.

      Dataset Description
    

    The dataset consists of carefully curated instruction-input-output pairs, formatted for conversational AI training. Each entry contains:

    An instruction that specifies the task An optional input providing context A detailed output that addresses the instruction

      Usage
    

    Thisโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/HappyAIUser/MMLU-Alpaca.

  16. h

    magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format

    • huggingface.co
    Updated Jan 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Language & AGI Lab (2025). magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format [Dataset]. https://huggingface.co/datasets/LangAGI-Lab/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 31, 2025
    Dataset authored and provided by
    Language & AGI Lab
    Description

    LangAGI-Lab/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. h

    ts_repl_ai_alpaca

    • huggingface.co
    Updated Aug 1, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abhijit (2024). ts_repl_ai_alpaca [Dataset]. https://huggingface.co/datasets/abhijitkumarjha88192/ts_repl_ai_alpaca
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 1, 2024
    Authors
    Abhijit
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    abhijitkumarjha88192/ts_repl_ai_alpaca dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. h

    magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format-changedtoken

    • huggingface.co
    Updated Feb 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Creitin Gameplays (2025). magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format-changedtoken [Dataset]. https://huggingface.co/datasets/CreitinGameplays/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format-changedtoken
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 9, 2025
    Authors
    Creitin Gameplays
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CreitinGameplays/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format-changedtoken dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. h

    py_tiny_codes_alpaca

    • huggingface.co
    Updated Aug 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abhijit (2024). py_tiny_codes_alpaca [Dataset]. https://huggingface.co/datasets/abhijitkumarjha88192/py_tiny_codes_alpaca
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 3, 2024
    Authors
    Abhijit
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    abhijitkumarjha88192/py_tiny_codes_alpaca dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. h

    magpie-reasoning-v1-20k-math-verifiable-verification-min-4000-3200-alpaca-format...

    • huggingface.co
    Updated Feb 16, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Language & AGI Lab (2025). magpie-reasoning-v1-20k-math-verifiable-verification-min-4000-3200-alpaca-format [Dataset]. https://huggingface.co/datasets/LangAGI-Lab/magpie-reasoning-v1-20k-math-verifiable-verification-min-4000-3200-alpaca-format
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2025
    Dataset authored and provided by
    Language & AGI Lab
    Description

    LangAGI-Lab/magpie-reasoning-v1-20k-math-verifiable-verification-min-4000-3200-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Tatsu Lab (2023). alpaca [Dataset]. https://huggingface.co/datasets/tatsu-lab/alpaca

alpaca

Alpaca

tatsu-lab/alpaca

Explore at:
60 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 14, 2023
Dataset authored and provided by
Tatsu Lab
License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

Dataset Card for Alpaca

  Dataset Summary

Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction-tuning for language models and make the language model follow instruction better. The authors built on the data generation pipeline from Self-Instruct framework and made the following modifications:

The text-davinci-003 engine to generate the instruction data insteadโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.

Search
Clear search
Close search
Google apps
Main menu