Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for Alpaca
Dataset Summary
Alpaca is a dataset of 52,000 instructions and demonstrations generated by OpenAI's text-davinci-003 engine. This instruction data can be used to conduct instruction tuning of language models, making them follow instructions better. The authors built on the data generation pipeline from the Self-Instruct framework and made the following modifications:
The text-davinci-003 engine to generate the instruction data instead… See the full description on the dataset page: https://huggingface.co/datasets/tatsu-lab/alpaca.
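Each record in the Alpaca release follows a simple three-field schema. A minimal sketch (the field values below are illustrative, not taken from the actual dataset):

```python
# A single Alpaca-style record: `instruction` states the task,
# `input` optionally supplies context, and `output` is the
# text-davinci-003 demonstration used as the training target.
record = {
    "instruction": "Classify the sentiment of the sentence.",
    "input": "The alpaca wool scarf was wonderfully soft.",
    "output": "Positive",
}

# Records that need no extra context leave `input` as an empty string.
assert set(record) == {"instruction", "input", "output"}
```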
generative-technologies/synth-ehr-icd10-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
Na0s/sft-ready-Text-Generation-Augmented-Data-Alpaca-Format dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Alpaca-Cleaned
Repository: https://github.com/gururise/AlpacaDataCleaned
Dataset Description
This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:
Hallucinations: Many instructions in the original dataset referenced data on the internet, which simply caused GPT-3 to hallucinate an answer.
"instruction":"Summarize the… See the full description on the dataset page: https://huggingface.co/datasets/yahma/alpaca-cleaned.
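A cleaning pass for the hallucination issue can be sketched as a simple filter. The heuristic below (flagging examples whose instruction or input points at a URL the model cannot fetch) is an illustrative assumption, not the exact rule used by AlpacaDataCleaned:

```python
import re

# text-davinci-003 cannot browse, so instructions that ask it to read
# a web page reliably produce hallucinated answers.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

def is_hallucination_prone(example: dict) -> bool:
    """Flag examples whose instruction or input references a URL."""
    combined = example["instruction"] + " " + example["input"]
    return bool(URL_PATTERN.search(combined))

examples = [
    {"instruction": "Summarize the article at https://example.com/news",
     "input": "", "output": "..."},
    {"instruction": "Translate to French.",
     "input": "Hello", "output": "Bonjour"},
]

# Keep only the self-contained examples.
cleaned = [ex for ex in examples if not is_hallucination_prone(ex)]
```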
🇻🇳 Vietnamese modified Alpaca Dataset
This dataset is specifically designed for Vietnamese, based on ideas from Stanford Alpaca, the Self-Instruct paper, and Chinese LLaMA. The motivation behind its creation is the hope of contributing a high-quality dataset to the Vietnamese community for training language models. To construct this dataset, we follow a two-step process:
Step 1: Manually create Vietnamese seed tasks. We employ the methodology outlined in the Self-Instruct… See the full description on the dataset page: https://huggingface.co/datasets/bkai-foundation-models/vi-alpaca-input-output-format.
LangAGI-Lab/limo-trial7-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
remyxai/ffmperative-alpaca-format-50k dataset hosted on Hugging Face and contributed by the HF Datasets community
usham/mental-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Alpaca-style Question and Answer Dataset
This dataset contains question-answer pairs formatted in the Alpaca instruction style, suitable for instruction fine-tuning of language models.
Format
Each example contains:
instruction: The question
input: Empty string (can be used for context in other applications)
output: The answer
text: The formatted text using the Alpaca template
Template
Below is an instruction that describes a task, paired with an input that… See the full description on the dataset page: https://huggingface.co/datasets/sweatSmile/alpaca-qa-data.
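Rendering a record's `text` field from the template can be sketched as below. The two templates are the standard Stanford Alpaca prompts (with and without an input); whether this particular dataset uses the exact same wording is an assumption:

```python
# Standard Stanford Alpaca prompt templates. The `text` field of a
# record is the appropriate template filled in with its field values.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response "
    "that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)

def to_alpaca_text(example: dict) -> str:
    """Render one instruction/input/output record into the `text` field."""
    template = PROMPT_WITH_INPUT if example["input"] else PROMPT_NO_INPUT
    # str.format ignores unused keyword arguments, so passing the whole
    # record works for both templates.
    return template.format(**example)

text = to_alpaca_text(
    {"instruction": "Name a llama relative.", "input": "", "output": "The alpaca."}
)
```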
LangAGI-Lab/limo-small-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/gpl-3.0/
This dataset is an adaptation of the Stanford Alpaca dataset in order to turn a text generation model like GPT-J into an "instruct" model. The initial dataset was slightly reworked in order to match the GPT-J fine-tuning format with Mesh Transformer Jax on TPUs.
aditya3w3733/retail-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
LangAGI-Lab/qwen-7b-instruct-8k-rft-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cc0-1.0/
adambuttrick/100K-ner-indexes-multiple-organizations-locations-alpaca-format-json-response-all-cases dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for MMLU-Alpaca
This dataset contains instruction-input-output pairs converted to ShareGPT format, designed for instruction tuning and text generation tasks.
Dataset Description
The dataset consists of carefully curated instruction-input-output pairs, formatted for conversational AI training. Each entry contains:
An instruction that specifies the task
An optional input providing context
A detailed output that addresses the instruction
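The conversion from instruction-input-output pairs into ShareGPT-style conversations can be sketched as follows. The exact field names are an assumption based on the common ShareGPT convention (a `conversations` list of `from`/`value` turns), not necessarily what MMLU-Alpaca uses internally:

```python
def alpaca_to_sharegpt(example: dict) -> dict:
    """Fold an instruction/input/output triple into a two-turn conversation."""
    # The optional input is appended to the instruction as extra context
    # for the human turn; the output becomes the assistant turn.
    human = example["instruction"]
    if example.get("input"):
        human += "\n\n" + example["input"]
    return {
        "conversations": [
            {"from": "human", "value": human},
            {"from": "gpt", "value": example["output"]},
        ]
    }

conv = alpaca_to_sharegpt(
    {"instruction": "What is 2 + 2?", "input": "", "output": "4"}
)
```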
Usage
This… See the full description on the dataset page: https://huggingface.co/datasets/HappyAIUser/MMLU-Alpaca.
LangAGI-Lab/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
abhijitkumarjha88192/ts_repl_ai_alpaca dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
CreitinGameplays/magpie-reasoning-v1-10k-step-by-step-rationale-alpaca-format-changedtoken dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
abhijitkumarjha88192/py_tiny_codes_alpaca dataset hosted on Hugging Face and contributed by the HF Datasets community
LangAGI-Lab/magpie-reasoning-v1-20k-math-verifiable-verification-min-4000-3200-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community