45 datasets found
  1. CodeAlpaca-20k

    • huggingface.co
    Cite
    Sahil Chaudhary, CodeAlpaca-20k [Dataset]. https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Sahil Chaudhary
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    sahil2801/CodeAlpaca-20k dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. CodeAlpaca_20K

    • huggingface.co
    • opendatalab.com
    Updated Mar 29, 2023
    Cite
    Hugging Face H4 (2023). CodeAlpaca_20K [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Hugging Face H4
    License

    https://choosealicense.com/licenses/cc/

    Description

    This dataset splits the original CodeAlpaca dataset into train and test splits.
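
    The description does not say how the split was made, so the following is only a minimal sketch of a deterministic train/test split. The function name, the 10% test fraction, and the seed are assumptions chosen for illustration, not the procedure used for this dataset.

```python
import random


def train_test_split(records, test_fraction=0.1, seed=0):
    """Shuffle a copy of `records` deterministically and split it in two.

    The fraction and seed are illustrative assumptions; the listing does
    not state which values were used for the published splits.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # private RNG, so results are reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in rows; real rows would be the dataset's records.
rows = list(range(20))
train, test = train_test_split(rows)
print(len(train), len(test))
```

    Fixing the seed keeps the split reproducible, so a model is never evaluated on rows it was trained on in an earlier run.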

  3. evol-codealpaca-v1

    • huggingface.co
    Updated Sep 12, 2023
    Cite
    theblackcat102 (2023). evol-codealpaca-v1 [Dataset]. https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1
    Authors
    theblackcat102
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Evolved codealpaca

    Updates:

    2023/08/26 - Filtered results now contain only pure-English instructions, and any responses that mention being trained by OAI have been removed.

    Median sequence length: 471

    We employed a methodology similar to that of WizardCoder, with the exception that ours is open-source. We used the gpt-4-0314 and gpt-4-0613 models to augment and answer each response, with the bulk of generation handled by gpt-4-0314. The aim of this dataset is twofold: firstly, to facilitate the… See the full description on the dataset page: https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1.

  4. code-alpaca-20k

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Flower Labs (2023). code-alpaca-20k [Dataset]. https://huggingface.co/datasets/flwrlabs/code-alpaca-20k
    Dataset provided by
    Flower Labs GmbH
    Authors
    Flower Labs
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for CodeAlpaca 20K

    This dataset originates from the Code Alpaca repository. The CodeAlpaca 20K dataset is specifically used for training code generation models.

    Dataset Details

    Dataset Description

    Each sample comprises three columns: instruction, input, and output.

    Language(s): English
    License: Apache-2.0

    Dataset Sources

    The code from the original repository was adapted to post the dataset here.

    Repository:… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/code-alpaca-20k.
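
    As a rough illustration of the three-column layout described above, the sketch below turns one record into a prompt string and a target string. The section markers ("### Instruction:" and so on), the function name, and the sample record are all assumptions made for this example; the dataset card does not prescribe a prompt template.

```python
def build_prompt(sample):
    """Render one instruction/input/output record as (prompt, target) text.

    The "###" section markers are an assumed template, not one taken
    from the dataset card.
    """
    instruction = sample["instruction"].strip()
    context = sample.get("input", "").strip()
    parts = ["### Instruction:", instruction]
    if context:
        # Only include the input section when the record actually has one.
        parts += ["", "### Input:", context]
    parts += ["", "### Response:"]
    return "\n".join(parts) + "\n", sample["output"].strip()


# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {
    "instruction": "Write a function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}
prompt, target = build_prompt(sample)
print(prompt + target)
```

    Dropping the input section when the column is empty keeps prompts short for records that carry no extra context.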

  5. CodeAlpaca-20k-CodePlusExplanation

    • huggingface.co
    Updated May 28, 2025
    Cite
    Gedik (2025). CodeAlpaca-20k-CodePlusExplanation [Dataset]. https://huggingface.co/datasets/ByGedik/CodeAlpaca-20k-CodePlusExplanation
    Authors
    Gedik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Code Alpaca 20K – Code + Explanation

    🧠 A dataset designed to enhance large language models (LLMs) with code generation and instructional explanation capabilities. This version is an extension of the original sahil2801/CodeAlpaca-20k, with AI-generated explanations added to the output section using the Gemini API.

    📘 Overview

    This dataset enhances the original CodeAlpaca-20k examples by adding natural language explanations to code outputs. The goal is not just to… See the full description on the dataset page: https://huggingface.co/datasets/ByGedik/CodeAlpaca-20k-CodePlusExplanation.

  6. CodeAlpaca

    • huggingface.co
    Cite
    Ansh Gupta, CodeAlpaca [Dataset]. https://huggingface.co/datasets/thisisanshgupta/CodeAlpaca
    Authors
    Ansh Gupta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    thisisanshgupta/CodeAlpaca dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. fleece2instructions-codealpaca

    • huggingface.co
    Updated May 14, 2023
    Cite
    Peter Szemraj (2023). fleece2instructions-codealpaca [Dataset]. https://huggingface.co/datasets/pszemraj/fleece2instructions-codealpaca
    Authors
    Peter Szemraj
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    codealpaca for text2text generation

    This dataset was downloaded from the sahil280114/codealpaca GitHub repo and parsed into text2text format for "generating" instructions. It was downloaded under the wonderful Creative Commons Attribution-NonCommercial 4.0 International Public License (see snapshots of the repo and data license), so that license applies to this dataset. Note that the inputs and instruction columns in the original dataset have been aggregated together for text2text… See the full description on the dataset page: https://huggingface.co/datasets/pszemraj/fleece2instructions-codealpaca.

  8. codealpaca-personified-300k

    • huggingface.co
    Updated Nov 14, 2024
    Cite
    Jonathan Tow (2024). codealpaca-personified-300k [Dataset]. http://doi.org/10.57967/hf/3631
    Authors
    Jonathan Tow
    License

    https://choosealicense.com/licenses/llama3.1/

    Description

    codealpaca-personified-300k

    Dataset Details

    Dataset Description

    codealpaca-personified-300k is a synthetic code generation instruction dataset built by applying Code Alpaca prompting with synthetic programming personas from argilla/FinePersonas-v0.1.

    Dataset Sources

    Repository: https://github.com/jon-tow/codeaplaca-personified

    Citation

    @misc{distilabel-argilla-2024, author = {Álvaro Bartolomé Del Canto and Gabriel Martín… See the full description on the dataset page: https://huggingface.co/datasets/jon-tow/codealpaca-personified-300k.

  9. CodeAlpaca

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Roxanne Zhang (2023). CodeAlpaca [Dataset]. https://huggingface.co/datasets/RoxanneWsyw/CodeAlpaca
    Authors
    Roxanne Zhang
    Description

    RoxanneWsyw/CodeAlpaca dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. CodeAlpaca-20k

    • huggingface.co
    Updated Oct 31, 2024
    Cite
    pxy (2024). CodeAlpaca-20k [Dataset]. https://huggingface.co/datasets/pxyyy/CodeAlpaca-20k
    Authors
    pxy
    Description

    Dataset Card for "CodeAlpaca-20k"

    More Information needed

  11. CodeAlpaca-1k-revised

    • huggingface.co
    Updated Nov 18, 2024
    Cite
    Prateek Gupta (2024). CodeAlpaca-1k-revised [Dataset]. https://huggingface.co/datasets/Prateek-Gupta123/CodeAlpaca-1k-revised
    Authors
    Prateek Gupta
    Description

    Prateek-Gupta123/CodeAlpaca-1k-revised dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. evol-codealpaca-v1-dpo

    • huggingface.co
    Updated Jun 11, 2024
    Cite
    Aleksey Korshuk (2024). evol-codealpaca-v1-dpo [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/evol-codealpaca-v1-dpo
    Authors
    Aleksey Korshuk
    Description

    AlekseyKorshuk/evol-codealpaca-v1-dpo dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. CodeAlpaca-lf-processed

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Junxia Cui (2025). CodeAlpaca-lf-processed [Dataset]. https://huggingface.co/datasets/autoprogrammer/CodeAlpaca-lf-processed
    Authors
    Junxia Cui
    Description

    autoprogrammer/CodeAlpaca-lf-processed dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. evol-codealpaca-python-subset

    • huggingface.co
    Updated Apr 26, 2025
    Cite
    Koi (2025). evol-codealpaca-python-subset [Dataset]. https://huggingface.co/datasets/KOIIIII/evol-codealpaca-python-subset
    Authors
    Koi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    KOIIIII/evol-codealpaca-python-subset dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. CodeAlpaca-20k-finetuning-format

    • huggingface.co
    Cite
    Rohan Awhad, CodeAlpaca-20k-finetuning-format [Dataset]. https://huggingface.co/datasets/rohanawhad/CodeAlpaca-20k-finetuning-format
    Authors
    Rohan Awhad
    Description

    rohanawhad/CodeAlpaca-20k-finetuning-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. evol-codealpaca-pairwise-sharegpt

    • huggingface.co
    Updated Jan 26, 2024
    Cite
    Aleksey Korshuk (2024). evol-codealpaca-pairwise-sharegpt [Dataset]. https://huggingface.co/datasets/AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt
    Authors
    Aleksey Korshuk
    Description

    AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. codealpaca-graded

    • huggingface.co
    Updated Apr 24, 2023
    Cite
    Newstar Research ASIA (2023). codealpaca-graded [Dataset]. https://huggingface.co/datasets/NewstaR/codealpaca-graded
    Dataset authored and provided by
    Newstar Research ASIA
    Description

    NewstaR/codealpaca-graded dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. evol-codealpaca-v1_scored

    • huggingface.co
    Updated Mar 19, 2020
    Cite
    OpenDataArena (2020). evol-codealpaca-v1_scored [Dataset]. https://huggingface.co/datasets/OpenDataArena/evol-codealpaca-v1_scored
    Authors
    OpenDataArena
    Description

    Evol-codealpaca-v1_scored - with OpenDataArena Scores

    This dataset is a scored version of the original theblackcat102/evol-codealpaca-v1 dataset. The scoring was performed using the OpenDataArena-Tool, a comprehensive suite of automated evaluation methods for assessing instruction-following datasets. This version of the dataset includes rich, multi-dimensional scores for both the instructions (questions) and the instruction-response pairs, allowing for highly granular data analysis… See the full description on the dataset page: https://huggingface.co/datasets/OpenDataArena/evol-codealpaca-v1_scored.

  19. CodeAlpaca-DeepSeek-32B-Reasoning

    • huggingface.co
    Cite
    Nguyen Van Viet, CodeAlpaca-DeepSeek-32B-Reasoning [Dataset]. https://huggingface.co/datasets/nguyenvanviet/CodeAlpaca-DeepSeek-32B-Reasoning
    Authors
    Nguyen Van Viet
    Description

    nguyenvanviet/CodeAlpaca-DeepSeek-32B-Reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. test

    • huggingface.co
    Updated Dec 30, 2024
    Cite
    Ding (2024). test [Dataset]. https://huggingface.co/datasets/Ding0702/test
    Authors
    Ding
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ding0702/test dataset hosted on Hugging Face and contributed by the HF Datasets community

CodeAlpaca-20k

CodeAlpaca 20K

sahil2801/CodeAlpaca-20k

44 scholarly articles cite this dataset (View in Google Scholar)