Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
sahil2801/CodeAlpaca-20k dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cc/
This dataset splits the original CodeAlpaca dataset into train and test splits.
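A train/test split of this kind can be sketched in plain Python. The listing does not state the split ratio or seed that were actually used, so `test_fraction=0.1` and `seed=0` below are assumptions, and `train_test_split` is a hypothetical helper rather than the dataset author's code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    records: Sequence[T], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into (train, test).

    test_fraction and seed are illustrative defaults, not values
    taken from the dataset card.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-in for the instruction records
train, test = train_test_split(records)
print(len(train), len(test))
```

Fixing the seed means the same records land in the test split on every run, which matters when results are compared across training runs.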
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Evolved codealpaca
Updates:
2023/08/26 - Filtered results now contain only pure English instructions; responses that mention being trained by OAI have been removed.
Median sequence length: 471
We employed a methodology similar to that of WizardCoder, with the exception that ours is open-source. We used the gpt-4-0314 and gpt-4-0613 models to augment and answer each response, with the bulk of generation handled by gpt-4-0314. The aim of this dataset is twofold: firstly, to facilitate the… See the full description on the dataset page: https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for CodeAlpaca 20K
This dataset originates from the Code Alpaca repository. The CodeAlpaca 20K dataset is specifically used for training code generation models.
Dataset Details
Dataset Description
Each sample consists of three columns: instruction, input, and output.
Language(s): English
License: Apache-2.0 License
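To make the three-column layout concrete, here is a minimal sketch that renders one record as a single prompt string. The `### Instruction` / `### Input` / `### Response` headings follow a common Alpaca-style convention and are an assumption; the dataset card does not prescribe a template, and `format_prompt` and the sample record are hypothetical.

```python
from typing import TypedDict


class Sample(TypedDict):
    """One record with the three columns named in the dataset card."""
    instruction: str
    input: str
    output: str


def format_prompt(sample: Sample) -> str:
    """Join the columns into one training string.

    The section headings are an assumed template, not part of the dataset.
    """
    parts = [f"### Instruction:\n{sample['instruction']}"]
    # Skip the input section when the record has no extra context.
    if sample["input"].strip():
        parts.append(f"### Input:\n{sample['input']}")
    parts.append(f"### Response:\n{sample['output']}")
    return "\n\n".join(parts)


# Made-up record for illustration only.
example: Sample = {
    "instruction": "Write a function that returns the sum of a list.",
    "input": "[1, 2, 3]",
    "output": "def total(xs):\n    return sum(xs)",
}
print(format_prompt(example))
```

Omitting the input section for empty inputs keeps prompts short for records that carry only an instruction and an output.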
Dataset Sources
The code from the original repository was adapted for posting here.
Repository:… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/code-alpaca-20k.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Code Alpaca 20K – Code + Explanation
🧠 A dataset designed to enhance large language models (LLMs) with code generation and instructional explanation capabilities. This version is an extension of the original sahil2801/CodeAlpaca-20k, with AI-generated explanations added to the output section using the Gemini API.
📘 Overview
This dataset enhances the original CodeAlpaca-20k examples by adding natural language explanations to code outputs. The goal is not just to… See the full description on the dataset page: https://huggingface.co/datasets/ByGedik/CodeAlpaca-20k-CodePlusExplanation.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
thisisanshgupta/CodeAlpaca dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
codealpaca for text2text generation
This dataset was downloaded from the sahil280114/codealpaca GitHub repo and parsed into text2text format for "generating" instructions. It was downloaded under the wonderful Creative Commons Attribution-NonCommercial 4.0 International Public License (see snapshots of the repo and data license), so that license applies to this dataset. Note that the inputs and instruction columns in the original dataset have been aggregated together for text2text… See the full description on the dataset page: https://huggingface.co/datasets/pszemraj/fleece2instructions-codealpaca.
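As a rough illustration of aggregating the instruction and input columns, the sketch below builds a source/target pair for instruction generation. The description above is truncated, so the direction (output as source, aggregated instruction as target), the newline separator, and the `source`/`target` field names are all assumptions; `to_text2text` is a hypothetical helper, not the dataset's actual preprocessing.

```python
from typing import Dict


def to_text2text(record: Dict[str, str], sep: str = "\n") -> Dict[str, str]:
    """Turn one instruction record into a text2text pair.

    Assumed direction: the model reads the output and generates the
    instruction. Separator and field names are illustrative only.
    """
    instruction = record["instruction"].strip()
    extra = record.get("input", "").strip()
    # Aggregate instruction and input into a single target string.
    target = f"{instruction}{sep}{extra}" if extra else instruction
    return {"source": record["output"], "target": target}


# Made-up record for illustration only.
pair = to_text2text(
    {
        "instruction": "Sort the list in ascending order.",
        "input": "[3, 1, 2]",
        "output": "sorted([3, 1, 2])",
    }
)
print(pair["target"])
```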
https://choosealicense.com/licenses/llama3.1/
codealpaca-personified-300k
Dataset Details
Dataset Description
codealpaca-personified-300k is a synthetic code generation instruction dataset built by applying Code Alpaca prompting with synthetic programming personas from argilla/FinePersonas-v0.1.
Dataset Sources
Repository: https://github.com/jon-tow/codeaplaca-personified
Citation
@misc{distilabel-argilla-2024, author = {Álvaro Bartolomé Del Canto and Gabriel Martín… See the full description on the dataset page: https://huggingface.co/datasets/jon-tow/codealpaca-personified-300k.
RoxanneWsyw/CodeAlpaca dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for "CodeAlpaca-20k"
More Information needed
Prateek-Gupta123/CodeAlpaca-1k-revised dataset hosted on Hugging Face and contributed by the HF Datasets community
AlekseyKorshuk/evol-codealpaca-v1-dpo dataset hosted on Hugging Face and contributed by the HF Datasets community
autoprogrammer/CodeAlpaca-lf-processed dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
KOIIIII/evol-codealpaca-python-subset dataset hosted on Hugging Face and contributed by the HF Datasets community
rohanawhad/CodeAlpaca-20k-finetuning-format dataset hosted on Hugging Face and contributed by the HF Datasets community
AlekseyKorshuk/evol-codealpaca-pairwise-sharegpt dataset hosted on Hugging Face and contributed by the HF Datasets community
NewstaR/codealpaca-graded dataset hosted on Hugging Face and contributed by the HF Datasets community
Evol-codealpaca-v1_scored - with OpenDataArena Scores
This dataset is a scored version of the original theblackcat102/evol-codealpaca-v1 dataset. The scoring was performed using the OpenDataArena-Tool, a comprehensive suite of automated evaluation methods for assessing instruction-following datasets. This version of the dataset includes rich, multi-dimensional scores for both the instructions (questions) and the instruction-response pairs, allowing for highly granular data analysis… See the full description on the dataset page: https://huggingface.co/datasets/OpenDataArena/evol-codealpaca-v1_scored.
nguyenvanviet/CodeAlpaca-DeepSeek-32B-Reasoning dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ding0702/test dataset hosted on Hugging Face and contributed by the HF Datasets community