Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Open Source Implementation of Evol-Instruct-Code as described in the WizardCoder paper. Code for the instruction generation can be found on GitHub as Evol-Teacher.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A decontaminated version of evol-codealpaca-v1. Decontamination is done in the same way as StarCoder (bigcode decontamination process).
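For illustration, the core of an n-gram based decontamination pass can be sketched as follows. This is a simplified stand-in for the bigcode/StarCoder pipeline (which also performs exact-match and near-duplicate detection); the function names and the 10-gram default are assumptions, not the actual implementation.

```python
def word_ngrams(text, n=10):
    """Lowercased word n-grams used for overlap matching."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def decontaminate(samples, benchmark_solutions, n=10):
    """Drop any training sample that shares a word n-gram with a benchmark
    solution, so evaluation problems do not leak into the training set."""
    bench = set()
    for solution in benchmark_solutions:
        bench |= word_ngrams(solution, n)
    return [s for s in samples if not (word_ngrams(s, n) & bench)]
```

In the real pipeline the benchmark side would be built from the test sets of HumanEval-style evaluations rather than the toy strings used here.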
Evol-Instruct-Python-26k
Filtered version of the nickrosh/Evol-Instruct-Code-80k-v1 dataset that keeps only Python code (26,588 samples). A smaller version is available as mlabonne/Evol-Instruct-Python-1k. The dataset card shows the distribution of the number of tokens in each row (instruction + output) using Llama's tokenizer.
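The Python-only filtering can be approximated with a keyword heuristic; a minimal sketch, assuming each row is a dict with `instruction` and `output` fields. The exact filter used to produce the 26,588-sample subset is not documented here, and the real token counts come from Llama's tokenizer via the transformers library rather than the whitespace proxy below.

```python
def looks_like_python(sample):
    """Heuristic keep-filter: the row mentions Python or contains common
    Python syntax. An illustrative assumption, not the published filter."""
    text = (sample["instruction"] + "\n" + sample["output"]).lower()
    return "python" in text or "def " in text or "import " in text

def approx_row_length(sample):
    """Whitespace-token proxy for the per-row (instruction + output) length;
    the dataset card's histogram uses Llama's tokenizer instead."""
    return len((sample["instruction"] + " " + sample["output"]).split())
```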
The dataset is used in research related to MultilingualSIFT.
Evol-Instruct-Code-80k is a dataset for evaluating the performance of code generation models.
The dataset used in the paper for the in-context learning task.
The dataset is created by (1) translating English questions of Evol-instruct-70k into Chinese and (2) requesting GPT4 to generate Chinese responses. For more details, please refer to:
Repository: https://github.com/FreedomIntelligence/AceGPT and https://github.com/FreedomIntelligence/LLMZoo
Papers: AceGPT, Localizing Large Language Models in Arabic; Phoenix: Democratizing ChatGPT across Languages
BibTeX entry and citation info
@article{huang2023acegpt, title={AceGPT, Localizing… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/Evol-Instruct-Chinese-GPT4.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by CoverLover
Released under MIT
artificial-citizen/Evol-Instruct-Code dataset hosted on Hugging Face and contributed by the HF Datasets community
Alignment-Lab-AI/clusteredleaves-evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wizard-LM-Chinese is a dataset created by translating the instructions in MSRA's Wizard-LM dataset into Chinese and then calling GPT to obtain the answers. Wizard-LM contains many instructions that are more difficult than those in Alpaca. During translation, a small amount of instruction injection may cause translation failures. The Chinese answers are obtained by prompting with the translated Chinese questions.
plaguss/distilabel-sample-evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
By Hugging Face Hub
This dataset contains 80,000 unique instruction-output pairs for machine learning and AI research. Each instruction is paired with an output representing the result of carrying it out. The data can be used to train instruction-following AI agents, build natural language applications, develop dialogue between bots and humans, and support research into models that understand instructions across domains such as engineering, medicine, finance, and law.
This dataset contains 80,000 pairs of instructions and outputs for machine learning and AI research. The data can be used to train a variety of AI agents and supports tasks such as autonomous navigation, dialogue, language modelling, natural language processing (NLP), and robotics applications. The following guide outlines the steps to get the most out of this resource.
1. Download the dataset from Kaggle. Once downloaded you'll have access to two files: instruction.csv and output.csv.
2. Examine the data. Take some time to familiarize yourself with the dataset; the columns contain instructions along with the accompanying outputs generated from executing them.
3. Transform the data. Apply feature engineering techniques appropriate for your project to extract features that can be used downstream by supervised algorithms such as neural networks or by unsupervised methods such as clustering.
4. Train and test models. Split the data into a training set (80%) and a validation set (20%) before training so that model performance can be properly assessed against the validation set; develop predictive models, adjusting hyperparameters until the desired results are obtained, and fix random seeds where repeatability matters.
5. Deploy models. Deploy the model in real-world scenarios where appropriate, e.g. an autonomous car relying on natural language inputs while driving through town, or a domestic robot understanding sentences given by its user.
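The download-and-split steps above can be sketched in a few lines; a minimal sketch, assuming the two CSV files each hold a single column of text with no header (inspect the files first, since the actual column layout is not documented here).

```python
import csv
import random

def load_pairs(instruction_path, output_path):
    """Read the two CSV files shipped with the dataset and pair them row by
    row. Single-column, headerless files are an assumption; adjust if needed."""
    with open(instruction_path, newline="", encoding="utf-8") as f:
        instructions = [row[0] for row in csv.reader(f)]
    with open(output_path, newline="", encoding="utf-8") as f:
        outputs = [row[0] for row in csv.reader(f)]
    return list(zip(instructions, outputs))

def train_val_split(pairs, val_frac=0.2, seed=42):
    """Shuffle with a fixed seed (for repeatability) and split 80/20 into
    training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]
```

The fixed seed makes the split reproducible across runs, which keeps hyperparameter comparisons in step 4 meaningful.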
- Training virtual assistants with specific domain knowledge (e.g. medical, finance, etc.).
- Developing autonomous navigation systems that respond to verbal instructions given by a user in natural language.
- Creating dialogue agents that can answer questions based on a pre-defined set of rules pertaining to the instructions given by the user.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright. You can copy, modify, distribute and perform the work, even for commercial purposes, all without...
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Evolved codealpaca
Updates:
2023/08/26 - Filtered results now contain only pure English instructions, and any responses mentioning being trained by OAI have been removed.
Median sequence length: 471. We employed a methodology similar to that of WizardCoder, with the exception that ours is open-source. We used the gpt-4-0314 and gpt-4-0613 models to augment and answer each response, with the bulk of generation handled by gpt-4-0314. The aim of this dataset is twofold: firstly, to facilitate the… See the full description on the dataset page: https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1.
Locutusque/Collective-Evol-Instruct-v0.1 dataset hosted on Hugging Face and contributed by the HF Datasets community
zichao22/evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
alvarobartt/evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
banksy235/Codefuse-Evol-Instruct-Clean dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data from a study investigating the influence of Nature of Science Instruction on evolution acceptance.