93 datasets found
  1. Evol-Instruct-Code-80k-v1

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated Jul 22, 2023
    Nick Roshdieh (2023). Evol-Instruct-Code-80k-v1 [Dataset]. https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 22, 2023
    Authors
    Nick Roshdieh
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Open-source implementation of Evol-Instruct-Code as described in the WizardCoder paper. Code for the instruction generation can be found on GitHub as Evol-Teacher.

  2. Magicoder-Evol-Instruct-110K

    • huggingface.co
    Updated Mar 19, 2020
    Intelligent Software Engineering (iSE) (2020). Magicoder-Evol-Instruct-110K [Dataset]. https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K
    Dataset updated
    Mar 19, 2020
    Dataset authored and provided by
    Intelligent Software Engineering (iSE)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    A decontaminated version of evol-codealpaca-v1. Decontamination is done in the same way as StarCoder (bigcode decontamination process).
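
As a rough illustration of what benchmark decontamination involves, the sketch below drops any training sample whose text contains a benchmark string after case and whitespace normalization. It is a simplified stand-in under stated assumptions, not the bigcode pipeline itself: the function names, the `instruction`/`output` field names, and the example strings are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def decontaminate(samples, benchmark_strings):
    """Keep only samples that contain none of the benchmark strings.

    `samples` is a list of dicts with hypothetical "instruction" and "output"
    keys; `benchmark_strings` holds prompts or solutions from an evaluation set.
    """
    needles = [normalize(s) for s in benchmark_strings if s.strip()]
    kept = []
    for sample in samples:
        haystack = normalize(sample["instruction"] + " " + sample["output"])
        if not any(needle in haystack for needle in needles):
            kept.append(sample)
    return kept


# Made-up data for illustration only.
samples = [
    {"instruction": "Write a function that reverses a string.",
     "output": "def rev(s): return s[::-1]"},
    {"instruction": "Return   True if n is PRIME.",
     "output": "def is_prime(n): ..."},
]
benchmark = ["return true if n is prime."]
clean = decontaminate(samples, benchmark)
print(len(clean))
```

Exact substring matching is deliberately strict; a real pipeline might also use fuzzy or n-gram overlap checks, which this sketch does not attempt.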

  3. Evol-Instruct-Python-26k

    • huggingface.co
    Updated Aug 25, 2023
    + more versions
    Maxime Labonne (2023). Evol-Instruct-Python-26k [Dataset]. https://huggingface.co/datasets/mlabonne/Evol-Instruct-Python-26k
    Dataset updated
    Aug 25, 2023
    Authors
    Maxime Labonne
    Description

    Evol-Instruct-Python-26k

    Filtered version of the nickrosh/Evol-Instruct-Code-80k-v1 dataset that keeps only Python code (26,588 samples). A smaller version is available as mlabonne/Evol-Instruct-Python-1k. The dataset page shows the distribution of the number of tokens in each row (instruction + output) using Llama's tokenizer.
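
A per-row token count of this kind could be tallied roughly as follows. This is only a sketch: it counts whitespace-separated pieces as a stand-in for Llama's tokenizer, so the numbers will differ from those on the dataset page, and the `instruction`/`output` keys, the bucket size, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated pieces, not Llama tokens."""
    return len(text.split())


def token_histogram(rows, bucket_size=5):
    """Map each bucket's lower bound to the number of rows whose token count falls in it."""
    histogram = Counter()
    for row in rows:
        n_tokens = count_tokens(row["instruction"]) + count_tokens(row["output"])
        histogram[(n_tokens // bucket_size) * bucket_size] += 1
    return dict(sorted(histogram.items()))


# Made-up rows for illustration only.
rows = [
    {"instruction": "Reverse a list in Python.", "output": "print(lst[::-1])"},
    {"instruction": "Sum two numbers.", "output": "def add(a, b):\n    return a + b"},
]
hist = token_histogram(rows)
print(hist)
```

Swapping `count_tokens` for a real tokenizer's encode call would be the only change needed to reproduce the page's measurement.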

  4. evol-instruct-deutsch

    • huggingface.co
    Updated Jul 12, 2023
    FreedomAI (2023). evol-instruct-deutsch [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-deutsch
    Dataset updated
    Jul 12, 2023
    Dataset authored and provided by
    FreedomAI
    Description

    The dataset is used in the research related to MultilingualSIFT.

  5. Evol-Instruct-Code-80k - Dataset - LDM

    • service.tib.eu
    Updated Dec 2, 2024
    (2024). Evol-Instruct-Code-80k - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/evol-instruct-code-80k
    Dataset updated
    Dec 2, 2024
    Description

    Evol-Instruct-Code-80k is a dataset for evaluating the performance of code generation models.

  6. Xu, Sun, Zheng, Geng, Zhao, Tao, Jiang (2024). Dataset: Evol-Instruct-70k....

    • service.tib.eu
    Updated Dec 16, 2024
    (2024). Xu, Sun, Zheng, Geng, Zhao, Tao, Jiang (2024). Dataset: Evol-Instruct-70k. https://doi.org/10.57702/c0nqt31p [Dataset]. https://service.tib.eu/ldmservice/dataset/evol-instruct-70k
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper for in-context learning task

  7. Evol-Instruct-Chinese-GPT4

    • huggingface.co
    Updated Dec 28, 2023
    FreedomAI (2023). Evol-Instruct-Chinese-GPT4 [Dataset]. https://huggingface.co/datasets/FreedomIntelligence/Evol-Instruct-Chinese-GPT4
    Dataset updated
    Dec 28, 2023
    Dataset authored and provided by
    FreedomAI
    Description

    The dataset is created by (1) translating English questions of Evol-instruct-70k into Chinese and (2) requesting GPT4 to generate Chinese responses. For more details, please refer to:

    Repository: https://github.com/FreedomIntelligence/AceGPT https://github.com/FreedomIntelligence/LLMZoo

    Papers: "AceGPT, Localizing Large Language Models in Arabic"; "Phoenix: Democratizing ChatGPT across Languages"

      BibTeX entry and citation info
    

    @article{huang2023acegpt, title={AceGPT, Localizing… See the full description on the dataset page: https://huggingface.co/datasets/FreedomIntelligence/Evol-Instruct-Chinese-GPT4.

  8. viet-evol-instruct

    • kaggle.com
    zip
    Updated Jun 8, 2024
    CoverLover (2024). viet-evol-instruct [Dataset]. https://www.kaggle.com/coverlover/viet-evol-instruct
    Available download formats: zip (66647084 bytes)
    Dataset updated
    Jun 8, 2024
    Authors
    CoverLover
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by CoverLover

    Released under MIT

    Contents

  9. Evol-Instruct-Code

    • huggingface.co
    Updated Feb 4, 2024
    Mabry (2024). Evol-Instruct-Code [Dataset]. https://huggingface.co/datasets/artificial-citizen/Evol-Instruct-Code
    Dataset updated
    Feb 4, 2024
    Authors
    Mabry
    Description

    artificial-citizen/Evol-Instruct-Code dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. clusteredleaves-evol-instruct

    • huggingface.co
    Updated Jun 20, 2024
    Alignment Lab AI (2024). clusteredleaves-evol-instruct [Dataset]. https://huggingface.co/datasets/Alignment-Lab-AI/clusteredleaves-evol-instruct
    Dataset updated
    Jun 20, 2024
    Authors
    Alignment Lab AI
    Description

    Alignment-Lab-AI/clusteredleaves-evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Wizard-LM-Chinese-instruct-evol

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jan 1, 2023
    SenseTime (2023). Wizard-LM-Chinese-instruct-evol [Dataset]. https://opendatalab.com/OpenDataLab/Wizard-LM-Chinese-instruct-evol
    Available download formats: zip
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    SenseTime
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wizard-LM-Chinese is a dataset built by translating the instructions of MSRA's Wizard-LM dataset into Chinese and then calling GPT to obtain the answers. Wizard-LM contains many instructions that are more difficult than those in Alpaca. During translation into Chinese, a small number of instructions may cause translation failure through instruction injection. The Chinese answers are obtained by asking the translated Chinese questions.

  12. distilabel-sample-evol-instruct

    • huggingface.co
    Updated Jan 14, 2024
    + more versions
    Agustín Piqueres Lajarín (2024). distilabel-sample-evol-instruct [Dataset]. https://huggingface.co/datasets/plaguss/distilabel-sample-evol-instruct
    Dataset updated
    Jan 14, 2024
    Authors
    Agustín Piqueres Lajarín
    Description

    plaguss/distilabel-sample-evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Wizardcoder

    • resodate.org
    Updated Dec 16, 2024
    Z. Luo; C. Xu; P. Zhao; Q. Sun; X. Geng; W. Hu; C. Tao; J. Ma; Q. Lin; D. Jiang (2024). Wizardcoder [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvd2l6YXJkY29kZXI=
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Z. Luo; C. Xu; P. Zhao; Q. Sun; X. Geng; W. Hu; C. Tao; J. Ma; Q. Lin; D. Jiang
    Description

    Wizardcoder: Empowering code large language models with evol-instruct

  14. AI Research Instructions and Outputs

    • kaggle.com
    zip
    Updated Nov 24, 2023
    The Devastator (2023). AI Research Instructions and Outputs [Dataset]. https://www.kaggle.com/datasets/thedevastator/ai-research-instructions-and-outputs/discussion
    Available download formats: zip (32193107 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    AI Research Instructions and Outputs

    Driving Innovation in Machine Learning and AI Exploration

    By Huggingface Hub [source]

    About this dataset

    This dataset contains 80,000 unique pairs of instructions and outputs to be used for Machine Learning and AI research. Instructions such as 'run', 'walk', 'jump', and 'dance' have outputs that represent the results of executing each instruction. It provides a groundbreaking collection of knowledge that can be leveraged in ways such as training AI agents, building intelligent natural language applications, exploring autonomous navigation possibilities, developing dialogues between bots and humans, replicating robotic tasks and research into sophisticated AI models able to understand instructions in various domains like engineering, medicine, finance or law. This dataset has the potential to revolutionize how we approach Artificial Intelligence by pushing boundaries when it comes to data-driven machine learning strategies. With its powerful combination of detailed information from multiple angles – language comprehension from verbal commands alongside increased contextual understanding – we can pave the way for more comprehensive applications of AI technology with exponentially enhanced accuracy when compared to existing methods

    How to use the dataset

    This dataset contains 80,000 pairs of instructions and outputs for Machine Learning and AI research. This data can be used to teach a variety of AI agents, as well as for tasks like autonomous navigation, dialogue, language modelling, natural language processing (NLP), robotics applications and more. The following guide outlines the steps you'll need to take in order to get the most out of this incredible resource.

    • Download the dataset from Kaggle - Once downloaded, you'll have access to two files: instruction.csv & output.csv.
    • Examine the data - Take some time to familiarize yourself with the dataset. The columns contain instructions/verbs such as 'run', 'walk', 'jump', etc., along with the accompanying output results generated from executing those instructions.
    • Transform the data - Use feature engineering techniques appropriate for your project or proposed application to transform or extract relevant features that can be used downstream by either supervised algorithms such as neural networks or unsupervised methods such as clustering algorithms.
    • Train & test models - Develop predictive models using either supervised or unsupervised techniques accordingly; split the data into a training set (80%) and a validation set (20%) before running on the full dataset, so that model performance can be properly assessed against the validation/test sets; adjust hyperparameters until the desired results are obtained; additional notes here about repeatability vs. randomization, etc.…
    • Deploy models - Deploy the model in real-world scenarios/environments where appropriate, e.g. an autonomous car relying on natural language inputs when driving through town, or a domestic robot understanding sentences given by its user, etc.…
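
The 80%/20% training and validation split mentioned in the steps above could be sketched as follows. The sketch assumes that instruction.csv and output.csv list their rows in the same order, which the description does not confirm; the function names and placeholder rows are hypothetical, and a fixed seed is used so the split is repeatable.

```python
import random


def pair_rows(instructions, outputs):
    """Pair rows by position; assumes both files list rows in the same order."""
    if len(instructions) != len(outputs):
        raise ValueError("instruction and output files have different lengths")
    return list(zip(instructions, outputs))


def train_validation_split(pairs, train_fraction=0.8, seed=0):
    """Shuffle a copy of the pairs with a fixed seed, then cut at train_fraction."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for the contents of the two CSV files.
instructions = [f"instruction {i}" for i in range(10)]
outputs = [f"output {i}" for i in range(10)]
pairs = pair_rows(instructions, outputs)
train, validation = train_validation_split(pairs)
print(len(train), len(validation))
```

Shuffling before cutting avoids a validation set made up only of the last rows of the file, and reusing the same seed reproduces the same split across runs.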

    Research Ideas

    • Training virtual assistants with specific domain knowledge (e.g. medical, finance, etc).
    • Develop autonomous navigation systems that respond to verbal instructions given by a user in natural language format.
    • Creating dialogue agents that can answer questions based on a pre-defined set of rules pertaining to the instructions given by the user

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without...

  15. evol-codealpaca-v1

    • huggingface.co
    • kaggle.com
    Updated Sep 12, 2023
    theblackcat102 (2023). evol-codealpaca-v1 [Dataset]. https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1
    Dataset updated
    Sep 12, 2023
    Authors
    theblackcat102
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Evolved codealpaca

    Updates:

    2023/08/26 - Filtered results now contain only pure-English instructions, and any responses mentioning being trained by OAI have been removed

    Median sequence length: 471

    We employed a methodology similar to that of WizardCoder, with the exception that ours is open-source. We used the gpt-4-0314 and gpt-4-0613 models to augment and answer each response, with the bulk of generation handled by gpt-4-0314. The aim of this dataset is twofold: firstly, to facilitate the… See the full description on the dataset page: https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1.
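
The 2023/08/26 filtering update in this entry could be approximated along the following lines. The actual criteria are not given in the description, so everything here is an assumption: treating pure-ASCII text as English is a crude heuristic, and the blocked phrase, function names, and sample rows are hypothetical.

```python
# Hypothetical phrase list; the real filter's wording is not stated in the source.
BLOCKED_PHRASES = ("trained by openai",)


def looks_english(text: str) -> bool:
    """Crude heuristic: treat pure-ASCII text as English."""
    return text.isascii()


def keep_sample(instruction: str, response: str) -> bool:
    """Keep a pair only if the instruction looks English and the response has no blocked phrase."""
    lowered = response.lower()
    return looks_english(instruction) and not any(p in lowered for p in BLOCKED_PHRASES)


# Made-up rows for illustration only.
rows = [
    ("Write a loop that prints 1 to 3.", "for i in range(1, 4): print(i)"),
    ("写一个循环", "for i in range(3): print(i)"),
    ("Who made you?", "I am a model trained by OpenAI."),
]
filtered = [row for row in rows if keep_sample(*row)]
print(len(filtered))
```

The ASCII check would also discard English instructions that contain accented names or typographic quotes, so a language-identification model would be a more faithful choice in practice.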

  16. Collective-Evol-Instruct-v0.1

    • huggingface.co
    Updated Jan 30, 2025
    Sebastian Gabarain (2025). Collective-Evol-Instruct-v0.1 [Dataset]. https://huggingface.co/datasets/Locutusque/Collective-Evol-Instruct-v0.1
    Dataset updated
    Jan 30, 2025
    Authors
    Sebastian Gabarain
    Description

    Locutusque/Collective-Evol-Instruct-v0.1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. evol-instruct

    • huggingface.co
    Updated Aug 8, 2025
    + more versions
    Zichao Hu (2025). evol-instruct [Dataset]. https://huggingface.co/datasets/zichao22/evol-instruct
    Dataset updated
    Aug 8, 2025
    Authors
    Zichao Hu
    Description

    zichao22/evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. evol-instruct

    • huggingface.co
    Updated Apr 21, 2024
    + more versions
    Alvaro Bartolome (2024). evol-instruct [Dataset]. https://huggingface.co/datasets/alvarobartt/evol-instruct
    Dataset updated
    Apr 21, 2024
    Authors
    Alvaro Bartolome
    Description

    alvarobartt/evol-instruct dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. Codefuse-Evol-Instruct-Clean

    • huggingface.co
    Updated Sep 13, 2024
    Yejie Wang (2024). Codefuse-Evol-Instruct-Clean [Dataset]. https://huggingface.co/datasets/banksy235/Codefuse-Evol-Instruct-Clean
    Dataset updated
    Sep 13, 2024
    Authors
    Yejie Wang
    Description

    banksy235/Codefuse-Evol-Instruct-Clean dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Evolution-NOS Instruction at the University of Virginia

    • dataverse.harvard.edu
    Updated Jun 12, 2023
    Jeremy Sloane (2023). Evolution-NOS Instruction at the University of Virginia [Dataset]. http://doi.org/10.7910/DVN/QPT5RH
    Dataset updated
    Jun 12, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Jeremy Sloane
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Virginia
    Description

    Data from a study investigating the influence of Nature of Science Instruction on evolution acceptance.

Nick Roshdieh (2023). Evol-Instruct-Code-80k-v1 [Dataset]. https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1

Evol-Instruct-Code-80k-v1

nickrosh/Evol-Instruct-Code-80k-v1

Explore at:
21 scholarly articles cite this dataset (View in Google Scholar)
