26 datasets found
  1. databricks-dolly-15k

    • huggingface.co
    Updated Apr 17, 2023
    Cite
    Databricks (2023). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 17, 2023
    Dataset authored and provided by
    Databricks (http://databricks.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.

  2. databricks-dolly-15k

    • huggingface.co
    Updated Mar 18, 2025
    Cite
    Andrew Bai (2025). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/andrewbai/databricks-dolly-15k
    Explore at:
    Dataset updated
    Mar 18, 2025
    Authors
    Andrew Bai
    Description

    andrewbai/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. databricks-dolly-15k-ja-gozaru

    • huggingface.co
    Updated May 28, 2023
    Cite
    bbz662bbz (2023). databricks-dolly-15k-ja-gozaru [Dataset]. https://huggingface.co/datasets/bbz662bbz/databricks-dolly-15k-ja-gozaru
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 28, 2023
    Authors
    bbz662bbz
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset was built from "kunishou/databricks-dolly-15k-ja" (https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja). It is licensed under CC BY-SA 3.0. Last update: 2023-05-28.

  4. databricks-dolly-15k.jsonl

    • kaggle.com
    zip
    Updated Aug 3, 2024
    Cite
    Beyza Coban (2024). databricks-dolly-15k.jsonl [Dataset]. https://www.kaggle.com/datasets/beyzacoban/databricks-dolly-15k-jsonl/code
    Explore at:
    Available download formats: zip (9407597 bytes)
    Dataset updated
    Aug 3, 2024
    Authors
    Beyza Coban
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Beyza Coban

    Released under Apache 2.0

    Contents

  5. Databricks-dolly-15k

    • opendatalab.com
    zip
    Updated Apr 27, 2023
    Cite
    Databricks, Inc. (2023). Databricks-dolly-15k [Dataset]. https://opendatalab.com/OpenDataLab/Databricks-dolly-15k
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 27, 2023
    Dataset provided by
    Databricks (http://databricks.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Databricks-dolly-15k is a corpus of over 15,000 records generated by thousands of Databricks employees, intended to enable large language models to exhibit the interactivity of ChatGPT. Databricks employees were invited to create prompt/response pairs in each of eight different instruction categories: the seven outlined in the InstructGPT paper, plus an open-ended, free-form category. Contributors were instructed not to use information from any source on the web except Wikipedia (for a specific subset of instruction categories), and were explicitly told to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the question types and instructions appropriate to each category.

  6. Dolly 15k Dutch

    • zenodo.org
    • huggingface.co
    • +1 more
    bin
    Updated Jun 20, 2023
    Cite
    Bram Vanroy (2023). Dolly 15k Dutch [Dataset]. http://doi.org/10.57967/hf/0785
    Explore at:
    Available download formats: bin
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bram Vanroy
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This dataset contains 14,934 instructions, contexts and responses, in several natural language categories such as classification, closed QA, generation, etc. The English original dataset was created by @databricks, who crowd-sourced the data creation via its employees. The current dataset is a translation of that dataset through ChatGPT (gpt-3.5-turbo).

    Data Instances

    {
     "id": 14963,
     "instruction": "Wat zijn de duurste steden ter wereld?",
     "context": "",
     "response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York, Hong Kong, Los Angeles, Zurich, Genève, San Francisco, Parijs en Sydney.",
     "category": "brainstorming"
    }
    

    Data Fields

    • id: the ID of the item. The following 77 IDs are not included because they could not be translated (or were too long): [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966]
    • instruction: the instruction (question)
    • context: additional context that the AI can use to answer the question
    • response: the AI's expected response
    • category: the category of this type of question (see Dolly for more info)
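
The five fields above can be modelled as a small typed record. This is a minimal sketch, not part of the dataset's tooling. The names `DollyRecord` and `parse_record` are hypothetical, and the example response is shortened from the instance shown under Data Instances.

```python
from dataclasses import dataclass

# Field names as documented in the Data Fields list.
FIELD_NAMES = ("id", "instruction", "context", "response", "category")


@dataclass(frozen=True)
class DollyRecord:
    """One translated item (hypothetical container, for illustration only)."""
    id: int
    instruction: str
    context: str
    response: str
    category: str


def parse_record(raw: dict) -> DollyRecord:
    """Build a DollyRecord, failing loudly if a documented field is absent."""
    missing = [name for name in FIELD_NAMES if name not in raw]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return DollyRecord(**{name: raw[name] for name in FIELD_NAMES})


example = parse_record({
    "id": 14963,
    "instruction": "Wat zijn de duurste steden ter wereld?",
    "context": "",  # context may be an empty string
    "response": "Dit is een uitgebreide lijst van de duurste steden: Singapore, Tel Aviv, New York.",  # shortened
    "category": "brainstorming",
})
```

Checking for all five fields up front keeps malformed lines from silently entering a fine-tuning set.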

    Dataset Creation

    Both the texts and the topics were translated with OpenAI's API for gpt-3.5-turbo, with max_tokens=1024 and temperature=0 as parameters.

    The prompt template to translate the input is (where src_lang was English and tgt_lang Dutch):

    CONVERSATION_TRANSLATION_PROMPT = """You are asked to translate a task's instruction, optional context to the task, and the response to the task, from {src_lang} to {tgt_lang}.
    
    Here are the requirements that you should adhere to:
    1. maintain the format: the task consists of a task instruction (marked `instruction: `), optional context to the task (marked `context: `) and response for the task marked with `response: `;
    2. do not translate the identifiers `instruction: `, `context: `, and `response: ` but instead copy them to your output;
    3. make sure that text is fluent to read and does not contain grammatical errors. Use standard {tgt_lang} without regional bias;
    4. translate the instruction and context text using informal, but standard, language;
    5. make sure to avoid biases (such as gender bias, grammatical bias, social bias);
    6. if the instruction is to correct grammar mistakes or spelling mistakes then you have to generate a similar mistake in the context in {tgt_lang}, and then also generate a corrected output version in the output in {tgt_lang};
    7. if the instruction is to translate text from one language to another, then you do not translate the text that needs to be translated in the instruction or the context, nor the translation in the response (just copy them as-is);
    8. do not translate code fragments but copy them to your output. If there are English examples, variable names or definitions in code fragments, keep them in English.
    
    Now translate the following task with the requirements set out above. Do not provide an explanation and do not add anything else.
    
    """
    

    The system message was:

    You are a helpful assistant that translates English to Dutch according to the requirements that are given to you.
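
As a rough illustration of how the prompt template and system message might be combined into a chat request, consider the sketch below. It makes several assumptions. `build_messages` and `REQUEST_PARAMS` are hypothetical names. `PROMPT_TEMPLATE` is a shortened stand-in for the full `CONVERSATION_TRANSLATION_PROMPT` quoted above. Appending the task text directly after the prompt is a guess about the layout, not the author's confirmed method. No API call is made.

```python
# System message from the dataset card, parameterised here for illustration.
SYSTEM_MESSAGE = (
    "You are a helpful assistant that translates {src_lang} to {tgt_lang} "
    "according to the requirements that are given to you."
)

# Shortened stand-in for CONVERSATION_TRANSLATION_PROMPT (assumption).
PROMPT_TEMPLATE = (
    "You are asked to translate a task's instruction, optional context to the task, "
    "and the response to the task, from {src_lang} to {tgt_lang}.\n\n"
)

# Parameters stated in the dataset card.
REQUEST_PARAMS = {"model": "gpt-3.5-turbo", "max_tokens": 1024, "temperature": 0}


def build_messages(instruction, context, response, src_lang="English", tgt_lang="Dutch"):
    """Return a chat-style message list for one record (hypothetical layout)."""
    task = f"instruction: {instruction}\ncontext: {context}\nresponse: {response}"
    return [
        {"role": "system",
         "content": SYSTEM_MESSAGE.format(src_lang=src_lang, tgt_lang=tgt_lang)},
        {"role": "user",
         "content": PROMPT_TEMPLATE.format(src_lang=src_lang, tgt_lang=tgt_lang) + task},
    ]


messages = build_messages("What is a llama?", "", "A llama is a domesticated camelid.")
```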
    

    Note that 77 items (0.5%) were not successfully translated. This can either mean that the prompt was too long for the given limit (max_tokens=1024) or that the generated translation could not be parsed into instruction, context and response fields. The missing IDs are [1502, 1812, 1868, 4179, 4541, 6347, 8851, 9321, 10588, 10835, 11257, 12082, 12319, 12471, 12701, 12988, 13066, 13074, 13076, 13181, 13253, 13279, 13313, 13346, 13369, 13446, 13475, 13528, 13546, 13548, 13549, 13558, 13566, 13600, 13603, 13657, 13668, 13733, 13765, 13775, 13801, 13831, 13906, 13922, 13923, 13957, 13967, 13976, 14028, 14031, 14045, 14050, 14082, 14083, 14089, 14110, 14155, 14162, 14181, 14187, 14200, 14221, 14222, 14281, 14473, 14475, 14476, 14587, 14590, 14667, 14685, 14764, 14780, 14808, 14836, 14891, 14966].
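
The parsing step mentioned above could look roughly like the following sketch. The regular expression and the function name `parse_translation` are assumptions, not the author's actual parser. The sketch also checks the stated 0.5% figure using the 14,934 translated items and 77 missing items reported in this card.

```python
import re

# Matches the three identifiers the prompt asks the model to copy verbatim.
FIELD_PATTERN = re.compile(
    r"instruction:\s*(?P<instruction>.*?)\s*"
    r"context:\s*(?P<context>.*?)\s*"
    r"response:\s*(?P<response>.*)",
    re.DOTALL,
)


def parse_translation(text):
    """Split model output into its three fields; return None if it cannot be parsed."""
    match = FIELD_PATTERN.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


# Share of items that failed, from the counts stated in the card.
TRANSLATED = 14934
MISSING = 77
missing_share_percent = 100 * MISSING / (TRANSLATED + MISSING)
```

An output that lacks any of the three identifiers yields `None`, which corresponds to the "could not be parsed" case described in the note.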

    Initial Data Collection and Normalization

    Initial data collection by databricks. See their repository for more information about this dataset.

    Considerations for Using the Data

    Note that the translations in this new dataset have not been verified by humans! Use at your own risk, both in terms of quality and biases.

    Discussion of Biases

    As with any machine-generated text, users should be aware of potential biases in this dataset. Although the prompt specifically includes "make sure to avoid biases (such as gender bias, grammatical bias, social bias)", the impact of such a command is not known. It is likely that biases remain in the dataset, so use it with caution.

    Other Known Limitations

    The translation quality has not been verified. Use at your own risk!

    Licensing Information

    This repository follows the original databricks license, which is CC BY-SA 3.0 but see below for a specific restriction.

    This text was generated (either in part or in full) with GPT-3 (gpt-3.5-turbo), OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed, edited, and revised the language to their own liking and takes ultimate responsibility for the content of this publication.

    If you use this dataset, you must also follow the Sharing and Usage policies.

    As clearly stated in their Terms of Use, specifically 2c.iii, "[you may not] use output from the Services to develop models that compete with OpenAI". That means that you cannot use this dataset to build models that are intended to commercially compete with OpenAI. As far as I am aware, that is a specific restriction that should serve as an addendum to the current license.

    This dataset is also available on the Hugging Face hub, its canonical repository.

  7. databricks-dolly-15k-curated-en

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jun 1, 2023
    Cite
    (2023). databricks-dolly-15k-curated-en [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-curated-en
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Description

    In this dataset, you will find a collection of records, each containing a category, an instruction, a context and a response to that instruction. The aim of the project is to correct the instructions, inputs and responses to make sure they are of the highest quality and that they match the task category they belong to. All three texts should be clear and include real information. In addition, the response should be as complete but concise as possible.

  8. databricks-dolly-15k-single-text

    • huggingface.co
    Updated May 6, 2024
    Cite
    Yassin Elsir (2024). databricks-dolly-15k-single-text [Dataset]. https://huggingface.co/datasets/rislemy/databricks-dolly-15k-single-text
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 6, 2024
    Authors
    Yassin Elsir
    Description

    rislemy/databricks-dolly-15k-single-text dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. databricks-dolly-15k-chatml

    • huggingface.co
    Cite
    Re:cast AI, databricks-dolly-15k-chatml [Dataset]. https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Re:cast AI
    Description

    Dataset Card for "databricks-dolly-15k-chatml"

      Dataset Summary
    

    This dataset has been created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a chatml friendly format for use in SFT tasks with pretrained models.

      Dataset Structure
    

    messages = [ { "content": "You are an expert Q&A system that is trusted around the world. You always... etc.", "role": "system" }, { "content": "(Optional) Context information is… See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
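
To make the shape of such a `messages` list concrete, here is a minimal sketch of converting one Dolly-style record into system, user and assistant turns. The listing truncates the actual prompts, so everything specific here is an assumption: the function name `to_messages`, the caller-supplied `system_prompt`, the "Context:" layout of the user turn, and the assistant turn holding the response. The dataset's real wording may differ.

```python
def to_messages(record, system_prompt):
    """Convert one record into a chat-style message list (hypothetical layout)."""
    user_parts = []
    if record.get("context"):
        # Context is optional; include it only when the record has one.
        user_parts.append(f"Context:\n{record['context']}")
    user_parts.append(record["instruction"])
    return [
        {"content": system_prompt, "role": "system"},
        {"content": "\n\n".join(user_parts), "role": "user"},
        {"content": record["response"], "role": "assistant"},
    ]


messages = to_messages(
    {"instruction": "Name a primary colour.", "context": "", "response": "Red."},
    system_prompt="(system prompt goes here)",
)
```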

  10. databricks-dolly-15k-alpaca

    • huggingface.co
    Cite
    databricks-dolly-15k-alpaca [Dataset]. https://huggingface.co/datasets/kamrr/databricks-dolly-15k-alpaca
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Kareem Amr
    Description

    kamrr/databricks-dolly-15k-alpaca dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. databricks-dolly-15k-tamil

    • huggingface.co
    Cite
    Wijesundara, databricks-dolly-15k-tamil [Dataset]. https://huggingface.co/datasets/Suchinthana/databricks-dolly-15k-tamil
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Wijesundara
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Suchinthana/databricks-dolly-15k-tamil dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. databricks-dolly-15k-sinhala

    • huggingface.co
    Updated Sep 6, 2023
    Cite
    databricks-dolly-15k-sinhala [Dataset]. https://huggingface.co/datasets/Suchinthana/databricks-dolly-15k-sinhala
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 6, 2023
    Authors
    Wijesundara
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for "databricks-dolly-15k-sinhala"

    More Information needed

  13. SurgeGlobal/LaMini Dataset

    • paperswithcode.com
    Updated Apr 17, 2024
    Cite
    Chandeepa Dissanayake; Lahiru Lowe; Sachith Gunasekara; Yasiru Ratnayake (2024). SurgeGlobal/LaMini Dataset [Dataset]. https://paperswithcode.com/dataset/surgeglobal-lamini
    Explore at:
    Dataset updated
    Apr 17, 2024
    Authors
    Chandeepa Dissanayake; Lahiru Lowe; Sachith Gunasekara; Yasiru Ratnayake
    Description

    Overview

    The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.

    Dataset Generation

    • Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2
    • Seed Instructions: Sourced from the databricks/databricks-dolly-15k dataset
    • Generation Approach: Example-guided and topic-guided strategies
    • Total Instructions: 1,504 unique instruction examples

    Dataset Sources

    • Repository: Bitbucket Project
    • Paper: Pre-Print

    Structure

    Each entry in the dataset contains:

    • Instruction
    • Response

    Usage

    The LaMini Dataset can be used to fine-tune language models to improve their ability to follow instructions and generate relevant responses.

    Access

    The dataset is available on HuggingFace at the following link: https://huggingface.co/datasets/SurgeGlobal/LaMini

    Citation

    If you find our work useful, please cite our paper as follows:

    @misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data},
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }

    Dataset Authors

    Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

  14. lilac-databricks-dolly-15k-curated-en

    • huggingface.co
    Updated Aug 21, 2023
    Cite
    Lilac AI (2023). lilac-databricks-dolly-15k-curated-en [Dataset]. https://huggingface.co/datasets/lilacai/lilac-databricks-dolly-15k-curated-en
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 21, 2023
    Dataset authored and provided by
    Lilac AI
    Description

    lilac/databricks-dolly-15k-curated-en

    This dataset is a Lilac processed dataset. Original dataset: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en

    To download the dataset to a local directory:

    lilac download lilacai/lilac-databricks-dolly-15k-curated-en

    or from Python with:

    ll.download("lilacai/lilac-databricks-dolly-15k-curated-en")

  15. databricks-dolly-15k-modernbert-split-kmeans-dim768-20250130

    • huggingface.co
    Updated Jan 30, 2025
    Cite
    Robert Chu (2025). databricks-dolly-15k-modernbert-split-kmeans-dim768-20250130 [Dataset]. https://huggingface.co/datasets/rchu233/databricks-dolly-15k-modernbert-split-kmeans-dim768-20250130
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 30, 2025
    Authors
    Robert Chu
    Description

    rchu233/databricks-dolly-15k-modernbert-split-kmeans-dim768-20250130 dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. databricks-dolly-15k-modernbert-train-kmeans-dim768-20250316

    • huggingface.co
    Updated Mar 16, 2025
    Cite
    Namburi GNVV Satya Sai Srinath (2025). databricks-dolly-15k-modernbert-train-kmeans-dim768-20250316 [Dataset]. https://huggingface.co/datasets/NamburiSrinath/databricks-dolly-15k-modernbert-train-kmeans-dim768-20250316
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 16, 2025
    Authors
    Namburi GNVV Satya Sai Srinath
    Description

    NamburiSrinath/databricks-dolly-15k-modernbert-train-kmeans-dim768-20250316 dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. dolly-15k

    • huggingface.co
    Cite
    Ritesh Khanna, dolly-15k [Dataset]. https://huggingface.co/datasets/treadon/dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Ritesh Khanna
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Dataset Card for "dolly-15k"

      Summary
    

    This is the dataset supplied by Databricks for training Dolly V2. This set is split 99% training / 1% validation, should you want to set aside some records for evaluation purposes.
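
A 99% / 1% split of this kind can be sketched as follows. It is only an illustration under assumptions: the function name `split_records`, the fixed seed, and the use of 1,000 placeholder items in place of the real records are all hypothetical, and this is not necessarily how the published split was produced.

```python
import random


def split_records(records, validation_fraction=0.01, seed=0):
    """Shuffle deterministically, then hold out a small validation slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder data: 1,000 integer IDs standing in for real records.
train, validation = split_records(range(1000))
```

Using a seeded `random.Random` instance keeps the split reproducible without touching the global random state.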

      Special thanks to ❤️ Databricks for creating and making this set available.
    

    More Information needed

  18. databricks-dolly-15k-ja_cool

    • huggingface.co
    Cite
    WarriorMama777, databricks-dolly-15k-ja_cool [Dataset]. https://huggingface.co/datasets/WarriorMama777/databricks-dolly-15k-ja_cool
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    WarriorMama777
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Overview

    This dataset is edited from kunishou/databricks-dolly-15k-en. It was edited to read like Yuki Nagato, the character from "The Melancholy of Haruhi Suzumiya," who speaks in an emotionless and indifferent way. In more detail, I used VS Code etc. to replace "です、ます" and "だ、である", etc.
    It's a dataset for my hobby, but feel free to use it.

      Links… See the full description on the dataset page: https://huggingface.co/datasets/WarriorMama777/databricks-dolly-15k-ja_cool.
    
  19. dolly-15k-hippo-translated-pt-12k

    • huggingface.co
    Updated Sep 19, 2023
    Cite
    dolly-15k-hippo-translated-pt-12k [Dataset]. https://huggingface.co/datasets/Gustrd/dolly-15k-hippo-translated-pt-12k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 19, 2023
    Authors
    Gustavo Rocha Dias
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k (https://huggingface.co/datasets/databricks/databricks-dolly-15k/) is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This translation into Portuguese was executed utilizing a technique from the HIPPO benchmark. By… See the full description on the dataset page: https://huggingface.co/datasets/Gustrd/dolly-15k-hippo-translated-pt-12k.

  20. ko.databricks-dolly-15k

    • huggingface.co
    Cite
    Kim Ki Hyun, ko.databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/nlp-with-deeplearning/ko.databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Kim Ki Hyun
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Original dataset: databricks/databricks-dolly-15k

databricks/databricks-dolly-15k: 150 scholarly articles cite this dataset (per Google Scholar).