86 datasets found
  1. databricks-dolly-15k

    • huggingface.co
    + more versions
    Cite
    Databricks, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Databricks (http://databricks.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.

  2. databricks-dolly-15k

    • huggingface.co
    Cite
    AI Squared, Inc., databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/aisquared/databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    AI Squared, Inc.
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    databricks-dolly-15k

    This dataset was not originally created by AI Squared. This dataset was curated and created by Databricks. The below text comes from the original release of the dataset's README file in GitHub (available at https://github.com/databrickslabs/dolly/tree/master/data):

      Summary
    

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in… See the full description on the dataset page: https://huggingface.co/datasets/aisquared/databricks-dolly-15k.

  3. databricks-dolly-15k-ko

    • huggingface.co
    Updated Apr 12, 2023
    Cite
    NLP & AI - Korea University (2023). databricks-dolly-15k-ko [Dataset]. https://huggingface.co/datasets/nlpai-lab/databricks-dolly-15k-ko
    Explore at:
    Dataset updated
    Apr 12, 2023
    Dataset authored and provided by
    NLP & AI - Korea University
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Korean translation of databricks-dolly-15k via the DeepL API. Note: in some cases, multilingual data was converted to monolingual data during batch translation to Korean using the API. Below is databricks-dolly-15k's README.

      Summary
    

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification… See the full description on the dataset page: https://huggingface.co/datasets/nlpai-lab/databricks-dolly-15k-ko.

  4. databricks-databricks-dolly-15k

    • huggingface.co
    Updated Sep 21, 2024
    + more versions
    Cite
    AGIE AI Technology (2024). databricks-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/agie-ai/databricks-databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 21, 2024
    Dataset authored and provided by
    AGIE AI Technology
    Description

    Dataset Card for "databricks-databricks-dolly-15k"

    More Information needed

  5. databricks-dolly-15k-ja

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    LLM-jp (2024). databricks-dolly-15k-ja [Dataset]. https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 7, 2024
    Dataset authored and provided by
    LLM-jp
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    databricks-dolly-15k-ja

    This repository provides an instruction tuning dataset developed by LLM-jp, a collaborative project launched in Japan. This dataset is a Japanese translation of databricks-dolly-15k using DeepL.

      Send Questions to
    

    llm-jp(at)nii.ac.jp

      Model Card Authors
    

    The names are listed in alphabetical order. Hirokazu Kiyomaru, Hiroshi Matsuda, Jun Suzuki, Namgi Han, Saku Sugawara, Shota Sasaki, Shuhei Kurita, Taishi Nakamura, Takashi Kodama, Takumi… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja.

  6. databricks-dolly-15k-curated-en

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jun 1, 2023
    Cite
    (2023). databricks-dolly-15k-curated-en [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-curated-en
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Description

    In this dataset, you will find a collection of records, each with a category, an instruction, a context, and a response to that instruction. The aim of the project is to correct the instructions, inputs, and responses so that they are of the highest quality and match the task category they belong to. All three texts should be clear and include real information. In addition, the response should be as complete but concise as possible.
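
The record shape described above can be sketched as a small data class with a basic quality check. This is only an illustration: the class name `DollyRecord`, the `problems` helper, the category label, and the sample values are hypothetical and are not part of the curation project's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DollyRecord:
    """One record: a category, an instruction, an optional context, a response."""
    category: str
    instruction: str
    context: str
    response: str

    def problems(self) -> List[str]:
        """List basic issues: required fields that are empty or not stripped."""
        found = []
        for name in ("category", "instruction", "response"):
            value = getattr(self, name)
            if not value.strip():
                found.append(f"{name} is empty")
            elif value != value.strip():
                found.append(f"{name} has leading or trailing whitespace")
        return found


record = DollyRecord(
    category="open_qa",  # hypothetical label, not taken from the dataset
    instruction="What is the capital of France? ",
    context="",  # context may legitimately be empty
    response="Paris",
)
print(record.problems())
```

A check like this only catches mechanical issues; whether a response is complete, concise, and factually correct still needs human review.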

  7. databricks-dolly-15k-curated-multilingual

    • huggingface.co
    Updated Apr 28, 2023
    Cite
    Argilla (2023). databricks-dolly-15k-curated-multilingual [Dataset]. https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 28, 2023
    Dataset authored and provided by
    Argilla
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for "databricks-dolly-15k-curated-multilingual"

    A curated and multilingual version of the Databricks Dolly instructions dataset. It includes a programmatically and manually corrected version of the original en dataset. See below. STATUS: Currently, the original Dolly v2 English version has been curated combining automatic processing and collaborative human curation using Argilla (~400 records have been manually edited and fixed). The following graph shows a summary… See the full description on the dataset page: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual.

  8. pretrain-databricks-dolly-15k

    • huggingface.co
    Updated Jan 24, 2024
    Cite
    Victor Nogueira (2024). pretrain-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Felladrin/pretrain-databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 24, 2024
    Authors
    Victor Nogueira
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Conversion of the databricks/databricks-dolly-15k dataset for use in pretraining. Python code used for conversion:

        from datasets import load_dataset
        import pandas

        dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

        def format(columns):
            instruction = columns["instruction"].strip()
            answer = columns["response"].strip()
            return f"{instruction}\n\n{answer}"

        pandas.DataFrame({"text": [format(columns) for columns in dataset]}).to_csv("train.csv", index=False)
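
To see what this conversion produces without downloading anything, the same formatting step can be applied to a hand-written record. This is a sketch only: the function name `to_pretrain_text` and the sample record are hypothetical, and the blank-line separator between instruction and response is an assumption about the original script.

```python
def to_pretrain_text(columns: dict) -> str:
    """Join the stripped instruction and response with a blank line."""
    instruction = columns["instruction"].strip()
    answer = columns["response"].strip()
    # Assumption: the two parts are separated by one blank line.
    return f"{instruction}\n\n{answer}"


# Hypothetical record using the dataset's column names.
sample = {
    "instruction": "  Name a primary color. ",
    "context": "",
    "response": "Red.\n",
}
text = to_pretrain_text(sample)
print(text)
```

Note that the `context` column is not used in this format, so records that depend on a reference passage lose that passage in the resulting text.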

  9. ChatML-databricks-dolly-15k

    • huggingface.co
    Updated Feb 3, 2024
    Cite
    Victor Nogueira (2024). ChatML-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 3, 2024
    Authors
    Victor Nogueira
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    databricks/databricks-dolly-15k in ChatML format. Python code used for conversion:

        from datasets import load_dataset
        import pandas
        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained(
            pretrained_model_name_or_path="Felladrin/Llama-160M-Chat-v1"
        )

        dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

        def format(columns):
            instruction = columns["instruction"].strip()
            context = columns["context"].strip()
            response =…

    See the full description on the dataset page: https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k.
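
Because the conversion script is truncated in this listing, the following is not the author's code. It is a minimal hand-rolled sketch of what ChatML-style text looks like, assuming the common `<|im_start|>role ... <|im_end|>` layout; the function name `to_chatml` and the sample messages are hypothetical.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]]) -> str:
    """Serialize role/content messages into ChatML-style text."""
    return "\n".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    )


# Hypothetical conversation standing in for one dataset record.
messages = [
    {"role": "user", "content": "Name a primary color."},
    {"role": "assistant", "content": "Red."},
]
chatml_text = to_chatml(messages)
print(chatml_text)
```

In practice a tokenizer's own chat template would normally do this serialization, so that special tokens match what the model was trained with.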

  10. databricks-dolly-15k

    • huggingface.co
    Updated Aug 27, 2024
    Cite
    Post-training-Data-Flywheel (2024). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Post-training-Data-Flywheel/databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    Post-training-Data-Flywheel
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Post-training-Data-Flywheel/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. databricks-dolly-15k-chatml

    • huggingface.co
    Cite
    Re:cast AI, databricks-dolly-15k-chatml [Dataset]. https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset provided by
    CAST AI Group, Inc.
    Authors
    Re:cast AI
    Description

    Dataset Card for "databricks-dolly-15k-chatml"

      Dataset Summary
    

    This dataset has been created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a chatml friendly format for use in SFT tasks with pretrained models.

      Dataset Structure
    

    messages = [ { "content": "You are an expert Q&A system that is trusted around the world. You always... etc.", "role": "system" }, { "content": "(Optional) Context information is… See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
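
The structure above is truncated, so the sketch below only illustrates the general idea of a `messages` list with a system prompt and optional context. The system prompt text, the `build_messages` name, and the way context is folded into the user message are all assumptions, not the dataset's actual layout.

```python
from typing import Dict, List

# Hypothetical placeholder; the real system prompt is elided in the listing.
PLACEHOLDER_SYSTEM_PROMPT = "You are a helpful Q&A assistant."


def build_messages(instruction: str, response: str, context: str = "") -> List[Dict[str, str]]:
    """Build a system/user/assistant message list from one record."""
    user_parts = []
    if context.strip():
        # Assumption: optional context is prepended to the user message.
        user_parts.append(f"Context: {context.strip()}")
    user_parts.append(instruction.strip())
    return [
        {"content": PLACEHOLDER_SYSTEM_PROMPT, "role": "system"},
        {"content": "\n\n".join(user_parts), "role": "user"},
        {"content": response.strip(), "role": "assistant"},
    ]


example = build_messages(
    instruction="Who wrote the passage?",
    response="Jane Doe.",
    context="The passage was written by Jane Doe.",
)
for message in example:
    print(message["role"], "->", message["content"])
```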

  12. databricks-dolly-15k-curated-es

    • huggingface.co
    + more versions
    Cite
    María Grandury, databricks-dolly-15k-curated-es [Dataset]. https://huggingface.co/datasets/mariagrandury/databricks-dolly-15k-curated-es
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    María Grandury
    Description

    Dataset Card for databricks-dolly-15k-curated-es

    This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

      Dataset Summary
    

    This dataset contains:

    A dataset configuration file conforming to the Argilla dataset format named argilla.cfg. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/mariagrandury/databricks-dolly-15k-curated-es.

  13. databricks-dolly-15k-nyan

    • huggingface.co
    Updated Jul 21, 2025
    + more versions
    Cite
    Takashi Okamoto (2025). databricks-dolly-15k-nyan [Dataset]. https://huggingface.co/datasets/okamototk/databricks-dolly-15k-nyan
    Explore at:
    Dataset updated
    Jul 21, 2025
    Authors
    Takashi Okamoto
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    okamototk/databricks-dolly-15k-nyan dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. thai_databricks_dolly

    • huggingface.co
    Updated Jun 20, 2024
    + more versions
    Cite
    SEACrowd (2024). thai_databricks_dolly [Dataset]. https://huggingface.co/datasets/SEACrowd/thai_databricks_dolly
    Explore at:
    Dataset updated
    Jun 20, 2024
    Dataset authored and provided by
    SEACrowd
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This is a Thai-instructed dataset translated from databricks-dolly-15k using Google Cloud Translation. databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

  15. databricks-dolly-15k

    • huggingface.co
    Updated Oct 19, 2024
    + more versions
    Cite
    Vaibhav Adlakha (2024). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/vaibhavad/databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 19, 2024
    Authors
    Vaibhav Adlakha
    Description

    vaibhavad/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. instruct-rl-databricks-dolly-15k

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Marco Valenzuela (2024). instruct-rl-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/marcov/instruct-rl-databricks-dolly-15k
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2024
    Authors
    Marco Valenzuela
    Description

    marcov/instruct-rl-databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. databricks-dolly-15k-Hinglish-Codemix

    • huggingface.co
    Updated Aug 24, 2017
    Cite
    Ankit Pal (2017). databricks-dolly-15k-Hinglish-Codemix [Dataset]. http://doi.org/10.57967/hf/1677
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 24, 2017
    Authors
    Ankit Pal
    Description

    Summary

    aaditya/databricks-dolly-15k-Hindi is an open source Hinglish-Codemix version of the databricks/databricks-dolly-15k dataset. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.

    Supported Tasks: Training LLMs, Synthetic Data Generation, Data Augmentation
    Languages: Hindi
    Version: 1.0
    Original Dataset repo… See the full description on the dataset page: https://huggingface.co/datasets/aaditya/databricks-dolly-15k-Hinglish-Codemix.

  18. databricks-dolly-15k-llama

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    David (2024). databricks-dolly-15k-llama [Dataset]. https://huggingface.co/datasets/daveokpare/databricks-dolly-15k-llama
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 12, 2024
    Authors
    David
    Description

    Dataset Card for "databricks-dolly-15k-llama"

    More Information needed

  19. databricks-dolly-15k-single-text

    • huggingface.co
    Updated May 6, 2024
    + more versions
    Cite
    Yassin Elsir (2024). databricks-dolly-15k-single-text [Dataset]. https://huggingface.co/datasets/rislemy/databricks-dolly-15k-single-text
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 6, 2024
    Authors
    Yassin Elsir
    Description

    rislemy/databricks-dolly-15k-single-text dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. databricks-dolly-15k-alpaca-style

    • huggingface.co
    + more versions
    Cite
    Samuel Liu, databricks-dolly-15k-alpaca-style [Dataset]. https://huggingface.co/datasets/sam-liu-lmi/databricks-dolly-15k-alpaca-style
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Samuel Liu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    sam-liu-lmi/databricks-dolly-15k-alpaca-style dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
Databricks, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
databricks-dolly-15k

databricks/databricks-dolly-15k

Explore at:
192 scholarly articles cite this dataset (View in Google Scholar)
