71 datasets found
  1. databricks-dolly-15k

    • huggingface.co
    • kaggle.com
    Cite
    Databricks, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Databricks (http://databricks.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.

  2. databricks-dolly-15k-ja

    • opendatalab.com
    zip
    Updated Jan 9, 2024
    Cite
    (2024). databricks-dolly-15k-ja [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja
    Available download formats: zip
    Dataset updated
    Jan 9, 2024
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was created by automatically translating "databricks-dolly-15k" into Japanese.

  3. databricks-dolly-15k-cleanset

    • huggingface.co
    Cite
    Cleanlab, databricks-dolly-15k-cleanset [Dataset]. https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleanset
    Dataset authored and provided by
    Cleanlab
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    databricks-dolly-15k-cleanset can be used to produce CLEANed up versions of the popular databricks-dolly-15k dataSET, which was used to fine-tune Dolly 2.0. The original databricks-dolly-15k contains 15,000 human-annotated instruction-response pairs covering various categories. However, there are many low-quality responses, incomplete/vague prompts, and other problematic text lurking in the dataset (as with all real-world instruction tuning datasets). We ran… See the full description on the dataset page: https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleanset.

  4. databricks-dolly-15k dataset

    • kaggle.com
    zip
    Updated Apr 12, 2023
    Cite
    Prathamesh Sonawane (2023). databricks-dolly-15k dataset [Dataset]. https://www.kaggle.com/datasets/pratt3000/databricks-dolly-15k-dataset
    Available download formats: zip (4738358 bytes)
    Dataset updated
    Apr 12, 2023
    Authors
    Prathamesh Sonawane
    Description

    Dataset

    This dataset was created by Prathamesh Sonawane

  5. databricks-dolly-15k.jsonl

    • kaggle.com
    zip
    Updated Aug 3, 2024
    Cite
    Beyza Coban (2024). databricks-dolly-15k.jsonl [Dataset]. https://www.kaggle.com/datasets/beyzacoban/databricks-dolly-15k-jsonl
    Available download formats: zip (9407597 bytes)
    Dataset updated
    Aug 3, 2024
    Authors
    Beyza Coban
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Beyza Coban

    Released under Apache 2.0

  6. databricks-dolly-15k-ja

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    LLM-jp (2024). databricks-dolly-15k-ja [Dataset]. https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja
    Dataset updated
    Feb 7, 2024
    Dataset authored and provided by
    LLM-jp
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    databricks-dolly-15k-ja

    This repository provides an instruction tuning dataset developed by LLM-jp, a collaborative project launched in Japan. This dataset is a Japanese translation of databricks-dolly-15k using DeepL.

    Send Questions to

    llm-jp(at)nii.ac.jp

    Model Card Authors

    The names are listed in alphabetical order. Hirokazu Kiyomaru, Hiroshi Matsuda, Jun Suzuki, Namgi Han, Saku Sugawara, Shota Sasaki, Shuhei Kurita, Taishi Nakamura, Takashi Kodama, Takumi… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja.

  7. databricks dolly 15k use

    • kaggle.com
    zip
    Updated Mar 19, 2024
    Cite
    Caleb Klinger (2024). databricks dolly 15k use [Dataset]. https://www.kaggle.com/datasets/trippinglettuce/databricks-dolly-15k-use/code
    Available download formats: zip (4734011 bytes)
    Dataset updated
    Mar 19, 2024
    Authors
    Caleb Klinger
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Caleb Klinger

    Released under CC0: Public Domain

  8. databricks-dolly-15k-ja-reformat-v1

    • opendatalab.com
    • huggingface.co
    zip
    Updated Apr 13, 2023
    Cite
    (2023). databricks-dolly-15k-ja-reformat-v1 [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja-reformat-v1
    Available download formats: zip
    Dataset updated
    Apr 13, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category. Halfway through the data generation process, contributors were given the option of answering questions posed by other contributors. They were asked to rephrase the original question and only select questions they could be reasonably expected to answer correctly. For certain categories contributors were asked to provide reference texts copied from Wikipedia. Reference text (indicated by the context field in the actual dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]) which we recommend users remove for downstream applications.
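The recommendation above to remove bracketed Wikipedia citation numbers from the context field could be handled along the following lines. This is a minimal sketch, not the dataset authors' method: the helper name `strip_wikipedia_citations` and the sample record are hypothetical, and the regular expression assumes citation markers are purely numeric (such as [42]).

```python
import re

# Assumption: citation markers are digits wrapped in square brackets, e.g. "[42]".
CITATION_RE = re.compile(r"\[\d+\]")


def strip_wikipedia_citations(text: str) -> str:
    """Remove bracketed citation numbers and tidy the spacing they leave behind."""
    without_markers = CITATION_RE.sub("", text)
    # Collapse runs of spaces left where a marker sat between two words.
    return re.sub(r" {2,}", " ", without_markers).strip()


# Hypothetical record for illustration only; not taken from the dataset.
record = {
    "instruction": "When was the bridge completed?",
    "context": "The bridge was completed in 1937.[42] It spans the strait [3] and is painted orange.",
    "response": "1937",
    "category": "closed_qa",
}

record["context"] = strip_wikipedia_citations(record["context"])
print(record["context"])
```

Non-numeric bracketed text (for example "[citation needed]") is left untouched by this pattern; whether to remove it as well is a judgment call for the downstream application.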

  9. databricks-dolly-15k-curated-en

    • huggingface.co
    • opendatalab.com
    Cite
    Argilla, databricks-dolly-15k-curated-en [Dataset]. https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en
    Dataset authored and provided by
    Argilla
    Description

    Guidelines

    In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. The aim of the project is to correct the instructions, input and responses to make sure they are of the highest quality and that they match the task category that they belong to. All three texts should be clear and include real information. In addition, the response should be as complete but concise as possible. To curate the dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en.

  10. databricks-databricks-dolly-15k

    • huggingface.co
    Updated Sep 21, 2024
    Cite
    AGIE AI Technology (2024). databricks-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/agie-ai/databricks-databricks-dolly-15k
    Dataset updated
    Sep 21, 2024
    Dataset authored and provided by
    AGIE AI Technology
    Description

    Dataset Card for "databricks-databricks-dolly-15k"

    More Information needed

  11. databricks-dolly-15k-niid

    • huggingface.co
    Cite
    FEDML, databricks-dolly-15k-niid [Dataset]. https://huggingface.co/datasets/fedml/databricks-dolly-15k-niid
    Dataset authored and provided by
    FEDML
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This is a Non-IID split version of databricks/databricks-dolly-15k.

  12. instruct-rl-databricks-dolly-15k

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    Marco Valenzuela (2024). instruct-rl-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/marcov/instruct-rl-databricks-dolly-15k
    Dataset updated
    Sep 12, 2024
    Authors
    Marco Valenzuela
    Description

    marcov/instruct-rl-databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. databricks-dolly-15k

    • huggingface.co
    Updated Jul 15, 2009
    Cite
    Korea University Mobile Embedded Systems Lab (2009). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Korea-MES/databricks-dolly-15k
    Dataset updated
    Jul 15, 2009
    Dataset authored and provided by
    Korea University Mobile Embedded Systems Lab
    Description

    Korea-MES/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. databricks-dolly-15k-chatml

    • huggingface.co
    Cite
    Re:cast AI, databricks-dolly-15k-chatml [Dataset]. https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml
    Dataset provided by
    CAST AI Group, Inc.
    Authors
    Re:cast AI
    Description

    Dataset Card for "databricks-dolly-15k-chatml"

    Dataset Summary

    This dataset has been created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a chatml friendly format for use in SFT tasks with pretrained models.

    Dataset Structure

    messages = [ { "content": "You are an expert Q&A system that is trusted around the world. You always... etc.", "role": "system" }, { "content": "(Optional) Context information is… See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
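As a rough illustration of the kind of transformation described above, the sketch below turns one instruction record into a list of `content`/`role` messages. It is an assumption-laden sketch rather than Re:cast AI's actual conversion: the function name `to_chat_messages`, the way context and instruction are joined in the user message, and the placeholder system prompt are all hypothetical, since the listing truncates the real structure.

```python
from typing import Dict, List


def to_chat_messages(record: Dict[str, str], system_prompt: str) -> List[Dict[str, str]]:
    """Build system/user/assistant messages from one instruction record."""
    context = record.get("context", "").strip()
    instruction = record["instruction"].strip()
    # Assumption: optional context is placed before the instruction in the user turn.
    user_content = f"{context}\n\n{instruction}" if context else instruction
    return [
        {"content": system_prompt, "role": "system"},
        {"content": user_content, "role": "user"},
        {"content": record["response"].strip(), "role": "assistant"},
    ]


# Hypothetical record and placeholder system prompt, for illustration only.
example = {
    "instruction": "Name the capital of France.",
    "context": "",
    "response": "Paris",
}
messages = to_chat_messages(example, system_prompt="PLACEHOLDER SYSTEM PROMPT")
for message in messages:
    print(message["role"], "->", message["content"])
```

Consult the dataset page for the exact system prompt and message layout before relying on this shape.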

  15. databricks-dolly-15k-th

    • huggingface.co
    Cite
    Thaweewat, databricks-dolly-15k-th [Dataset]. https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th
    Authors
    Thaweewat
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Summary

    This is a Thai 🇹🇭-instructed dataset translated from databricks-dolly-15k using Google Cloud Translation. databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic… See the full description on the dataset page: https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th.

  16. databricks-dolly-15k-llama

    • huggingface.co
    Updated Sep 12, 2024
    Cite
    David (2024). databricks-dolly-15k-llama [Dataset]. https://huggingface.co/datasets/daveokpare/databricks-dolly-15k-llama
    Dataset updated
    Sep 12, 2024
    Authors
    David
    Description

    Dataset Card for "databricks-dolly-15k-llama"

    More Information needed

  17. databricks-dolly-15k

    • huggingface.co
    Cite
    Andrew Bai, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/andrewbai/databricks-dolly-15k
    Authors
    Andrew Bai
    Description

    andrewbai/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. databricks-dolly-15k-sum

    • huggingface.co
    Updated Jun 29, 2017
    Cite
    Seva Leonov (2017). databricks-dolly-15k-sum [Dataset]. https://huggingface.co/datasets/vsevolodl/databricks-dolly-15k-sum
    Dataset updated
    Jun 29, 2017
    Authors
    Seva Leonov
    Description

    vsevolodl/databricks-dolly-15k-sum dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. edition_0016_argilla-databricks-dolly-15k-curated-en-readymade

    • huggingface.co
    Updated Dec 2, 2009
    Cite
    TheFactoryX (2009). edition_0016_argilla-databricks-dolly-15k-curated-en-readymade [Dataset]. https://huggingface.co/datasets/TheFactoryX/edition_0016_argilla-databricks-dolly-15k-curated-en-readymade
    Dataset updated
    Dec 2, 2009
    Dataset authored and provided by
    TheFactoryX
    License

    https://choosealicense.com/licenses/other/

    Description

    edition_0016_argilla-databricks-dolly-15k-curated-en-readymade

    A Readymade by TheFactoryX

    Original Dataset

    argilla/databricks-dolly-15k-curated-en

    Process

    This dataset is a "readymade" - inspired by Marcel Duchamp's concept of taking everyday objects and recontextualizing them as art. What we did:

    • Selected the original dataset from Hugging Face
    • Shuffled each column independently
    • Destroyed all row-wise relationships
    • Preserved structure, removed meaning

    The… See the full description on the dataset page: https://huggingface.co/datasets/TheFactoryX/edition_0016_argilla-databricks-dolly-15k-curated-en-readymade.
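The process described for this entry (shuffling each column independently so that row-wise relationships are destroyed while the structure is kept) can be sketched as follows. This is an illustrative sketch under stated assumptions, not TheFactoryX's actual code: the function name `shuffle_columns_independently`, the column-oriented dict layout, and the sample values are hypothetical.

```python
import random
from typing import Dict, List


def shuffle_columns_independently(columns: Dict[str, List[str]], seed: int) -> Dict[str, List[str]]:
    """Permute every column on its own, so values no longer line up by row."""
    rng = random.Random(seed)
    shuffled: Dict[str, List[str]] = {}
    for name, values in columns.items():
        permuted = list(values)  # copy so the input table is left unchanged
        rng.shuffle(permuted)    # each column draws its own permutation
        shuffled[name] = permuted
    return shuffled


# Hypothetical three-row table for illustration only.
table = {
    "instruction": ["q1", "q2", "q3"],
    "context": ["c1", "c2", "c3"],
    "response": ["r1", "r2", "r3"],
}
readymade = shuffle_columns_independently(table, seed=16)
print(readymade)
```

Column names, column lengths, and the multiset of values in each column are preserved; only the pairing between columns is lost.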

  20. databricks-dolly-15k-en

    • huggingface.co
    Cite
    Umar Zein Ardhi Siregar, databricks-dolly-15k-en [Dataset]. https://huggingface.co/datasets/umarzein/databricks-dolly-15k-en
    Authors
    Umar Zein Ardhi Siregar
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    This is a checkpoint of the databricks-dolly-15k dataset

databricks/databricks-dolly-15k
247 scholarly articles cite this dataset.