71 datasets found

databricks-dolly-15k
huggingface.co
kaggle.com
+ more versions
Cite
Databricks, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset authored and provided by
Databricks (http://databricks.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Summary

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.
databricks-dolly-15k-ja
opendatalab.com
zip
Updated Jan 9, 2024
+ more versions
Cite
(2024). databricks-dolly-15k-ja [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja
Explore at:
Available download formats: zip
Dataset updated
Jan 9, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset was created by automatically translating "databricks-dolly-15k" into Japanese.
databricks-dolly-15k-cleanset
huggingface.co
Cite
Cleanlab, databricks-dolly-15k-cleanset [Dataset]. https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleanset
Explore at:
Croissant
Dataset authored and provided by
Cleanlab
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Summary

databricks-dolly-15k-cleanset can be used to produce CLEANed up versions of the popular databricks-dolly-15k dataSET, which was used to fine-tune Dolly 2.0. The original databricks-dolly-15k contains 15,000 human-annotated instruction-response pairs covering various categories. However, there are many low-quality responses, incomplete/vague prompts, and other problematic text lurking in the dataset (as with all real-world instruction tuning datasets). We ran… See the full description on the dataset page: https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleanset.
databricks-dolly-15k dataset
kaggle.com
zip
Updated Apr 12, 2023
Cite
Prathamesh Sonawane (2023). databricks-dolly-15k dataset [Dataset]. https://www.kaggle.com/datasets/pratt3000/databricks-dolly-15k-dataset
Explore at:
Available download formats: zip (4738358 bytes)
Dataset updated
Apr 12, 2023
Authors
Prathamesh Sonawane
Description
Dataset

This dataset was created by Prathamesh Sonawane

Contents
databricks-dolly-15k.jsonl
kaggle.com
zip
Updated Aug 3, 2024
+ more versions
Cite
Beyza Coban (2024). databricks-dolly-15k.jsonl [Dataset]. https://www.kaggle.com/datasets/beyzacoban/databricks-dolly-15k-jsonl
Explore at:
Available download formats: zip (9407597 bytes)
Dataset updated
Aug 3, 2024
Authors
Beyza Coban
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by Beyza Coban

Released under Apache 2.0

Contents
databricks-dolly-15k-ja
huggingface.co
Updated Feb 7, 2024
Cite
LLM-jp (2024). databricks-dolly-15k-ja [Dataset]. https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja
Explore at:
Croissant
Dataset updated
Feb 7, 2024
Dataset authored and provided by
LLM-jp
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
databricks-dolly-15k-ja

This repository provides an instruction tuning dataset developed by LLM-jp, a collaborative project launched in Japan. This dataset is a Japanese translation of databricks-dolly-15k using DeepL.

Send Questions to

llm-jp(at)nii.ac.jp

Model Card Authors

The names are listed in alphabetical order. Hirokazu Kiyomaru, Hiroshi Matsuda, Jun Suzuki, Namgi Han, Saku Sugawara, Shota Sasaki, Shuhei Kurita, Taishi Nakamura, Takashi Kodama, Takumi… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja.
databricks dolly 15k use
kaggle.com
zip
Updated Mar 19, 2024
Cite
Caleb Klinger (2024). databricks dolly 15k use [Dataset]. https://www.kaggle.com/datasets/trippinglettuce/databricks-dolly-15k-use/code
Explore at:
Available download formats: zip (4734011 bytes)
Dataset updated
Mar 19, 2024
Authors
Caleb Klinger
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Caleb Klinger

Released under CC0: Public Domain

Contents
databricks-dolly-15k-ja-reformat-v1
opendatalab.com
huggingface.co
zip
Updated Apr 13, 2023
+ more versions
Cite
(2023). databricks-dolly-15k-ja-reformat-v1 [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja-reformat-v1
Explore at:
Available download formats: zip
Dataset updated
Apr 13, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category. Halfway through the data generation process, contributors were given the option of answering questions posed by other contributors. They were asked to rephrase the original question and only select questions they could be reasonably expected to answer correctly. For certain categories contributors were asked to provide reference texts copied from Wikipedia. Reference text (indicated by the context field in the actual dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]) which we recommend users remove for downstream applications.
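The recommendation above to remove bracketed Wikipedia citation numbers from the context field can be sketched as follows. This is a minimal illustration, not the dataset authors' own tooling: the function name strip_wiki_citations is hypothetical, and it assumes the markers are purely numeric (such as [42]), so other bracketed text is left untouched.

```python
import re

# Matches bracketed numeric citation markers such as "[42]".
# Assumption: markers contain digits only; "[note]" or "[a]" are left alone.
_CITATION = re.compile(r"\[\d+\]")


def strip_wiki_citations(text: str) -> str:
    """Remove bracketed citation numbers from a reference text."""
    return _CITATION.sub("", text)


sample = "Paris is the capital of France.[1] It lies on the Seine.[42]"
print(strip_wiki_citations(sample))
# prints: Paris is the capital of France. It lies on the Seine.
```

Applying the function to the context field only (and not to instructions or responses) keeps bracketed numbers that a contributor may have written deliberately.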
databricks-dolly-15k-curated-en
huggingface.co
opendatalab.com
Cite
Argilla, databricks-dolly-15k-curated-en [Dataset]. https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en
Explore at:
Croissant
Dataset authored and provided by
Argilla
Description
Guidelines

In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. The aim of the project is to correct the instructions, input and responses to make sure they are of the highest quality and that they match the task category that they belong to. All three texts should be clear and include real information. In addition, the response should be as complete but concise as possible. To curate the dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en.
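The four-part record described above (category, instruction, context, response) could be modeled roughly as below. This is a sketch under assumptions: the class name DollyRecord and the helper parse_jsonl are hypothetical, the key names are taken from the four items the guidelines list, and an absent context is treated as an empty string.

```python
import json
from dataclasses import dataclass


@dataclass
class DollyRecord:
    """One instruction-following record (hypothetical container type)."""
    category: str
    instruction: str
    context: str
    response: str


def parse_jsonl(text: str) -> list:
    """Parse JSON Lines text into DollyRecord objects, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        records.append(
            DollyRecord(
                category=obj["category"],
                instruction=obj["instruction"],
                context=obj.get("context", ""),  # context may be absent or empty
                response=obj["response"],
            )
        )
    return records


# Illustrative record, not taken from the dataset.
example = '{"category": "open_qa", "instruction": "What is 2 + 2?", "context": "", "response": "4"}'
print(parse_jsonl(example)[0].category)
```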
databricks-databricks-dolly-15k
huggingface.co
Updated Sep 21, 2024
+ more versions
Cite
AGIE AI Technology (2024). databricks-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/agie-ai/databricks-databricks-dolly-15k
Explore at:
Croissant
Dataset updated
Sep 21, 2024
Dataset authored and provided by
AGIE AI Technology
Description
Dataset Card for "databricks-databricks-dolly-15k"

More Information needed
databricks-dolly-15k-niid
huggingface.co
Cite
FEDML, databricks-dolly-15k-niid [Dataset]. https://huggingface.co/datasets/fedml/databricks-dolly-15k-niid
Explore at:
Croissant
Dataset authored and provided by
FEDML
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This is a Non-IID split version of databricks/databricks-dolly-15k.
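The listing does not say how this Non-IID split was produced. As one common way to build such a split, the sketch below partitions record indices across clients with label skew, drawing each label's per-client shares from a Dirichlet distribution. The function name non_iid_split, the alpha parameter, and the sample labels are all assumptions for illustration and are not confirmed to match the FEDML dataset.

```python
import random
from collections import defaultdict


def non_iid_split(labels, num_clients, alpha=0.5, seed=0):
    """Partition record indices across clients with label skew.

    For each label, client shares are drawn from a Dirichlet(alpha)
    distribution; a smaller alpha gives a more skewed split.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Sample Dirichlet weights by normalizing Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start = 0
        cumulative = 0.0
        for client_id, draw in enumerate(draws):
            cumulative += draw / total
            # The last client takes whatever remains, so no index is dropped.
            if client_id == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Illustrative labels only; real records would supply their own category values.
labels = ["open_qa"] * 50 + ["summarization"] * 30 + ["classification"] * 20
parts = non_iid_split(labels, num_clients=4)
print([len(p) for p in parts])
```

Every index is assigned to exactly one client, so the union of the client shards reproduces the original set of records.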
instruct-rl-databricks-dolly-15k
huggingface.co
Updated Sep 12, 2024
Cite
Marco Valenzuela (2024). instruct-rl-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/marcov/instruct-rl-databricks-dolly-15k
Explore at:
Croissant
Dataset updated
Sep 12, 2024
Authors
Marco Valenzuela
Description
marcov/instruct-rl-databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k
huggingface.co
Updated Jul 15, 2009
+ more versions
Cite
Korea University Mobile Embedded Systems Lab (2009). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Korea-MES/databricks-dolly-15k
Explore at:
Dataset updated
Jul 15, 2009
Dataset authored and provided by
Korea University Mobile Embedded Systems Lab
Description
Korea-MES/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k-chatml
huggingface.co
Cite
Re:cast AI, databricks-dolly-15k-chatml [Dataset]. https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml
Explore at:
Croissant
Dataset provided by
CAST AI Group, Inc.
Authors
Re:cast AI
Description
Dataset Card for "databricks-dolly-15k-chatml"

Dataset Summary

This dataset has been created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a chatml friendly format for use in SFT tasks with pretrained models.

Dataset Structure

messages = [ { "content": "You are an expert Q&A system that is trusted around the world. You always... etc.", "role": "system" }, { "content": "(Optional) Context information is… See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
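The structure above is truncated in this listing, so the exact prompt wording is not recoverable here. As a rough sketch of the general idea (turning one instruction record into role/content messages), the following uses a hypothetical helper to_chat_messages, a caller-supplied system prompt, and an invented user template; none of these are confirmed to match the Re:cast AI dataset.

```python
def to_chat_messages(record, system_prompt):
    """Convert one instruction record into a list of role/content messages.

    The user-message template below is an assumption for illustration only.
    """
    context = record.get("context", "").strip()
    if context:
        # Include the optional context ahead of the instruction.
        user_content = (
            f"Context:\n{context}\n\nInstruction:\n{record['instruction']}"
        )
    else:
        user_content = record["instruction"]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["response"]},
    ]


# Illustrative record and placeholder system prompt.
record = {"instruction": "Name a primary color.", "context": "", "response": "Red."}
for message in to_chat_messages(record, "PLACEHOLDER SYSTEM PROMPT"):
    print(message["role"], "->", message["content"])
```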
databricks-dolly-15k-th
huggingface.co
+ more versions
Cite
Thaweewat, databricks-dolly-15k-th [Dataset]. https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th
Explore at:
Croissant
Authors
Thaweewat
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Summary

This is a Thai 🇹🇭-instructed dataset translated from databricks-dolly-15k using Google Cloud Translation. databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic… See the full description on the dataset page: https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th.
databricks-dolly-15k-llama
huggingface.co
Updated Sep 12, 2024
Cite
David (2024). databricks-dolly-15k-llama [Dataset]. https://huggingface.co/datasets/daveokpare/databricks-dolly-15k-llama
Explore at:
Croissant
Dataset updated
Sep 12, 2024
Authors
David
Description
Dataset Card for "databricks-dolly-15k-llama"

More Information needed
databricks-dolly-15k
huggingface.co
+ more versions
Cite
Andrew Bai, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/andrewbai/databricks-dolly-15k
Explore at:
Authors
Andrew Bai
Description
andrewbai/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k-sum
huggingface.co
Updated Jun 29, 2017
+ more versions
Cite
Seva Leonov (2017). databricks-dolly-15k-sum [Dataset]. https://huggingface.co/datasets/vsevolodl/databricks-dolly-15k-sum
Explore at:
Croissant
Dataset updated
Jun 29, 2017
Authors
Seva Leonov
Description
vsevolodl/databricks-dolly-15k-sum dataset hosted on Hugging Face and contributed by the HF Datasets community
edition_0016_argilla-databricks-dolly-15k-curated-en-readymade
huggingface.co
Updated Dec 2, 2009
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TheFactoryX (2009). edition_0016_argilla-databricks-dolly-15k-curated-en-readymade [Dataset]. https://huggingface.co/datasets/TheFactoryX/edition_0016_argilla-databricks-dolly-15k-curated-en-readymade
Explore at:
Dataset updated
Dec 2, 2009
Dataset authored and provided by
TheFactoryX
License
https://choosealicense.com/licenses/other/
Description
edition_0016_argilla-databricks-dolly-15k-curated-en-readymade

A Readymade by TheFactoryX

Original Dataset

argilla/databricks-dolly-15k-curated-en

Process

This dataset is a "readymade" - inspired by Marcel Duchamp's concept of taking everyday objects and recontextualizing them as art. What we did:

Selected the original dataset from Hugging Face
Shuffled each column independently
Destroyed all row-wise relationships
Preserved structure, removed meaning
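The process described above (shuffling each column independently so that row-wise relationships are destroyed while the column contents survive) can be sketched as below. This is an illustrative reconstruction, not TheFactoryX's actual code; the function name shuffle_columns, the seed parameter, and the sample rows are assumptions.

```python
import random


def shuffle_columns(rows, seed=0):
    """Shuffle each column of a list-of-dicts table independently.

    Every column keeps exactly the same values, but the pairing of
    values within a row is broken.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    # Pull each column out as its own list, then shuffle it on its own.
    columns = {key: [row[key] for row in rows] for key in rows[0]}
    for values in columns.values():
        rng.shuffle(values)
    # Reassemble rows from the independently shuffled columns.
    return [{key: columns[key][i] for key in columns} for i in range(len(rows))]


# Illustrative rows, not taken from the dataset.
rows = [
    {"instruction": "Q1", "response": "A1"},
    {"instruction": "Q2", "response": "A2"},
    {"instruction": "Q3", "response": "A3"},
]
for row in shuffle_columns(rows):
    print(row)
```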

The… See the full description on the dataset page: https://huggingface.co/datasets/TheFactoryX/edition_0016_argilla-databricks-dolly-15k-curated-en-readymade.
databricks-dolly-15k-en
huggingface.co
Cite
Umar Zein Ardhi Siregar, databricks-dolly-15k-en [Dataset]. https://huggingface.co/datasets/umarzein/databricks-dolly-15k-en
Explore at:
Croissant
Authors
Umar Zein Ardhi Siregar
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This is a checkpoint of the databricks-dolly-15k dataset

databricks/databricks-dolly-15k

247 scholarly articles cite this dataset (View in Google Scholar)
