89 datasets found

databricks-dolly-15k
huggingface.co
Cite
Databricks, databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/databricks/databricks-dolly-15k
Explore at: Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset authored and provided by
Databricks (http://databricks.com/)
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Summary

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.
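Each record pairs an instruction with an optional context, a response, and a behavioral category. As a minimal sketch, assuming records shaped as dictionaries with the fields `instruction`, `context`, `response`, and `category` (the sample rows and category labels below are hypothetical placeholders, not rows from the dataset), the per-category distribution can be tallied with the standard library:

```python
from collections import Counter

# Hypothetical sample records. The field names follow the
# instruction/context/response/category layout; the rows and the
# category labels are placeholders, not actual dataset rows.
records = [
    {"instruction": "Suggest names for a hiking club.", "context": "",
     "response": "Trail Blazers, Summit Seekers, Ridge Runners.",
     "category": "brainstorming"},
    {"instruction": "Give three ideas for a rainy-day activity.", "context": "",
     "response": "Board games, baking, and reading.",
     "category": "brainstorming"},
    {"instruction": "When was the bridge opened?",
     "context": "The bridge opened to traffic in 1937.",
     "response": "It opened in 1937.",
     "category": "closed_qa"},
    {"instruction": "Summarize the paragraph.",
     "context": "The committee met twice and approved the budget.",
     "response": "The committee approved the budget after two meetings.",
     "category": "summarization"},
]


def count_by_category(rows):
    """Return a Counter mapping each category label to its record count."""
    return Counter(row["category"] for row in rows)


for category, n in count_by_category(records).most_common():
    print(f"{category}: {n}")
```

Because iterating a Hugging Face `datasets.Dataset` yields one dictionary per row, the same function could in principle be applied to a split loaded with `load_dataset`, provided the split actually has a `category` column.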
databricks-dolly-15k
huggingface.co
Cite
AI Squared, Inc., databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/aisquared/databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset authored and provided by
AI Squared, Inc.
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
databricks-dolly-15k

This dataset was not originally created by AI Squared; it was created and curated by Databricks. The text below comes from the README file in the dataset's original GitHub release (available at https://github.com/databrickslabs/dolly/tree/master/data):

Summary

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in… See the full description on the dataset page: https://huggingface.co/datasets/aisquared/databricks-dolly-15k.
databricks-dolly-15k-ja
opendatalab.com
zip
Updated Jan 9, 2024
Cite
(2024). databricks-dolly-15k-ja [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-ja
Available download formats: zip
Dataset updated
Jan 9, 2024
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset was created by automatically translating "databricks-dolly-15k" into Japanese.
databricks-dolly-15k-ja
huggingface.co
Updated Feb 7, 2024
Cite
LLM-jp (2024). databricks-dolly-15k-ja [Dataset]. https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Feb 7, 2024
Dataset authored and provided by
LLM-jp
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
databricks-dolly-15k-ja

This repository provides an instruction tuning dataset developed by LLM-jp, a collaborative project launched in Japan. This dataset is a Japanese translation of databricks-dolly-15k using DeepL.

Send Questions to

llm-jp(at)nii.ac.jp

Model Card Authors

The names are listed in alphabetical order. Hirokazu Kiyomaru, Hiroshi Matsuda, Jun Suzuki, Namgi Han, Saku Sugawara, Shota Sasaki, Shuhei Kurita, Taishi Nakamura, Takashi Kodama, Takumi… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja.
databricks-dolly-15k-ko
huggingface.co
Updated Apr 12, 2023
Cite
NLP & AI - Korea University (2023). databricks-dolly-15k-ko [Dataset]. https://huggingface.co/datasets/nlpai-lab/databricks-dolly-15k-ko
Dataset updated
Apr 12, 2023
Dataset authored and provided by
NLP & AI - Korea University
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Korean translation of databricks-dolly-15k via the DeepL API. Note: in some cases, multilingual data was converted to monolingual data during batch translation into Korean using the API. Below is databricks-dolly-15k's README.

Summary

databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification… See the full description on the dataset page: https://huggingface.co/datasets/nlpai-lab/databricks-dolly-15k-ko.
databricks-dolly-15k
huggingface.co
Updated Aug 27, 2024
Cite
Post-training-Data-Flywheel (2024). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Post-training-Data-Flywheel/databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Aug 27, 2024
Dataset authored and provided by
Post-training-Data-Flywheel
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Post-training-Data-Flywheel/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k-curated-en
opendatalab.com
huggingface.co
zip
Updated Jun 1, 2023
Cite
(2023). databricks-dolly-15k-curated-en [Dataset]. https://opendatalab.com/OpenDataLab/databricks-dolly-15k-curated-en
Available download formats: zip
Dataset updated
Jun 1, 2023
Description
This dataset contains a collection of records, each with a category, an instruction, a context, and a response to that instruction. The aim of the project is to correct the instructions, inputs, and responses so that they are of the highest quality and match the task category they belong to. All three texts should be clear and contain real information. In addition, each response should be as complete as possible while remaining concise.
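Part of that curation goal can be checked mechanically before human review. A minimal sketch, assuming a hypothetical `Record` type and a hypothetical `allowed_categories` set; it only catches empty fields and unknown category labels, while judging whether the information is real or whether a task fits its category still needs a human:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One hypothetical curated record: category, instruction, context, response."""
    category: str
    instruction: str
    context: str
    response: str


def find_problems(record, allowed_categories):
    """Return a list of mechanical problems found in one record.

    Context is not checked because some task categories have no context.
    """
    problems = []
    if record.category not in allowed_categories:
        problems.append(f"unknown category: {record.category!r}")
    if not record.instruction.strip():
        problems.append("empty instruction")
    if not record.response.strip():
        problems.append("empty response")
    return problems


allowed = {"open_qa", "closed_qa", "summarization"}  # hypothetical label set
good = Record("open_qa", "What is the capital of France?", "", "Paris.")
bad = Record("misc", "   ", "", "")
print(find_problems(good, allowed))
print(find_problems(bad, allowed))
```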
databricks-dolly-15k-curated-multilingual
huggingface.co
Cite
Argilla, databricks-dolly-15k-curated-multilingual [Dataset]. https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual
Explore at: Croissant (mlcommons.org/croissant)
Dataset authored and provided by
Argilla
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Dataset Card for "databricks-dolly-15k-curated-multilingual"

A curated and multilingual version of the Databricks Dolly instructions dataset. It includes a programmatically and manually corrected version of the original English dataset. See below. STATUS: Currently, the original Dolly v2 English version has been curated by combining automatic processing with collaborative human curation using Argilla (~400 records have been manually edited and fixed). The following graph shows a summary… See the full description on the dataset page: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-multilingual.
databricks-dolly-15k
huggingface.co
Updated Feb 4, 2024
Cite
Sharath S Hebbar (2024). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Sharathhebbar24/databricks-dolly-15k
Dataset updated
Feb 4, 2024
Authors
Sharath S Hebbar
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Databricks-dolly

This is a cleansed version of databricks/databricks-dolly-15k

Usage

from datasets import load_dataset

dataset = load_dataset("Sharathhebbar24/databricks-dolly-15k", split="train")
ChatML-databricks-dolly-15k
huggingface.co
Updated Feb 3, 2024
Cite
Victor Nogueira (2024). ChatML-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Feb 3, 2024
Authors
Victor Nogueira
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
databricks/databricks-dolly-15k in ChatML format. Python code used for conversion:

from datasets import load_dataset
import pandas
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="Felladrin/Llama-160M-Chat-v1"
)

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format(columns):
    instruction = columns["instruction"].strip()
    context = columns["context"].strip()
    response =…

See the full description on the dataset page: https://huggingface.co/datasets/Felladrin/ChatML-databricks-dolly-15k.
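The conversion script above is cut off before it shows how each record is rendered, and it loads the tokenizer of `Felladrin/Llama-160M-Chat-v1`, presumably to apply that model's chat template. A dependency-free sketch of the general idea, assuming the common ChatML markers `<|im_start|>` and `<|im_end|>` and assuming that any context is appended after the instruction (neither detail is confirmed by the dataset page), could look like this:

```python
def record_to_chatml(record):
    """Render one Dolly-style record as a user/assistant ChatML transcript.

    Assumptions: the ChatML markers <|im_start|> and <|im_end|>, and placing
    any non-empty context after the instruction in the user turn.
    """
    instruction = record["instruction"].strip()
    context = record["context"].strip()
    response = record["response"].strip()
    user_turn = f"{instruction}\n\n{context}" if context else instruction
    return (
        f"<|im_start|>user\n{user_turn}<|im_end|>\n"
        f"<|im_start|>assistant\n{response}<|im_end|>\n"
    )


example = {
    "instruction": "Who wrote the passage? ",
    "context": "The passage was written by the project lead.",
    "response": "The project lead.",
}
print(record_to_chatml(example))
```

Using the tokenizer's own chat template instead, as the original script appears to, keeps the markers consistent with the target model rather than hard-coding them.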
databricks-dolly-15k-curated-es
huggingface.co
Cite
María Grandury, databricks-dolly-15k-curated-es [Dataset]. https://huggingface.co/datasets/mariagrandury/databricks-dolly-15k-curated-es
Explore at: Croissant (mlcommons.org/croissant)
Authors
María Grandury
Description
Dataset Card for databricks-dolly-15k-curated-es

This dataset has been created with Argilla. As shown in the sections below, this dataset can be loaded into Argilla as explained in Load with Argilla, or used directly with the datasets library in Load with datasets.

Dataset Summary

This dataset contains:

A dataset configuration file conforming to the Argilla dataset format named argilla.cfg. This configuration file will be used to configure the dataset when using the… See the full description on the dataset page: https://huggingface.co/datasets/mariagrandury/databricks-dolly-15k-curated-es.
databricks-dolly-15k
huggingface.co
Updated Oct 19, 2024
Cite
Vaibhav Adlakha (2024). databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/vaibhavad/databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Oct 19, 2024
Authors
Vaibhav Adlakha
Description
vaibhavad/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-databricks-dolly-15k
huggingface.co
Updated Sep 21, 2024
Cite
AGIE AI Technology (2024). databricks-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/agie-ai/databricks-databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Sep 21, 2024
Dataset authored and provided by
AGIE AI Technology
Description
Dataset Card for "databricks-databricks-dolly-15k"

More Information needed
thai_databricks_dolly
huggingface.co
Updated Jun 20, 2024
Cite
SEACrowd (2024). thai_databricks_dolly [Dataset]. https://huggingface.co/datasets/SEACrowd/thai_databricks_dolly
Dataset updated
Jun 20, 2024
Dataset authored and provided by
SEACrowd
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This is a Thai instruction dataset translated from databricks-dolly-15k using Google Cloud Translation. databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
pretrain-databricks-dolly-15k
huggingface.co
Updated Jan 24, 2024
Cite
Victor Nogueira (2024). pretrain-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/Felladrin/pretrain-databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Jan 24, 2024
Authors
Victor Nogueira
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Conversion of the databricks/databricks-dolly-15k dataset for use in pretraining. Python code used for conversion:

from datasets import load_dataset
import pandas

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format(columns):
    # Join each instruction and its response, separated by a blank line.
    instruction = columns["instruction"].strip()
    answer = columns["response"].strip()
    return f"{instruction}\n\n{answer}"

# Write a single "text" column, one row per record, to train.csv.
pandas.DataFrame({"text": [format(columns) for columns in dataset]}).to_csv("train.csv", index=False)
databricks-dolly-15k-chatml
huggingface.co
Cite
Re:cast AI, databricks-dolly-15k-chatml [Dataset]. https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml
Explore at: Croissant (mlcommons.org/croissant)
Dataset provided by
CAST AI Group, Inc.
Authors
Re:cast AI
Description
Dataset Card for "databricks-dolly-15k-chatml"

Dataset Summary

This dataset was created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a ChatML-friendly format for use in supervised fine-tuning (SFT) tasks with pretrained models.

Dataset Structure

messages = [
    {
        "content": "You are an expert Q&A system that is trusted around the world. You always... etc.",
        "role": "system"
    },
    {
        "content": "(Optional) Context information is…

See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
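The message list above is truncated, so neither the full system prompt nor the exact way context is attached is visible. A minimal sketch of the same system/user/assistant layout, assuming a placeholder system prompt and assuming that optional context is prepended to the user message (both hypothetical), could be:

```python
# Hypothetical placeholder: the dataset's actual system prompt is truncated.
SYSTEM_PROMPT = "You are a helpful question-answering assistant."


def record_to_messages(record, system_prompt=SYSTEM_PROMPT):
    """Build a system/user/assistant message list from one Dolly-style record."""
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    # Assumption: optional context is prepended to the user turn.
    user_content = f"{context}\n\n{instruction}" if context else instruction
    return [
        {"content": system_prompt, "role": "system"},
        {"content": user_content, "role": "user"},
        {"content": record["response"].strip(), "role": "assistant"},
    ]


messages = record_to_messages(
    {"instruction": "What color is the sky?", "context": "", "response": "Blue."}
)
for message in messages:
    print(message["role"], "->", message["content"])
```

Keeping the output as a list of `content`/`role` dictionaries mirrors the structure shown on the dataset page and leaves rendering to whichever chat template the fine-tuned model uses.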
instruct-rl-databricks-dolly-15k
huggingface.co
Updated Sep 12, 2024
Cite
Marco Valenzuela (2024). instruct-rl-databricks-dolly-15k [Dataset]. https://huggingface.co/datasets/marcov/instruct-rl-databricks-dolly-15k
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Sep 12, 2024
Authors
Marco Valenzuela
Description
marcov/instruct-rl-databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k-ja-annotated
huggingface.co
Updated Feb 5, 2025
Cite
System K Dev. (2025). databricks-dolly-15k-ja-annotated [Dataset]. https://huggingface.co/datasets/systemk/databricks-dolly-15k-ja-annotated
Dataset updated
Feb 5, 2025
Dataset authored and provided by
System K Dev.
Description
systemk/databricks-dolly-15k-ja-annotated dataset hosted on Hugging Face and contributed by the HF Datasets community
databricks-dolly-15k-llama
huggingface.co
Updated Sep 12, 2024
Cite
David (2024). databricks-dolly-15k-llama [Dataset]. https://huggingface.co/datasets/daveokpare/databricks-dolly-15k-llama
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
Sep 12, 2024
Authors
David
Description
Dataset Card for "databricks-dolly-15k-llama"

More Information needed
databricks-dolly-15k-single-text
huggingface.co
Updated May 6, 2024
Cite
Yassin Elsir (2024). databricks-dolly-15k-single-text [Dataset]. https://huggingface.co/datasets/rislemy/databricks-dolly-15k-single-text
Explore at: Croissant (mlcommons.org/croissant)
Dataset updated
May 6, 2024
Authors
Yassin Elsir
Description
rislemy/databricks-dolly-15k-single-text dataset hosted on Hugging Face and contributed by the HF Datasets community
200 scholarly articles cite databricks/databricks-dolly-15k (per Google Scholar).