Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Summary
databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was created by automatically translating "databricks-dolly-15k" into Japanese.
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Summary
databricks-dolly-15k-cleanset can be used to produced CLEANed up versions of the popular databricks-dolly-15k dataSET, which was used to fine-tune the Dolly 2.0. The original databricks-dolly-15k contains 15,000 human-annotated instruction-response pairs covering various categories. However, there are many low-quality responses, incomplete/vague prompts, and other problematic text lurking in the dataset (as with for all real-world instruction tuning datasets). We ran… See the full description on the dataset page: https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleanset.
Facebook
TwitterThis dataset was created by Prathamesh Sonawane
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Beyza Coban
Released under Apache 2.0
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
databricks-dolly-15k-ja
This repository provides an instruction tuning dataset developed by LLM-jp, a collaborative project launched in Japan. This dataset is a Japanese translation of databricks-dolly-15k using DeepL.
Send Questions to
llm-jp(at)nii.ac.jp
Model Card Authors
The names are listed in alphabetical order. Hirokazu Kiyomaru, Hiroshi Matsuda, Jun Suzuki, Namgi Han, Saku Sugawara, Shota Sasaki, Shuhei Kurita, Taishi Nakamura, Takashi Kodama, Takumi… See the full description on the dataset page: https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Caleb Klinger
Released under CC0: Public Domain
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
databricks-dolly-15k is a corpus of more than 15,000 records generated by thousands of Databricks employees to enable large language models to exhibit the magical interactivity of ChatGPT. Databricks employees were invited to create prompt / response pairs in each of eight different instruction categories, including the seven outlined in the InstructGPT paper, as well as an open-ended free-form category. The contributors were instructed to avoid using information from any source on the web with the exception of Wikipedia (for particular subsets of instruction categories), and explicitly instructed to avoid using generative AI in formulating instructions or responses. Examples of each behavior were provided to motivate the types of questions and instructions appropriate to each category. Halfway through the data generation process, contributors were given the option of answering questions posed by other contributors. They were asked to rephrase the original question and only select questions they could be reasonably expected to answer correctly. For certain categories contributors were asked to provide reference texts copied from Wikipedia. Reference text (indicated by the context field in the actual dataset) may contain bracketed Wikipedia citation numbers (e.g. [42]) which we recommend users remove for downstream applications.
Facebook
TwitterGuidelines
In this dataset, you will find a collection of records that show a category, an instruction, a context and a response to that instruction. The aim of the project is to correct the instructions, intput and responses to make sure they are of the highest quality and that they match the task category that they belong to. All three texts should be clear and include real information. In addition, the response should be as complete but concise as possible. To curate the dataset… See the full description on the dataset page: https://huggingface.co/datasets/argilla/databricks-dolly-15k-curated-en.
Facebook
TwitterDataset Card for "databricks-databricks-dolly-15k"
More Information needed
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This is a Non-IID split version of databricks/databricks-dolly-15k.
Facebook
Twittermarcov/instruct-rl-databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterKorea-MES/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
TwitterDataset Card for "databricks-dolly-15k-chatml"
Dataset Summary
This dataset has been created by Re:cast AI to transform the existing dataset databricks/databricks-dolly-15k into a chatml friendly format for use in SFT tasks with pretrained models.
Dataset Structure
messages = [ { "content": "You are an expert Q&A system that is trusted around the world. You always... etc.", "role": "system" }, { "content": "(Optional) Context information is… See the full description on the dataset page: https://huggingface.co/datasets/recastai/databricks-dolly-15k-chatml.
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Summary
This is a Thai 🇹🇭-instructed dataset translated from databricks-dolly-15k using Google Cloud Translation. databricks-dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic… See the full description on the dataset page: https://huggingface.co/datasets/Thaweewat/databricks-dolly-15k-th.
Facebook
TwitterDataset Card for "databricks-dolly-15k-llama"
More Information needed
Facebook
Twitterandrewbai/databricks-dolly-15k dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
Twittervsevolodl/databricks-dolly-15k-sum dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
Twitterhttps://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
edition_0016_argilla-databricks-dolly-15k-curated-en-readymade
A Readymade by TheFactoryX
Original Dataset
argilla/databricks-dolly-15k-curated-en
Process
This dataset is a "readymade" - inspired by Marcel Duchamp's concept of taking everyday objects and recontextualizing them as art. What we did:
Selected the original dataset from Hugging Face Shuffled each column independently Destroyed all row-wise relationships Preserved structure, removed meaning
The… See the full description on the dataset page: https://huggingface.co/datasets/TheFactoryX/edition_0016_argilla-databricks-dolly-15k-curated-en-readymade.
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
This is a checkpoint of the databricks-dolly-15k dataset
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Summary
databricks-dolly-15k is an open source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioral categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization. This dataset can be used for any purpose, whether academic or commercial, under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported… See the full description on the dataset page: https://huggingface.co/datasets/databricks/databricks-dolly-15k.