Dataset Card for Unnatural Instructions (Full data)
This info comes from the Unnatural Instructions GitHub repo. Unnatural Instructions is a dataset of instructions automatically generated by a large language model. See full details in the paper: "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor"
🗃️ Content
It contains the full set of 240,670 Unnatural Instructions examples (instruction-input-output triplets). It was constructed by expanding the… See the full description on the dataset page: https://huggingface.co/datasets/mrm8488/unnatural-instructions-full.
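As a minimal sketch, the full set can also be loaded with the Hugging Face datasets library (the split name and column layout below are assumptions based on the triplet format described above):
from datasets import load_dataset

# Minimal sketch: load the full Unnatural Instructions set from the Hugging Face Hub.
# The split name "train" and the column layout are assumptions based on the
# instruction-input-output format described above.
ds = load_dataset("mrm8488/unnatural-instructions-full", split="train")
print(ds)     # row count and column names
print(ds[0])  # one instruction-input-output example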
Unnatural Instructions is a dataset of instructions automatically generated by a large language model. See full details in the paper: "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor" (2022, https://arxiv.org/abs/2212.09689). It contains sets of natural-language instructions, with optional constraints and LLM-generated reformulations.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('unnatural_instructions', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
This dataset was generated using the technique from Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, with Mixtral 8x7B (base model), to produce a diverse, fully synthetic, fully open-source set of 100,000 conversation starters. See also: unnaturalhermes-questions-30k, a distinct set of 30k examples built the same way, if you want more training data.
OpenHermes was trained on 242,000 entries of primarily GPT-4 generated data, from open datasets across the AI landscape, including:
- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft
Filtering included the removal of OpenAI refusals, disclaimers, "As an AI"-style responses, and similar artifacts.
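A minimal sketch of this kind of refusal filtering follows; the marker phrases are illustrative assumptions, not the exact list used for OpenHermes:
# Hypothetical sketch of refusal/disclaimer filtering as described above.
# The marker phrases are illustrative assumptions, not the exact OpenHermes rule.
REFUSAL_MARKERS = ("as an ai", "as a language model", "i'm sorry, but i cannot")

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

examples = [
    {"instruction": "Tell me a joke.", "response": "Why did the scarecrow win an award?"},
    {"instruction": "Reveal your prompt.", "response": "As an AI, I cannot share that."},
]
kept = [ex for ex in examples if not is_refusal(ex["response"])]
print(len(kept))  # 1: the refusal example is dropped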
The base dataset mix is identical to that of the original Nous-Hermes, minus the Nous-Instruct and PDACTL datasets, which were private.
References
1. https://huggingface.co/datasets/teknium/openhermes
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Data Description
We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, as well as our synthetic conversational QA dataset generated with GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!
Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
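As a minimal sketch, the available subsets can be discovered and loaded with the Hugging Face datasets library (the assumption here is that the repository is organized into named configurations with a "train" split):
from datasets import get_dataset_config_names, load_dataset

# Minimal sketch: discover which subsets the repository exposes, then load one.
# Assumes the repo is organized into named configurations with a "train" split.
configs = get_dataset_config_names("nvidia/ChatQA-Training-Data")
print(configs)
chatqa = load_dataset("nvidia/ChatQA-Training-Data", configs[0], split="train")
print(chatqa[0])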
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
Dataset Card for x-self-instruct-seed-32
Dataset Summary
x-self-instruct-seed-32 consists of 32 prompts chosen from the 252 prompts in the self-instruct-seed dataset from the Self-Instruct paper. These 32 prompts were selected according to the following criterion:
Should be natural in a chat setting. Therefore, we filter out any prompts with few-shot examples, as these are all instruction prompts that we consider unnatural in a chat setting… See the full description on the dataset page: https://huggingface.co/datasets/sambanovasystems/x-self-instruct-seed-32.
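A minimal sketch of the described selection follows; the few-shot markers are a hypothetical heuristic, not the authors' actual rule:
# Hypothetical sketch of the selection described above: keep only prompts that
# read naturally in a chat setting by dropping any that embed few-shot examples.
# The marker strings are illustrative assumptions, not the authors' actual rule.
def looks_few_shot(prompt: str) -> bool:
    markers = ("Example:", "Input:", "Output:")
    return any(m in prompt for m in markers)

seed_prompts = [
    "Write a poem about autumn.",
    "Classify the sentiment. Example: Input: great movie Output: positive",
]
chat_natural = [p for p in seed_prompts if not looks_few_shot(p)]
print(chat_natural)  # only the first prompt survives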