8 datasets found
  1. unnatural-instructions-full

    • huggingface.co
    Updated Dec 21, 2022
    Cite
    Manuel Romero (2022). unnatural-instructions-full [Dataset]. https://huggingface.co/datasets/mrm8488/unnatural-instructions-full
    Authors
    Manuel Romero
    Description

    Dataset Card for Unnatural Instructions (Full data)

    This information comes from the Unnatural Instructions GitHub repo. Unnatural Instructions is a dataset of instructions automatically generated by a large language model. See full details in the paper: "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor" (https://arxiv.org/abs/2212.09689)

      🗃️ Content
    

    It contains all 240,670 Unnatural Instructions examples (instruction-input-output triplets). It was constructed by expanding the… See the full description on the dataset page: https://huggingface.co/datasets/mrm8488/unnatural-instructions-full.
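    Each example is an instruction-input-output triplet. As a minimal sketch of how such a triplet is commonly turned into a prompt/target pair for instruction tuning, the snippet below uses a hypothetical `Triplet` record and an illustrative prompt template. Neither the field names nor the template comes from the dataset card, and the example content is invented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Triplet:
    """One instruction-input-output example (hypothetical field names)."""
    instruction: str
    input: str
    output: str


def to_prompt(ex: Triplet) -> Tuple[str, str]:
    """Split a triplet into a (prompt, target) pair for supervised fine-tuning.

    The "Instruction/Input/Output" template is an illustrative choice,
    not a format defined by the dataset.
    """
    prompt = f"Instruction: {ex.instruction}\nInput: {ex.input}\nOutput:"
    target = " " + ex.output  # leading space so prompt + target reads naturally
    return prompt, target


example = Triplet(
    instruction="Translate the sentence into French.",
    input="Good morning.",
    output="Bonjour.",
)
prompt, target = to_prompt(example)
print(prompt + target)
```

    Keeping the target separate from the prompt makes it easy to compute the training loss on the output tokens only, which is a common (but not universal) choice.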

  2. unnatural-instructions

    • huggingface.co
    Updated Dec 21, 2022
    Cite
    Manuel Romero (2022). unnatural-instructions [Dataset]. https://huggingface.co/datasets/mrm8488/unnatural-instructions
    Authors
    Manuel Romero
    Description

    Unnatural Instructions is a dataset of instructions automatically generated by a large language model. See full details in the paper: "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor" (https://arxiv.org/abs/2212.09689)

  3. unnatural_instructions

    • tensorflow.org
    Updated Jan 19, 2023
    Cite
    (2023). unnatural_instructions [Dataset]. https://www.tensorflow.org/datasets/catalog/unnatural_instructions
    Description

    Dataset described in the paper "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor" (2022). It contains natural-language instructions with optional constraints and LLM-generated reformulations.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('unnatural_instructions', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.
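    The TFDS description mentions LLM-generated reformulations of instructions. As an illustration only, assuming each record carries an instruction, an input, an output, and a list of paraphrased instructions under hypothetical field names, one way to turn reformulations into extra training rows is to pair each paraphrase with the original input and output. This is a sketch, not the procedure used to build any of the datasets listed here, and the example record is invented.

```python
from typing import Dict, List


def expand_with_reformulations(example: Dict) -> List[Dict[str, str]]:
    """Emit the original instruction plus one row per reformulation.

    All rows share the same input and output. The field names
    ("instruction", "input", "output", "reformulations") are assumptions;
    reformulations are treated as optional, matching the description above.
    """
    variants = [example["instruction"]] + list(example.get("reformulations", []))
    return [
        {"instruction": text, "input": example["input"], "output": example["output"]}
        for text in variants
    ]


example = {
    "instruction": "Translate the sentence into French.",
    "input": "Good morning.",
    "output": "Bonjour.",
    "reformulations": [
        "Give the French translation of this sentence.",
        "Render the following sentence in French.",
    ],
}
rows = expand_with_reformulations(example)
for row in rows:
    print(row["instruction"])
```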

  4. unnaturalhermes-questions-100k

    • huggingface.co
    Updated Dec 18, 2023
    Cite
    Eric Florenzano (2023). unnaturalhermes-questions-100k [Dataset]. https://huggingface.co/datasets/ericflo/unnaturalhermes-questions-100k
    Authors
    Eric Florenzano
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset was generated with the technique from "Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor" and Mixtral 8x7B (base model). It is a diverse, fully synthetic, fully open-source set of 100,000 conversation starters. For more training data, see also unnaturalhermes-questions-30k, a distinct set of 30k examples of the same kind.
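    The Unnatural Instructions technique prompts a model with a few demonstrations sampled from a small seed set and lets it continue with a new example. A minimal sketch of that prompting step follows, assuming three demonstrations per prompt and a plain "Example N:" template. Both are assumptions: the exact prompt used with Mixtral 8x7B for this dataset is not given here, and the seed questions are invented.

```python
import random
from typing import List, Optional


def build_generation_prompt(seeds: List[str], k: int = 3,
                            rng: Optional[random.Random] = None) -> str:
    """Sample k seed examples and leave an open slot for the model to fill.

    The "Example N:" template is hypothetical.
    """
    rng = rng or random.Random()
    demos = rng.sample(seeds, k)
    blocks = [f"Example {i}:\n{text}" for i, text in enumerate(demos, start=1)]
    blocks.append(f"Example {k + 1}:")  # open slot the base model continues from
    return "\n\n".join(blocks)


def parse_completion(completion: str) -> str:
    """Keep only the first generated example; a base model tends to keep going."""
    return completion.split("\nExample ", 1)[0].strip()


seed_questions = [
    "What is the best way to learn a new language as an adult?",
    "How do vaccines train the immune system?",
    "Can you explain how a hash table works?",
    "What should I pack for a week-long hiking trip?",
]
prompt = build_generation_prompt(seed_questions, rng=random.Random(0))
print(prompt)
print(parse_completion(" Why is the sky blue?\n\nExample 5:\nWhat is..."))
```

    A base (non-chat) model is a natural fit for this setup because it simply continues the pattern; truncating at the next "Example" header keeps one new item per call. Resampling the demonstrations on every call is what drives diversity.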

  5. OpenHermes

    • kaggle.com
    • huggingface.co
    Updated Dec 17, 2023
    Cite
    Volodymyr Pivoshenko 🇺🇦 (2023). OpenHermes [Dataset]. https://www.kaggle.com/datasets/volodymyrpivoshenko/openhermes
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Volodymyr Pivoshenko 🇺🇦
    Description

    OpenHermes was trained on 242,000 entries of primarily GPT-4-generated data, drawn from open datasets across the AI landscape, including:
      - GPTeacher (General Instruct, Roleplay v1, Roleplay v2, and Code Instruct datasets), by Teknium
      - WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
      - Airoboros GPT-4 (v1.0), by JonDurbin
      - Camel-AI's domain expert datasets, by the Camel-AI Team
      - CodeAlpaca, by Sahil2801
      - GPT4-LLM and Unnatural Instructions, by Microsoft

    Filtering removed OpenAI refusals, disclaimers, and "As an AI"-style examples, among others.
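    The filtering described above can be approximated with simple pattern matching on model responses. The sketch below assumes each example is a dict with an "output" field; the marker phrases are hypothetical examples, not the actual OpenHermes filter list, which this page does not include.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical marker phrases for refusals and "As an AI" disclaimers.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi(?:'m| am) sorry, but\b",
    r"\bi cannot (?:assist|help) with\b",
    r"\bopenai\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal_like(response: str) -> bool:
    """True if the response contains any of the marker phrases."""
    return _REFUSAL_RE.search(response) is not None


def filter_examples(examples: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only examples whose 'output' contains none of the marker phrases."""
    for ex in examples:
        if not is_refusal_like(ex["output"]):
            yield ex


data = [
    {"instruction": "Name a primary color.", "output": "Red."},
    {"instruction": "Write a poem.", "output": "As an AI language model, I cannot feel emotions..."},
    {"instruction": "Help me.", "output": "I'm sorry, but I can't do that."},
]
kept = list(filter_examples(data))
print(kept)
```

    Phrase matching is cheap but blunt: a pattern such as `openai` also drops legitimate answers that merely mention the company, so real pipelines usually tune the list against samples of what gets removed.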

    The base dataset mix is identical to that of the original Nous-Hermes, minus the Nous-Instruct and PDACTL datasets, which were private.

    References
    1. https://huggingface.co/datasets/teknium/openhermes

  6. Data from: The last sermon of Mr. Henry Smith sometime Master of Arts in...

    • llds.ling-phil.ox.ac.uk
    Updated May 12, 2024
    Cite
    Henry Smith (2024). The last sermon of Mr. Henry Smith sometime Master of Arts in Christ-Church College in Oxford, & late minister in Sallop. With his earnest invitations to the Sacrament of the Lords Supper. And directions to young beginners that they may be fitted for that Holy Communion, and receive it with profit. 2. His holy and pious sayings in general, necessary for all persons. 3. Instructions for young people, exhorting them to obedience, and duty towards their parents. 4. The sad effects of disobedience, in the examples of many wicked and unnatural children, who ame [sic] to untimely ends. With prayers suitable to divers occasions, by the same author. Published for the instruction and benefit of all Christian people. Licensed and entred according to order. [Dataset]. https://llds.ling-phil.ox.ac.uk/llds/xmlui/handle/20.500.14106/A60421
    Authors
    Henry Smith
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Oxford
    Description

    (:unav)

  7. ChatQA-Training-Data

    • huggingface.co
    Updated Jun 30, 2023
    Cite
    NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    Data Description

    We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, and our synthetic conversational QA dataset generated by GPT-3.5-turbo-0613. The SFT dataset is built and derived from Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!

      Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
    
  8. x-self-instruct-seed-32

    • huggingface.co
    Updated May 19, 2023
    Cite
    SambaNova Systems (2023). x-self-instruct-seed-32 [Dataset]. https://huggingface.co/datasets/sambanovasystems/x-self-instruct-seed-32
    Dataset authored and provided by
    SambaNova Systems
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for xOA22 - Multilingual Prompts from OpenAssistant

      Dataset Summary
    

    x-self-instruct-seed-32 consists of 32 prompts chosen out of the 252 prompts in the self-instruct-seed dataset from the Self-Instruct paper. These 32 prompts were selected according to the following criteria:

      - Should be natural in a chat setting. Therefore, we filter out any prompts with "few-shot examples", as these are all instruction prompts that we consider unnatural in a chat setting…

    See the full description on the dataset page: https://huggingface.co/datasets/sambanovasystems/x-self-instruct-seed-32.
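    The card does not say how prompts with few-shot examples were detected. One simple heuristic, offered purely as an assumption, is to count lines that start with an "Input:" or "Example N:" label and treat two or more as embedded demonstrations. The regex, threshold, and sample prompts below are all hypothetical.

```python
import re
from typing import List

# Hypothetical rule: a line starting with "Input:" or "Example"/"Example N:"
# counts as one embedded demonstration.
_DEMO_RE = re.compile(r"^\s*(?:input|example\s*\d*)\s*:", re.IGNORECASE | re.MULTILINE)


def looks_few_shot(prompt: str, min_demos: int = 2) -> bool:
    """True if the prompt appears to contain at least `min_demos` demonstrations."""
    return len(_DEMO_RE.findall(prompt)) >= min_demos


def keep_chat_natural(prompts: List[str]) -> List[str]:
    """Drop prompts that look few-shot, keeping ones that read naturally in chat."""
    return [p for p in prompts if not looks_few_shot(p)]


prompts = [
    "Write a haiku about autumn.",
    "Classify the sentiment.\nInput: I loved it.\nOutput: positive\n"
    "Input: It was awful.\nOutput: negative\nInput: Not bad at all.\nOutput:",
]
print(keep_chat_natural(prompts))
```

    Requiring at least two labeled inputs lets a prompt with a single input slot through, since that reads like an ordinary request rather than a list of worked demonstrations.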
