4 datasets found
  1. TinyStories

    • huggingface.co
    • opendatalab.com
    Updated May 16, 2023
    Cite
    Ronen Eldan (2023). TinyStories [Dataset]. https://huggingface.co/datasets/roneneldan/TinyStories
    Authors
    Ronen Eldan
    License

    https://choosealicense.com/licenses/cdla-sharing-1.0/

    Description

    Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary. Described in the following paper: https://arxiv.org/abs/2305.07759. The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Hugging Face, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources: tinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/roneneldan/TinyStories.
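
The model shorthand in the description, roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M, appears to abbreviate six repository names that share the prefix roneneldan/TinyStories-. A minimal Python sketch of that reading follows. The helper name expand_model_ids is hypothetical, and the expansion rule is an assumption about the notation, not something the description states.

```python
def expand_model_ids(shorthand: str) -> list[str]:
    """Expand 'owner/Prefix-A/B/C' into ['owner/Prefix-A', 'owner/Prefix-B', ...].

    Assumption: the first slash separates the owner from the name, and every
    later slash separates alternative suffixes that share the same prefix.
    """
    owner, first, *others = shorthand.split("/")
    # 'TinyStories-1M' -> prefix 'TinyStories', first suffix '1M'
    prefix, first_suffix = first.split("-", 1)
    return [f"{owner}/{prefix}-{suffix}" for suffix in [first_suffix, *others]]


# Shorthand copied verbatim from the dataset description.
MODEL_IDS = expand_model_ids("roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M")

for model_id in MODEL_IDS:
    print(model_id)
```

Each printed name could then be looked up on the Hugging Face Hub. The sketch only builds the strings and does not check that the repositories exist.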

  2. TinyStories-MRL

    • huggingface.co
    Updated Jul 29, 2025
    Cite
    Reactive AI (2025). TinyStories-MRL [Dataset]. https://huggingface.co/datasets/ReactiveAI/TinyStories-MRL
    Dataset authored and provided by
    Reactive AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for ReactiveAI/TinyStories-MRL

    Synthetic Memory Reinforcement Learning (MRL) dataset for proof-of-concept Reactive Transformer models. The dataset is divided into subsets used in different curriculum stages of MRL training. Each subset has a different number of follow-up interactions, may use a different strategy, and has train and validation splits.

    After the first experiments with MRL, we decided to abandon the single-step and two-step stages. That's because with single step… See the full description on the dataset page: https://huggingface.co/datasets/ReactiveAI/TinyStories-MRL.

  3. tale-frame

    • huggingface.co
    Updated Dec 18, 2024
    Cite
    guodaosun (2024). tale-frame [Dataset]. https://huggingface.co/datasets/guodaosun/tale-frame
    Authors
    guodaosun
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    TinyStories Dataset README

    Overview

    This dataset is based on TinyStories and includes structured JSON data with corresponding annotations, designed for research in controllable story generation and related tasks.

    Dataset Structure

    Each data item contains the following fields:

    1. conversations

    • Type: List
    • Purpose: Contains the JSON of the story
    • from: Always set to "human".
    • value: Structured data containing entities, events, story structures and…

    See the full description on the dataset page: https://huggingface.co/datasets/guodaosun/tale-frame.

  4. GradedStories

    • huggingface.co
    Updated Jun 11, 2024
    Cite
    Ahmed Badr (2024). GradedStories [Dataset]. https://huggingface.co/datasets/AB057/GradedStories
    Authors
    Ahmed Badr
    License

    https://choosealicense.com/licenses/llama3/

    Description

    GradedStories

    GradedStories is a synthetically augmented dataset created by evaluating the quality of 2.7M children's stories from the TinyStories dataset. The evaluation was done using Llama-3-8B-Instruct. The dataset includes the original stories, the generated evaluations, and an assigned grade (out of 10) for each story’s structure, as well as a grade for the story's adherence to common-sense reasoning. The added information about each sample’s potential quality (evaluations… See the full description on the dataset page: https://huggingface.co/datasets/AB057/GradedStories.


TinyStories

roneneldan/TinyStories

383 scholarly articles cite this dataset.
