4 datasets found

h
TinyStories
huggingface.co
opendatalab.com
Updated May 16, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ronen Eldan (2023). TinyStories [Dataset]. https://huggingface.co/datasets/roneneldan/TinyStories
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 16, 2023
Authors
Ronen Eldan
License
https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/
Description
Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary. Described in the following paper: https://arxiv.org/abs/2305.07759. The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Huggingface, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources: tinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/roneneldan/TinyStories.
h
TinyStories-MRL
huggingface.co
Updated Jul 29, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Reactive AI (2025). TinyStories-MRL [Dataset]. https://huggingface.co/datasets/ReactiveAI/TinyStories-MRL
Explore at:
Dataset updated
Jul 29, 2025
Dataset authored and provided by
Reactive AI
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for ReactiveAI/TinyStories-MRL

Synthetic Memory Reinforcement Learning dataset for Proof-of-Concept Reactive Transformer models. Dataset is divided into subsets, used in different Curriculum Stage of MRL training - each subset have different number of follow-up interactions, could use different strategy, and have train and validation splits.

After first experiments with MRL, we decided to abandon single step and two steps stages. That's because with single step… See the full description on the dataset page: https://huggingface.co/datasets/ReactiveAI/TinyStories-MRL.
h
tale-frame
huggingface.co
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
guodaosun (2024). tale-frame [Dataset]. https://huggingface.co/datasets/guodaosun/tale-frame
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 18, 2024
Authors
guodaosun
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
TinyStories Dataset README

Overview

This dataset is based on TinyStories and includes structured JSON data with corresponding annotations, designed for research in controllable story generation and related tasks.

Dataset Structure

Each data item contains the following fields:

1. conversations

Type: List Purpose: Contains the JSON of the story from: Always set to "human". value: Structured data containing entities, events, story structures and… See the full description on the dataset page: https://huggingface.co/datasets/guodaosun/tale-frame.
h
GradedStories
huggingface.co
Updated Jun 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmed Badr (2024). GradedStories [Dataset]. https://huggingface.co/datasets/AB057/GradedStories
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 11, 2024
Authors
Ahmed Badr
License
https://choosealicense.com/licenses/llama3/https://choosealicense.com/licenses/llama3/
Description
GradedStories

GradedStories is a synthetically-augmented dataset, created by evaluating the quality of 2.7M children stories from the TinyStories dataset. The evaluation process was done using Llama-3-8B-Instruct. The dataset includes the original stories, the generated evaluations & an assigned grade (out of 10) for each story’s structure as well as a grade for the story's adherence to common sense reasoning. The added information about each sample’s potential quality (evaluations… See the full description on the dataset page: https://huggingface.co/datasets/AB057/GradedStories.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Ronen Eldan (2023). TinyStories [Dataset]. https://huggingface.co/datasets/roneneldan/TinyStories

TinyStories

roneneldan/TinyStories

Explore at:

383 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 16, 2023

Authors

Ronen Eldan

License

https://choosealicense.com/licenses/cdla-sharing-1.0/https://choosealicense.com/licenses/cdla-sharing-1.0/

Description

Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary. Described in the following paper: https://arxiv.org/abs/2305.07759. The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Huggingface, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M. Additional resources: tinystories_all_data.tar.gz - contains a superset of… See the full description on the dataset page: https://huggingface.co/datasets/roneneldan/TinyStories.

Clear search

Close search

Google apps

Main menu

TinyStories

TinyStories-MRL

tale-frame

GradedStories

TinyStoriesSee More Versions

roneneldan/TinyStories

TinyStories