3 datasets found

h
test
huggingface.co
Updated Sep 27, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
DigitalBrainLab (2016). test [Dataset]. https://huggingface.co/datasets/DBL/test
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 27, 2016
Dataset authored and provided by
DigitalBrainLab
Description
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers -… See the full description on the dataset page: https://huggingface.co/datasets/DBL/test.
P
WikiText-2 Dataset
paperswithcode.com
opendatalab.com
+1more
Updated Dec 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher (2023). WikiText-2 Dataset [Dataset]. https://paperswithcode.com/dataset/wikitext-2
Explore at:
Dataset updated
Dec 13, 2023
Authors
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher
Description
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.
P
WikiText-103 Dataset
paperswithcode.com
opendatalab.com
Updated Oct 2, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher (2016). WikiText-103 Dataset [Dataset]. https://paperswithcode.com/dataset/wikitext-103
Explore at:
Dataset updated
Oct 2, 2016
Authors
Stephen Merity; Caiming Xiong; James Bradbury; Richard Socher
Description
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers - all of which are removed in PTB. As it is composed of full articles, the dataset is well suited for models that can take advantage of long term dependencies.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

DigitalBrainLab (2016). test [Dataset]. https://huggingface.co/datasets/DBL/test

test

DBL/test

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Sep 27, 2016

Dataset authored and provided by

DigitalBrainLab

Description

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger vocabulary and retains the original case, punctuation and numbers -… See the full description on the dataset page: https://huggingface.co/datasets/DBL/test.

Clear search

Close search

Google apps

Main menu

test

WikiText-2 Dataset

WikiText-103 Dataset

test

DBL/test