A dataset of high-quality mathematical text.
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
The Proof-Pile-2 is a 55 billion token dataset of mathematical and scientific documents. This dataset was created in order to train the Llemma 7B and Llemma 34B models. It consists of three subsets:
arxiv (29B tokens): the ArXiv subset of RedPajama
open-web-math (15B tokens): The OpenWebMath… See the full description on the dataset page: https://huggingface.co/datasets/xavierdurawa/proof-pile-2-streaming.
lehduong/proof-pile-2 dataset hosted on Hugging Face and contributed by the HF Datasets community
The dataset used for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting.
Proof-Pile II, a 55 billion token dataset of mathematical and scientific documents.
This dataset is extracted from proof-pile-2. It contains only the Isabelle and Coq files that could be used in pre-training.
NOTE: Please see EleutherAI/proof-pile-2
This is a cherry-picked repackaging of the algebraic-stack segment from the proof-pile-2 dataset as Parquet files.
License
see EleutherAI/proof-pile-2
Citation
see EleutherAI/proof-pile-2
WFS service for the development plan "Vordere Halde (2nd amendment)" of the municipality of Holzmaden, from XPlanung 5.0. Description: Vordere Halde (2nd amendment); Use: WA; Text amends the roof constructions.
INSPIRE-transformed development plan "Front Halde (2nd amendment)" of the municipality of Holzmaden, based on an XPlanung dataset in version 5.0.
This record describes ancient sites and monuments as well as archaeological excavations undertaken by Danish museums. Excerpt of the Danish description of events (translated): 1956: Sandy soil. Larger settlement with much flint waste (5275 pieces counted) and many tools from the Dagger Period: 3 fragments of daggers, 2 fragments of flint sickles, 2 preforms for daggers or sickles, 4 two-lobed arrowheads, 2 triangular arrowheads without barbs, 6 shaft scrapers, 3 surface-flaked flint pieces, 1 axe rough-out, 1 polished flake, 1 heavy blade knife, 6 flake scrapers, 2 blade scrapers with convex edge, 1 chip scraper with concave edge, 5 fire stones, 7 chip borers, 2 arrowhead preforms, 6 transverse arrowheads (3 of them with flared edge), 4 blade knives.