10 datasets found
  1. proof-pile-2

    • huggingface.co
    Updated Oct 17, 2023
    Cite
    EleutherAI (2023). proof-pile-2 [Dataset]. https://huggingface.co/datasets/EleutherAI/proof-pile-2
    Dataset authored and provided by
    EleutherAI (https://eleuther.ai/)
    Description

    A dataset of high-quality mathematical text.

  2. proof-pile-2-streaming

    • huggingface.co
    Cite
    Xavier Durawa, proof-pile-2-streaming [Dataset]. https://huggingface.co/datasets/xavierdurawa/proof-pile-2-streaming
    Available in Croissant, a format for machine-learning datasets (mlcommons.org/croissant).
    Authors
    Xavier Durawa
    Description

    Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

    The Proof-Pile-2 is a 55 billion token dataset of mathematical and scientific documents. It was created to train the Llemma 7B and Llemma 34B models. It consists of three subsets:

      arxiv (29B tokens): the ArXiv subset of RedPajama
      open-web-math (15B tokens): The OpenWebMath…

    See the full description on the dataset page: https://huggingface.co/datasets/xavierdurawa/proof-pile-2-streaming

  3. proof-pile-2

    • huggingface.co
    Updated Jun 21, 2025
    Cite
    Duong Hoang Le (2025). proof-pile-2 [Dataset]. https://huggingface.co/datasets/lehduong/proof-pile-2
    Authors
    Duong Hoang Le
    Description

    The lehduong/proof-pile-2 dataset, hosted on Hugging Face and contributed by the HF Datasets community.

  4. Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu (2024)....

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Yunfan Shao, Linyang Li, Zhaoye Fei, Hang Yan, Dahua Lin, Xipeng Qiu (2024). Dataset: Proof-Pile-2 [Dataset]. https://doi.org/10.57702/7iiqekx3, https://service.tib.eu/ldmservice/dataset/proof-pile-2
    Description

    The dataset used for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting.

  5. Proof-Pile-2

    • opendatalab.com
    Updated Oct 1, 2023
    Cite
    Princeton University (2023). Proof-Pile-2 [Dataset]. https://opendatalab.com/OpenDataLab/Proof-Pile-2
    Available download formats: zip
    Dataset provided by
    EleutherAI (https://eleuther.ai/)
    University of Toronto
    Princeton University
    Description

    Proof-Pile II is a 55 billion token dataset of mathematical and scientific documents.

  6. extracted-proof-pile-2

    • huggingface.co
    Cite
    JilinHu, extracted-proof-pile-2 [Dataset]. https://huggingface.co/datasets/JilinHu/extracted-proof-pile-2
    Authors
    JilinHu
    Description

    This dataset is extracted from proof-pile-2. It contains only the Isabelle and Coq files that could be used in pre-training.

  7. algebraic-stack

    • huggingface.co
    Cite
    sparverius, algebraic-stack [Dataset]. https://huggingface.co/datasets/typeof/algebraic-stack
    Available in Croissant, a format for machine-learning datasets (mlcommons.org/croissant).
    Authors
    sparverius
    Description

    NOTE: Please see EleutherAI/proof-pile-2.

    This is a cherry-picked repackaging of the algebraic-stack segment of the proof-pile-2 dataset as Parquet files.

    License

    See EleutherAI/proof-pile-2.

    Citation

    See EleutherAI/proof-pile-2.

  8. WFS XPlanung BPL “Front pile (2nd amendment)”

    • data.europa.eu
    Cite
    WFS XPlanung BPL “Front pile (2nd amendment)” [Dataset]. https://data.europa.eu/data/datasets/f3d2a304-ed2d-453e-9dd9-bafa80234a95?locale=bg
    Available download formats: wfs
    Description

    WFS service of the development plan “Vordere Halde (2nd amendment)” of the municipality of Holzmaden, from XPlanung 5.0. Description: Front pile (2nd amendment); Use: WA; The text amends the roof structures.

  9. INSPIRE dataset BPL “Front Pile (2nd amendment)”

    • data.europa.eu
    Cite
    INSPIRE dataset BPL “Front Pile (2nd amendment)” [Dataset]. https://data.europa.eu/data/datasets/6790157f-1229-4471-a787-e22dee1dc682?locale=sl
    Available download formats: wfs, wms
    Description

    Development plan “Front Halde (2nd amendment)” of the municipality of Holzmaden, transformed to INSPIRE on the basis of an XPlanung dataset in version 5.0.

  10. 030403-237 Næsgrd. - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 31, 2023
    Cite
    (2023). 030403-237 Næsgrd. - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/e431853e-3ea3-550e-89e7-d0380110b6ca
    Description

    This record describes ancient sites and monuments as well as archaeological excavations undertaken by Danish museums. Excerpt of the Danish description of events: 1956: Sandy soil. Large settlement with much flint waste (5275 pieces counted) and many tools from the Dagger Period: 3 dagger fragments, 2 flint sickle fragments, 2 preforms for daggers or sickles, 4 two-winged arrowheads, 2 triangular arrowheads without barbs, 6 tanged scrapers, 3 surface-flaked flint pieces, 1 axe roughout, 1 polished flake, 1 heavy blade knife, 6 disc scrapers, 2 blade scrapers with a convex edge, 1 flake scraper with a concave edge, 5 strike-a-lights, 7 flake borers, 2 arrowhead preforms, 6 transverse arrowheads (3 of them with a flared edge), 4 blade knives.

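As a quick arithmetic check on result 2 (proof-pile-2-streaming), the sketch below converts the two subset sizes quoted there into shares of the stated 55B-token total. It uses only the approximate figures given in that description; the third subset's entry is truncated in the listing, so its size is not inferred. The names `TOTAL_B`, `SUBSETS_B`, and `share` are illustrative assumptions, not part of any dataset tooling.

```python
# Approximate token counts (in billions) quoted in the proof-pile-2-streaming listing.
# Only the two subsets whose sizes are stated are included; the third is not inferred.
TOTAL_B = 55
SUBSETS_B = {"arxiv": 29, "open-web-math": 15}


def share(tokens_b: float, total_b: float = TOTAL_B) -> float:
    """Return a subset's fraction of the total token count."""
    return tokens_b / total_b


# Print each listed subset's share, then the combined share of the listed subsets.
for name, tokens in SUBSETS_B.items():
    print(f"{name}: {tokens}B tokens, {share(tokens):.1%} of {TOTAL_B}B")

listed = sum(SUBSETS_B.values())
print(f"listed subsets: {listed}B tokens, {share(listed):.1%} of {TOTAL_B}B")
```

Running it prints about 52.7% for arxiv and 27.3% for open-web-math, with the two listed subsets together accounting for 44B of the 55B tokens.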


proof-pile-2

EleutherAI/proof-pile-2

86 scholarly articles cite this dataset (Google Scholar).
