2 datasets found
  1. Data from: cosmopedia

    • huggingface.co
    Updated Feb 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hugging Face Smol Models Research (2024). cosmopedia [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/cosmopedia
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    Hugging Facehttps://huggingface.co/
    Authors
    Hugging Face Smol Models Research
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Cosmopedia v0.1

    Image generated by DALL-E, the prompt was generated by Mixtral-8x7B-Instruct-v0.1
    

    Note: Cosmopedia v0.2 is available at smollm-corpus User: What do you think "Cosmopedia" could mean? Hint: in our case it's not related to cosmology.

    Mixtral-8x7B-Instruct-v0.1: A possible meaning for "Cosmopedia" could be an encyclopedia or collection of information about different cultures, societies, and topics from around the world, emphasizing diversity and global… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia.

  2. smollm-corpus

    • huggingface.co
    Updated Jul 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hugging Face Smol Models Research (2024). smollm-corpus [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Hugging Facehttps://huggingface.co/
    Authors
    Hugging Face Smol Models Research
    License

    https://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/

    Description

    SmolLM-Corpus

    This dataset is a curated collection of high-quality educational and synthetic data designed for training small language models. You can find more details about the models trained on this dataset in our SmolLM blog post.

      Dataset subsets
    
    
    
    
    
      Cosmopedia v2
    

    Cosmopedia v2 is an enhanced version of Cosmopedia, the largest synthetic dataset for pre-training, consisting of over 39 million textbooks, blog posts, and stories generated by… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus.

  3. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Hugging Face Smol Models Research (2024). cosmopedia [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/cosmopedia
Organization logo

Data from: cosmopedia

HuggingFaceTB/cosmopedia

Related Article
Explore at:
34 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 20, 2024
Dataset provided by
Hugging Facehttps://huggingface.co/
Authors
Hugging Face Smol Models Research
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Cosmopedia v0.1

Image generated by DALL-E, the prompt was generated by Mixtral-8x7B-Instruct-v0.1

Note: Cosmopedia v0.2 is available at smollm-corpus User: What do you think "Cosmopedia" could mean? Hint: in our case it's not related to cosmology.

Mixtral-8x7B-Instruct-v0.1: A possible meaning for "Cosmopedia" could be an encyclopedia or collection of information about different cultures, societies, and topics from around the world, emphasizing diversity and global… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia.

Search
Clear search
Close search
Google apps
Main menu