2 datasets found
  1. Test-Prompt

    • huggingface.co
    Updated Jul 26, 2024
    Cite
    Eric Lim (2024). Test-Prompt [Dataset]. https://huggingface.co/datasets/Eric-Valyu/Test-Prompt
    Dataset updated
    Jul 26, 2024
    Authors
    Eric Lim
    License

    https://choosealicense.com/licenses/openrail/

    Description

    MAP-CC

    🌐 Homepage | 🤗 MAP-CC | 🤗 CHC-Bench | 🤗 CT-LLM | 📖 arXiv | GitHub

    An open-source Chinese pretraining dataset with a scale of 800 billion tokens, offering the NLP community high-quality Chinese pretraining data.

      Disclaimer
    

    This model, developed for academic purposes, employs rigorously compliance-checked training data to uphold the highest standards of integrity and compliance. Despite our efforts, the inherent complexities of data and the broad… See the full description on the dataset page: https://huggingface.co/datasets/Eric-Valyu/Test-Prompt.

  2. MAP-CC

    • huggingface.co
    Updated Apr 5, 2024
    Cite
    Multimodal Art Projection (2024). MAP-CC [Dataset]. https://huggingface.co/datasets/m-a-p/MAP-CC
    Dataset updated
    Apr 5, 2024
    Dataset authored and provided by
    Multimodal Art Projection
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    MAP-CC

    🌐 Homepage | 🤗 MAP-CC | 🤗 CHC-Bench | 🤗 CT-LLM | 📖 arXiv | GitHub

    An open-source Chinese pretraining dataset with a scale of 800 billion tokens, offering the NLP community high-quality Chinese pretraining data.

      Disclaimer
    

    This model, developed for academic purposes, employs rigorously compliance-checked training data to uphold the highest standards of integrity and compliance. Despite our efforts, the inherent complexities of data and the broad spectrum of… See the full description on the dataset page: https://huggingface.co/datasets/m-a-p/MAP-CC.


