6 datasets found
  1. cncf-raw-data-for-llm-training

    • huggingface.co
    Updated May 27, 2024
    Cite
    Kubermatic (2024). cncf-raw-data-for-llm-training [Dataset]. https://huggingface.co/datasets/Kubermatic/cncf-raw-data-for-llm-training
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 27, 2024
    Dataset authored and provided by
    Kubermatic
    Description

    CNCF Raw Data for LLM Training

    This dataset, named cncf-raw-data-for-llm-training, consists of markdown (MD) and PDF content extracted from various project repositories within the CNCF (Cloud Native Computing Foundation) landscape. The data was collected by fetching MD and PDF files from different CNCF project repositories and converting them into JSON format. This dataset is intended as raw data for training large language models (LLMs). The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/Kubermatic/cncf-raw-data-for-llm-training.

  2. Real Estate Data For LLM Fine-Tuning

    • kaggle.com
    Updated May 7, 2025
    Cite
    Heba Mohamed (2025). Real Estate Data For LLM Fine-Tuning [Dataset]. https://www.kaggle.com/datasets/hebamo7amed/real-estate-data-for-llm-fine-tuning/code
    Dataset updated
    May 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Heba Mohamed
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Heba Mohamed

    Released under CC0: Public Domain


  3. additional-train-data-for-llm-science-exam

    • kaggle.com
    Updated Oct 7, 2023
    Cite
    zhiqing fang (2023). additional-train-data-for-llm-science-exam [Dataset]. https://www.kaggle.com/zhiqingfang/additional-train-data-for-llm-science-exam/discussion
    Dataset updated
    Oct 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    zhiqing fang
    Description

    Dataset

    This dataset was created by zhiqing fang


  4. test-data-for-llm

    • huggingface.co
    + more versions
    Cite
    shang wang, test-data-for-llm [Dataset]. https://huggingface.co/datasets/ss1997/test-data-for-llm
    Authors
    shang wang
    License

    https://choosealicense.com/licenses/llama2/

    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

    Dataset Details

    Dataset Description

    Curated by: [More Information Needed]
    Funded by [optional]: [More Information Needed]
    Shared by [optional]: [More Information Needed]
    Language(s) (NLP): [More Information Needed]
    License: [More Information Needed]

      Dataset Sources [optional]
    

    Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/ss1997/test-data-for-llm.

  5. Replication Data for "LLM Survey Data in Theory Testing"

    • dataverse.harvard.edu
    Updated Jun 6, 2025
    Cite
    Anonymous (2025). Replication Data for "LLM Survey Data in Theory Testing" [Dataset]. http://doi.org/10.7910/DVN/AWWRIM
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Anonymous
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    There are three files in total: (1) the original dataset, which includes both human-generated and AI-generated data across five rounds; (2) the prompt file, detailing the instructions used to generate AI data; and (3) the analysis protocol for conducting PLS-SEM using SmartPLS. Prepared for journal submission and peer review.

  6. real-estate-data-for-llm-fine-tuning

    • huggingface.co
    Updated May 11, 2025
    + more versions
    Cite
    Heba (2025). real-estate-data-for-llm-fine-tuning [Dataset]. http://doi.org/10.57967/hf/5361
    Dataset updated
    May 11, 2025
    Authors
    Heba
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The heba1998/real-estate-data-for-llm-fine-tuning dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

