CNCF Raw Data for LLM Training
Description
This dataset, named cncf-raw-data-for-llm-training, consists of markdown (MD) and PDF content extracted from various project repositories within the CNCF (Cloud Native Computing Foundation) landscape. The data was collected by fetching MD and PDF files from different CNCF project repositories and converting them into JSON format. This dataset is intended as raw data for training large language models (LLMs). The dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/Kubermatic/cncf-raw-data-for-llm-training.
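The collection step described above (gather markdown files from a repository checkout, then store them as JSON) could be sketched roughly as follows. This is a minimal illustration, not the dataset authors' actual pipeline: the function names and the record fields `project`, `file`, and `content` are hypothetical, and PDF extraction is left out.

```python
import json
from pathlib import Path


def markdown_files_to_records(repo_dir, project_name):
    """Collect every .md file under repo_dir into a list of JSON-ready dicts.

    The field names below are illustrative assumptions, not the dataset's schema.
    """
    root = Path(repo_dir)
    records = []
    for path in sorted(root.rglob("*.md")):
        records.append({
            "project": project_name,
            # Path relative to the repository root, with forward slashes.
            "file": path.relative_to(root).as_posix(),
            # Replace undecodable bytes instead of failing on odd encodings.
            "content": path.read_text(encoding="utf-8", errors="replace"),
        })
    return records


def write_json(records, out_path):
    """Serialize the records to a single UTF-8 JSON file."""
    Path(out_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Calling `markdown_files_to_records("path/to/checkout", "some-project")` and passing the result to `write_json` would yield one JSON array per repository; a real pipeline would also need a PDF text extractor and some handling for very large files.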
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Heba Mohamed
Released under CC0: Public Domain
This dataset was created by zhiqing fang
https://choosealicense.com/licenses/llama2/
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Language(s) (NLP): [More Information Needed]
License: [More Information Needed]
Dataset Sources [optional]
Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/ss1997/test-data-for-llm.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
There are three files in total: (1) the original dataset, which includes both human-generated and AI-generated data across five rounds; (2) the prompt file, detailing the instructions used to generate AI data; and (3) the analysis protocol for conducting PLS-SEM using SmartPLS. Prepared for journal submission and peer review.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The heba1998/real-estate-data-for-llm-fine-tuning dataset is hosted on Hugging Face and was contributed by the HF Datasets community.