100+ datasets found

policy-docs
huggingface.co
Updated Apr 3, 2024
Cite
Hugging Face (2024). policy-docs [Dataset]. https://huggingface.co/datasets/huggingface/policy-docs
Dataset updated
Apr 3, 2024
Dataset authored and provided by
Hugging Face, https://huggingface.co/
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Public Policy at Hugging Face

AI Policy at Hugging Face is a multidisciplinary and cross-organizational workstream. Instead of being part of a vertical communications or global affairs organization, our policy work is rooted in the expertise of our many researchers and developers, from Ethics and Society Regulars and legal team to machine learning engineers working on healthcare, art, and evaluations. What we work on is informed by our Hugging Face community needs and experiences… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/policy-docs.
documentation-images
huggingface.co
Updated Jun 1, 2025
Cite
Hugging Face (2025). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface/documentation-images
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Hugging Face, https://huggingface.co/
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset contains images used in the documentation of HuggingFace's libraries.

HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.
Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias,...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1 more
Updated Jan 16, 2024
Cite
Pepe, Federica; Nardone, Vittoria; Mastropaolo, Antonio; Canfora, Gerardo; Bavota, Gabriele; Di Penta, Massimiliano (2024). Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8200098
Dataset updated
Jan 16, 2024
Dataset provided by
University of Sannio
Università degli Studi del Sannio
University of Molise
Università della Svizzera italiana
Authors
Pepe, Federica; Nardone, Vittoria; Mastropaolo, Antonio; Canfora, Gerardo; Bavota, Gabriele; Di Penta, Massimiliano
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This replication package contains datasets and scripts related to the paper "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study".

Root directory

statistics.r: R script used to compute the correlation between usage and downloads, and the RQ1/RQ2 inter-rater agreements

modelsInfo.zip: zip file containing all the downloaded model cards (in JSON format)

script: directory containing all the scripts used to collect and process data. For further details, see README file inside the script directory.

Dataset

Dataset/Dataset_HF-models-list.csv: list of HF models analyzed

Dataset/Dataset_github-prj-list.txt: list of GitHub projects using the transformers library

Dataset/Dataset_github-Prj_model-Used.csv: contains usage pairs: project, model

Dataset/Dataset_prj-num-models-reused.csv: number of models used by each GitHub project

Dataset/Dataset_model-download_num-prj_correlation.csv contains, for each model used by GitHub projects: the name, the task, the number of reusing projects, and the number of downloads
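
The relationship between reuse and downloads recorded in this file can be explored with a rank correlation. The following is a minimal sketch, not the authors' statistics.r: the description does not say which coefficient the R script computes, so Spearman's rank correlation is an assumption here, as are the column names `num_projects` and `downloads` and the made-up sample rows.

```python
import csv
import io
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def correlation_from_csv(text, x_col="num_projects", y_col="downloads"):
    """Read CSV text with hypothetical column names and correlate two columns."""
    rows = list(csv.DictReader(io.StringIO(text)))
    xs = [float(r[x_col]) for r in rows]
    ys = [float(r[y_col]) for r in rows]
    return spearman(xs, ys)


# Invented rows, only to show the expected shape of the input.
SAMPLE = """name,task,num_projects,downloads
model-a,text-classification,1,100
model-b,translation,2,250
model-c,fill-mask,3,900
model-d,summarization,4,5000
"""

rho = correlation_from_csv(SAMPLE)
```

The real CSV header should be checked before reusing this; only the four kinds of fields (name, task, reusing projects, downloads) are stated in the description.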

RQ1

RQ1/RQ1_dataset-list.txt: list of HF datasets

RQ1/RQ1_datasetSample.csv: sample set of models used for the manual analysis of datasets

RQ1/RQ1_analyzeDatasetTags.py: Python script to analyze model tags for the presence of datasets. It requires unzipping modelsInfo.zip into a directory with the same name (modelsInfo) at the root of the replication package folder. It writes its output to stdout; redirect it to a file to be analyzed by the RQ1/RQ1_countDataset.py script

RQ1/RQ1_countDataset.py: given the output of RQ1/RQ1_analyzeDatasetTags.py (passed as an argument), produces, for each model, a list of Booleans indicating whether (i) the model only declares HF datasets, (ii) the model only declares external datasets, (iii) the model declares both, and (iv) the model is part of the sample for the manual analysis

RQ1/RQ1_datasetTags.csv: output of RQ1/RQ1_analyzeDatasetTags.py

RQ1/RQ1_dataset_usage_count.csv: output of RQ1/RQ1_countDataset.py
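
The four Booleans produced per model can be illustrated as follows. This is a sketch under stated assumptions rather than the package's script: the function name `dataset_flags`, its arguments, and the example dataset and model names are all hypothetical, and the real input format is not described here.

```python
def dataset_flags(declared, hf_datasets, sample_models, model_id):
    """Return the four Booleans described for each model.

    declared      -- dataset names declared in the model's tags (hypothetical input)
    hf_datasets   -- set of dataset names known to be hosted on Hugging Face
    sample_models -- set of model ids selected for the manual analysis
    """
    declared = set(declared)
    on_hub = declared & hf_datasets    # declared datasets that are HF datasets
    external = declared - hf_datasets  # declared datasets that are not
    return [
        bool(on_hub) and not external,    # (i) only HF datasets
        bool(external) and not on_hub,    # (ii) only external datasets
        bool(on_hub) and bool(external),  # (iii) both kinds
        model_id in sample_models,        # (iv) part of the manual-analysis sample
    ]


# Invented example data.
HF_DATASETS = {"squad", "glue"}
SAMPLE_MODELS = {"org/model-a"}

flags = dataset_flags(["squad", "my-private-corpus"], HF_DATASETS, SAMPLE_MODELS, "org/model-a")
# flags == [False, False, True, True]
```

A model that declares no datasets yields `False` for the first three flags, which keeps the three categories mutually exclusive.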

RQ2

RQ2/tableBias.pdf: table detailing the number of occurrences of different types of bias by model task

RQ2/RQ2_bias_classification_sheet.csv: results of the manual labeling

RQ2/RQ2_isBiased.csv: file to compute the inter-rater agreement of whether or not a model documents bias

RQ2/RQ2_biasAgrLabels.csv: file to compute the inter-rater agreement related to bias categories

RQ2/RQ2_final_bias_categories_with_levels.csv: for each model in the sample, this file lists (i) the bias leaf category, (ii) the first-level category, and (iii) the intermediate category

RQ3

RQ3/RQ3_LicenseValidation.csv: manual validation of a sample of licenses

RQ3/RQ3_{NETWORK-RESTRICTIVE|RESTRICTIVE|WEAK-RESTRICTIVE|PERMISSIVE}-license-list.txt: lists of licenses with different permissiveness

RQ3/RQ3_prjs_license.csv: for each project linked to models, indicates, among other fields, the license tag and name

RQ3/RQ3_models_license.csv: for each model, indicates, among other information, whether the model has a license and, if so, what kind of license

RQ3/RQ3_model-prj-license_contingency_table.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows)

RQ3/RQ3_models_prjs_licenses_with_type.csv: project-model pairs, with their respective licenses and permissiveness level
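
A contingency table with models' licenses as rows and projects' licenses as columns can be built from project-model pairs along these lines. This is an illustrative sketch only: the function name and the sample pairs are invented, and the permissiveness labels are simply borrowed from the license-list file names above.

```python
from collections import Counter


def contingency_table(pairs):
    """Count usages per (model_license, project_license) pair.

    Returns (row_labels, column_labels, table), where rows are model licenses
    and columns are project licenses.
    """
    counts = Counter(pairs)
    rows = sorted({model for model, _ in counts})
    cols = sorted({project for _, project in counts})
    # A Counter returns 0 for combinations that never occur.
    table = [[counts[(model, project)] for project in cols] for model in rows]
    return rows, cols, table


# Invented (model_license, project_license) pairs.
PAIRS = [
    ("PERMISSIVE", "PERMISSIVE"),
    ("PERMISSIVE", "RESTRICTIVE"),
    ("RESTRICTIVE", "PERMISSIVE"),
    ("PERMISSIVE", "PERMISSIVE"),
]

rows, cols, table = contingency_table(PAIRS)
# rows  == ["PERMISSIVE", "RESTRICTIVE"]
# cols  == ["PERMISSIVE", "RESTRICTIVE"]
# table == [[2, 1], [1, 0]]
```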

scripts

Contains the scripts used to mine Hugging Face and GitHub. Details are in the enclosed README.
huggingface_doc
huggingface.co
Updated Jan 19, 2024
+ more versions
Cite
Aymeric Roucher (2024). huggingface_doc [Dataset]. https://huggingface.co/datasets/m-ric/huggingface_doc
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 19, 2024
Authors
Aymeric Roucher
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
m-ric/huggingface_doc dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: hugging face datasets
kaggle.com
zip
Updated Nov 3, 2025
Cite
Nicholas Broad (2025). hugging face datasets [Dataset]. https://www.kaggle.com/nbroad/hf-ds
Available download formats: zip (70163997 bytes)
Dataset updated
Nov 3, 2025
Authors
Nicholas Broad
Description
This is the latest version of Hugging Face datasets to be used in offline notebooks on Kaggle. It is automatically updated every week.

Docs are here

Installation Instructions

!pip install datasets --no-index --find-links=file:///kaggle/input/hf-ds -U -q
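
Outside a notebook cell, the same offline install could be driven from a Python script. The sketch below only mirrors the flags of the command above; the helper names are hypothetical, and the wheel directory is assumed to be the Kaggle input path shown in that command.

```python
import subprocess
import sys


def offline_install_command(package="datasets", wheel_dir="/kaggle/input/hf-ds"):
    """Build the pip command that installs `package` from local files only."""
    return [
        sys.executable, "-m", "pip", "install", package,
        "--no-index",                        # never contact the package index
        f"--find-links=file://{wheel_dir}",  # look for distributions in this directory
        "-U", "-q",                          # upgrade if already present, quiet output
    ]


def install_offline(package="datasets", wheel_dir="/kaggle/input/hf-ds"):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(offline_install_command(package, wheel_dir), check=True)


command = offline_install_command()
```

Calling `install_offline()` only makes sense where the wheel directory actually exists, for example in a Kaggle notebook with this dataset attached.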
documentation-images
huggingface.co
Updated Jun 30, 2022
+ more versions
Cite
The LLM Course (2022). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface-course/documentation-images
Dataset updated
Jun 30, 2022
Dataset authored and provided by
The LLM Course
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
huggingface-course/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Apr 8, 2023
Cite
Hugging Face Optimum (2023). documentation-images [Dataset]. https://huggingface.co/datasets/optimum/documentation-images
Dataset updated
Apr 8, 2023
Dataset provided by
Hugging Face, https://huggingface.co/
Authors
Hugging Face Optimum
Description
This dataset contains images used in the documentation of HuggingFace's Optimum library.
markdown-documentation-transformers
huggingface.co
Updated Oct 5, 2023
Cite
Philipp Schmid (2023). markdown-documentation-transformers [Dataset]. https://huggingface.co/datasets/philschmid/markdown-documentation-transformers
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 5, 2023
Authors
Philipp Schmid
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Hugging Face Transformers documentation as markdown dataset

This dataset was created using Clipper.js. Clipper is a Node.js command-line tool for clipping content from web pages and converting it to Markdown. Under the hood it uses Mozilla's Readability library and Turndown to parse web page content and convert it to Markdown. The dataset can be used to build RAG applications that draw on the Transformers documentation. Example document:… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/markdown-documentation-transformers.
starcoder2-documentation
huggingface.co
Cite
Qian Liu, starcoder2-documentation [Dataset]. https://huggingface.co/datasets/SivilTaram/starcoder2-documentation
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
Qian Liu
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card

This dataset is the code documentation dataset used in StarCoder2 pre-training, and it is also part of the-stack-v2-train-extras described in the paper.

Dataset Details Overview

This dataset comprises a comprehensive collection of crawled documentation and code-related resources sourced from various package manager platforms and programming language documentation sites. It focuses on popular libraries, free programming books, and other relevant… See the full description on the dataset page: https://huggingface.co/datasets/SivilTaram/starcoder2-documentation.
documentation-images
huggingface.co
Updated May 1, 2025
+ more versions
Cite
Eustache Le Bihan (2025). documentation-images [Dataset]. https://huggingface.co/datasets/eustlb/documentation-images
Dataset updated
May 1, 2025
Authors
Eustache Le Bihan
Description
eustlb/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
medical-documentation-dataset
huggingface.co
Updated Mar 1, 2025
Cite
tech titans (2025). medical-documentation-dataset [Dataset]. https://huggingface.co/datasets/techtitans232/medical-documentation-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 1, 2025
Authors
tech titans
Description
techtitans232/medical-documentation-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Nov 28, 2025
+ more versions
Cite
David Berenstein (2025). documentation-images [Dataset]. https://huggingface.co/datasets/davidberenstein1957/documentation-images
Dataset updated
Nov 28, 2025
Authors
David Berenstein
Description
davidberenstein1957/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Aug 17, 2024
Cite
Technology Innovation Institute (2024). documentation-images [Dataset]. https://huggingface.co/datasets/tiiuae/documentation-images
Dataset updated
Aug 17, 2024
Dataset authored and provided by
Technology Innovation Institute
Description
tiiuae/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Aug 11, 2025
Cite
dmeck lf (2025). documentation-images [Dataset]. https://huggingface.co/datasets/glide-the/documentation-images
Dataset updated
Aug 11, 2025
Dataset authored and provided by
dmeck lf
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
glide-the/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Oct 6, 2022
+ more versions
Cite
Nate Raw (2022). documentation-images [Dataset]. https://huggingface.co/datasets/nateraw/documentation-images
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 6, 2022
Authors
Nate Raw
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
nateraw/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
example-documents
huggingface.co
Updated Sep 20, 2022
+ more versions
Cite
Hugging Face Internal Testing Organization (2022). example-documents [Dataset]. https://huggingface.co/datasets/hf-internal-testing/example-documents
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 20, 2022
Dataset provided by
Hugging Face, https://huggingface.co/
Authors
Hugging Face Internal Testing Organization
Description
hf-internal-testing/example-documents dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Mar 4, 2025
+ more versions
Cite
ChunTe Lee (2025). documentation-images [Dataset]. https://huggingface.co/datasets/Chunte/documentation-images
Dataset updated
Mar 4, 2025
Authors
ChunTe Lee
Description
Chunte/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community
hf-docs-retrieval
huggingface.co
Updated Oct 15, 2024
+ more versions
Cite
Mic (2024). hf-docs-retrieval [Dataset]. https://huggingface.co/datasets/micpst/hf-docs-retrieval
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 15, 2024
Authors
Mic
Description
micpst/hf-docs-retrieval dataset hosted on Hugging Face and contributed by the HF Datasets community
langchain-docs-23-06-27
huggingface.co
Updated Jun 27, 2023
+ more versions
Cite
James Briggs (2023). langchain-docs-23-06-27 [Dataset]. https://huggingface.co/datasets/jamescalam/langchain-docs-23-06-27
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 27, 2023
Authors
James Briggs
Description
jamescalam/langchain-docs-23-06-27 dataset hosted on Hugging Face and contributed by the HF Datasets community
Documentation-files
huggingface.co
Updated Oct 29, 2023
+ more versions
Cite
Muhammad Waseem (2023). Documentation-files [Dataset]. https://huggingface.co/datasets/hwaseem04/Documentation-files
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 29, 2023
Authors
Muhammad Waseem
Description
hwaseem04/Documentation-files dataset hosted on Hugging Face and contributed by the HF Datasets community
