100+ datasets found
  1. policy-docs

    • huggingface.co
    Updated Apr 3, 2024
    Cite
    Hugging Face (2024). policy-docs [Dataset]. https://huggingface.co/datasets/huggingface/policy-docs
    Dataset updated
    Apr 3, 2024
    Dataset authored and provided by
    Hugging Face: https://huggingface.co/
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Public Policy at Hugging Face

    AI Policy at Hugging Face is a multidisciplinary and cross-organizational workstream. Instead of being part of a vertical communications or global affairs organization, our policy work is rooted in the expertise of our many researchers and developers, from Ethics and Society Regulars and legal team to machine learning engineers working on healthcare, art, and evaluations. What we work on is informed by our Hugging Face community needs and experiences… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/policy-docs.

  2. documentation-images

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Hugging Face (2025). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface/documentation-images
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Hugging Face: https://huggingface.co/
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains images used in the documentation of HuggingFace's libraries.

    HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.

  3. Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias,...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Jan 16, 2024
    Cite
    Pepe, Federica; Nardone, Vittoria; Mastropaolo, Antonio; Canfora, Gerardo; BAVOTA, Gabriele; Di Penta, Massimiliano (2024). Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8200098
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    University of Sannio
    Università degli Studi del Sannio
    University of Molise
    Università della Svizzera italiana
    Authors
    Pepe, Federica; Nardone, Vittoria; Mastropaolo, Antonio; Canfora, Gerardo; BAVOTA, Gabriele; Di Penta, Massimiliano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This replication package contains datasets and scripts related to the paper "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study".

    Root directory

    • statistics.r: R script used to compute the correlation between usage and downloads, and the RQ1/RQ2 inter-rater agreements
    • modelsInfo.zip: zip file containing all the downloaded model cards (in JSON format)
    • script: directory containing all the scripts used to collect and process data. For further details, see the README file inside the script directory.

    Dataset

    • Dataset/Dataset_HF-models-list.csv: list of HF models analyzed
    • Dataset/Dataset_github-prj-list.txt: list of GitHub projects using the transformers library
    • Dataset/Dataset_github-Prj_model-Used.csv: contains usage pairs: project, model
    • Dataset/Dataset_prj-num-models-reused.csv: number of models used by each GitHub project
    • Dataset/Dataset_model-download_num-prj_correlation.csv: contains, for each model used by GitHub projects, the name, the task, the number of reusing projects, and the number of downloads

    RQ1

    • RQ1/RQ1_dataset-list.txt: list of HF datasets
    • RQ1/RQ1_datasetSample.csv: sample set of models used for the manual analysis of datasets
    • RQ1/RQ1_analyzeDatasetTags.py: Python script that analyzes model tags for the presence of datasets. It requires modelsInfo.zip to be unzipped into a directory with the same name (modelsInfo) at the root of the replication package folder. It writes its output to stdout; redirect that output to a file so it can be analyzed by the RQ1/RQ1_countDataset.py script
    • RQ1/RQ1_countDataset.py: given the output of RQ1/RQ1_analyzeDatasetTags.py (passed as an argument), produces, for each model, a list of Booleans indicating whether (i) the model only declares HF datasets, (ii) the model only declares external datasets, (iii) the model declares both, and (iv) the model is part of the sample for the manual analysis
    • RQ1/RQ1_datasetTags.csv: output of RQ1/RQ1_analyzeDatasetTags.py
    • RQ1/RQ1_dataset_usage_count.csv: output of RQ1/RQ1_countDataset.py
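The RQ1 steps listed above (unzip the model cards, run the tag analysis with stdout redirected to a file, then pass that file to the counting script) could be driven roughly as follows. This is only a sketch under assumptions: the helper names `extract_model_cards` and `run_rq1_pipeline` are hypothetical, the scripts are assumed to run from the package root with no other arguments, and the layout inside modelsInfo.zip is not known from the description.

```python
import subprocess
import sys
import zipfile
from pathlib import Path


def extract_model_cards(package_root: Path) -> Path:
    """Unzip modelsInfo.zip into a modelsInfo directory at the package root."""
    target = package_root / "modelsInfo"
    with zipfile.ZipFile(package_root / "modelsInfo.zip") as archive:
        archive.extractall(target)
    return target


def run_rq1_pipeline(package_root: Path) -> None:
    """Hypothetical driver: analyze tags, save stdout, then count datasets."""
    package_root = package_root.resolve()
    extract_model_cards(package_root)
    tags_file = package_root / "RQ1" / "RQ1_datasetTags.csv"
    # The analysis script prints to stdout, so its output is captured in a file.
    with tags_file.open("w") as out:
        subprocess.run(
            [sys.executable, "RQ1/RQ1_analyzeDatasetTags.py"],
            cwd=package_root, stdout=out, check=True,
        )
    # The counting script receives the saved output as its argument.
    # Assumption: where it writes its own result is not stated in the description.
    subprocess.run(
        [sys.executable, "RQ1/RQ1_countDataset.py", str(tags_file)],
        cwd=package_root, check=True,
    )
```

If the archive already contains a top-level modelsInfo folder, extracting into the package root instead would avoid a nested directory; the README inside the package is the place to confirm this.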

    RQ2

    • RQ2/tableBias.pdf: table detailing the number of occurrences of different types of bias by model Task
    • RQ2/RQ2_bias_classification_sheet.csv: results of the manual labeling
    • RQ2/RQ2_isBiased.csv: file to compute the inter-rater agreement of whether or not a model documents Bias
    • RQ2/RQ2_biasAgrLabels.csv: file to compute the inter-rater agreement related to bias categories
    • RQ2/RQ2_final_bias_categories_with_levels.csv: for each model in the sample, this file lists (i) the bias leaf category, (ii) the first-level category, and (iii) the intermediate category
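The description does not say which inter-rater agreement statistic the package computes. As an illustration only, assuming two raters and Cohen's kappa, agreement on labels such as those in RQ2_isBiased.csv could be computed like this (the function name and the in-memory label lists are hypothetical):

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items (assumed statistic)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` has observed agreement 0.75 and chance agreement 0.5, giving a kappa of 0.5.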

    RQ3

    • RQ3/RQ3_LicenseValidation.csv: manual validation of a sample of licenses
    • RQ3/RQ3_{NETWORK-RESTRICTIVE|RESTRICTIVE|WEAK-RESTRICTIVE|PERMISSIVE}-license-list.txt: lists of licenses with different permissiveness
    • RQ3/RQ3_prjs_license.csv: for each project linked to models, among other fields it indicates the license tag and name
    • RQ3/RQ3_models_license.csv: for each model, indicates among other pieces of info, whether the model has a license, and if yes what kind of license
    • RQ3/RQ3_model-prj-license_contingency_table.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows)
    • RQ3/RQ3_models_prjs_licenses_with_type.csv: pairs project-model, with their respective licenses and permissiveness level
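To make the shape of the contingency table concrete, here is a minimal sketch that counts project-model pairs with models' licenses as rows and projects' licenses as columns. The function name and the in-memory pair format are assumptions for illustration; the actual CSV column names are not given in the description.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def license_contingency(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count (model_license, project_license) pairs: rows = model, columns = project."""
    counts = Counter(pairs)
    rows = sorted({model for model, _ in counts})
    cols = sorted({project for _, project in counts})
    # Every row carries every column so the table is rectangular (zeros included).
    return {m: {p: counts[(m, p)] for p in cols} for m in rows}


# Hypothetical usage with permissiveness levels as the license categories:
# table = license_contingency([("PERMISSIVE", "RESTRICTIVE"), ("PERMISSIVE", "PERMISSIVE")])
# table["PERMISSIVE"]["RESTRICTIVE"] is then the number of such usage pairs.
```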

    scripts

    Contains the scripts used to mine Hugging Face and GitHub. Details are in the enclosed README

  4. huggingface_doc

    • huggingface.co
    Updated Jan 19, 2024
    Cite
    Aymeric Roucher (2024). huggingface_doc [Dataset]. https://huggingface.co/datasets/m-ric/huggingface_doc
    Dataset updated
    Jan 19, 2024
    Authors
    Aymeric Roucher
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    m-ric/huggingface_doc dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Data from: hugging face datasets

    • kaggle.com
    zip
    Updated Nov 3, 2025
    Cite
    Nicholas Broad (2025). hugging face datasets [Dataset]. https://www.kaggle.com/nbroad/hf-ds
    zip (70163997 bytes)
    Dataset updated
    Nov 3, 2025
    Authors
    Nicholas Broad
    Description

    This is the latest version of Hugging Face datasets to be used in offline notebooks on Kaggle. It is automatically updated every week.

    Docs are here

    Installation Instructions

    !pip install datasets --no-index --find-links=file:///kaggle/input/hf-ds -U -q
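The line above is a notebook shell command. Outside a notebook cell, roughly the same offline install could be assembled in Python as sketched below; `offline_install_command` is a hypothetical helper, and the wheel directory is the Kaggle input path from the instruction above.

```python
import sys
from typing import List


def offline_install_command(package: str, wheel_dir: str) -> List[str]:
    """Build a pip command that installs only from a local directory of wheels."""
    return [
        sys.executable, "-m", "pip", "install", package,
        "--no-index",                        # never contact the package index
        f"--find-links=file://{wheel_dir}",  # look for wheels in this directory instead
        "-U", "-q",                          # upgrade if present, quiet output
    ]


# Hypothetical usage (runs pip, so it is left as a comment here):
# import subprocess
# subprocess.run(offline_install_command("datasets", "/kaggle/input/hf-ds"), check=True)
```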

  6. documentation-images

    • huggingface.co
    Updated Jun 30, 2022
    Cite
    The LLM Course (2022). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface-course/documentation-images
    Dataset updated
    Jun 30, 2022
    Dataset authored and provided by
    The LLM Course
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    huggingface-course/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. documentation-images

    • huggingface.co
    Updated Apr 8, 2023
    Cite
    Hugging Face Optimum (2023). documentation-images [Dataset]. https://huggingface.co/datasets/optimum/documentation-images
    Dataset updated
    Apr 8, 2023
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Hugging Face Optimum
    Description

    This dataset contains images used in the documentation of HuggingFace's Optimum library.

  8. markdown-documentation-transformers

    • huggingface.co
    Updated Oct 5, 2023
    Cite
    Philipp Schmid (2023). markdown-documentation-transformers [Dataset]. https://huggingface.co/datasets/philschmid/markdown-documentation-transformers
    Dataset updated
    Oct 5, 2023
    Authors
    Philipp Schmid
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Hugging Face Transformers documentation as markdown dataset

    This dataset was created using Clipper.js. Clipper is a Node.js command-line tool that clips content from web pages and converts it to Markdown. It uses Mozilla's Readability library and Turndown under the hood to parse web page content and convert it to Markdown. The dataset can be used to build RAG applications that draw on the Transformers documentation. Example document:… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/markdown-documentation-transformers.
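As a rough illustration of how Markdown documents like these might be prepared for retrieval in a RAG application, the sketch below splits a Markdown string into heading-delimited chunks. It is an assumption about one possible preprocessing step, not something the dataset card prescribes, and `split_markdown_by_heading` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# ATX-style Markdown heading: one to six '#' characters followed by text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Return (heading, body) chunks; text before the first heading gets an empty heading.

    Limitation: lines starting with '#' inside fenced code blocks are also
    treated as headings in this simple sketch.
    """
    chunks: List[Tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if heading or any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks
```

Each chunk can then be embedded and indexed separately, with the heading kept as metadata for the retrieved passage.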

  9. starcoder2-documentation

    • huggingface.co
    Cite
    Qian Liu, starcoder2-documentation [Dataset]. https://huggingface.co/datasets/SivilTaram/starcoder2-documentation
    Authors
    Qian Liu
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card

    This dataset is the code documentation dataset used in StarCoder2 pre-training, and it is also part of the-stack-v2-train-extras described in the paper.

    Dataset Details

    Overview

    This dataset comprises a comprehensive collection of crawled documentation and code-related resources sourced from various package manager platforms and programming language documentation sites. It focuses on popular libraries, free programming books, and other relevant… See the full description on the dataset page: https://huggingface.co/datasets/SivilTaram/starcoder2-documentation.

  10. documentation-images

    • huggingface.co
    Updated May 1, 2025
    Cite
    Eustache Le Bihan (2025). documentation-images [Dataset]. https://huggingface.co/datasets/eustlb/documentation-images
    Dataset updated
    May 1, 2025
    Authors
    Eustache Le Bihan
    Description

    eustlb/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. medical-documentation-dataset

    • huggingface.co
    Updated Mar 1, 2025
    Cite
    tech titans (2025). medical-documentation-dataset [Dataset]. https://huggingface.co/datasets/techtitans232/medical-documentation-dataset
    Dataset updated
    Mar 1, 2025
    Authors
    tech titans
    Description

    techtitans232/medical-documentation-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. documentation-images

    • huggingface.co
    Updated Nov 28, 2025
    Cite
    David Berenstein (2025). documentation-images [Dataset]. https://huggingface.co/datasets/davidberenstein1957/documentation-images
    Dataset updated
    Nov 28, 2025
    Authors
    David Berenstein
    Description

    davidberenstein1957/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. documentation-images

    • huggingface.co
    Updated Aug 17, 2024
    Cite
    Technology Innovation Institute (2024). documentation-images [Dataset]. https://huggingface.co/datasets/tiiuae/documentation-images
    Dataset updated
    Aug 17, 2024
    Dataset authored and provided by
    Technology Innovation Institute
    Description

    tiiuae/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. documentation-images

    • huggingface.co
    Updated Aug 11, 2025
    Cite
    dmeck lf (2025). documentation-images [Dataset]. https://huggingface.co/datasets/glide-the/documentation-images
    Dataset updated
    Aug 11, 2025
    Dataset authored and provided by
    dmeck lf
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    glide-the/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. documentation-images

    • huggingface.co
    Updated Oct 6, 2022
    Cite
    Nate Raw (2022). documentation-images [Dataset]. https://huggingface.co/datasets/nateraw/documentation-images
    Dataset updated
    Oct 6, 2022
    Authors
    Nate Raw
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    nateraw/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. example-documents

    • huggingface.co
    Updated Sep 20, 2022
    Cite
    Hugging Face Internal Testing Organization (2022). example-documents [Dataset]. https://huggingface.co/datasets/hf-internal-testing/example-documents
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Hugging Face: https://huggingface.co/
    Authors
    Hugging Face Internal Testing Organization
    Description

    hf-internal-testing/example-documents dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. documentation-images

    • huggingface.co
    Updated Mar 4, 2025
    Cite
    ChunTe Lee (2025). documentation-images [Dataset]. https://huggingface.co/datasets/Chunte/documentation-images
    Dataset updated
    Mar 4, 2025
    Authors
    ChunTe Lee
    Description

    Chunte/documentation-images dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. hf-docs-retrieval

    • huggingface.co
    Updated Oct 15, 2024
    Cite
    Mic (2024). hf-docs-retrieval [Dataset]. https://huggingface.co/datasets/micpst/hf-docs-retrieval
    Dataset updated
    Oct 15, 2024
    Authors
    Mic
    Description

    micpst/hf-docs-retrieval dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. langchain-docs-23-06-27

    • huggingface.co
    Updated Jun 27, 2023
    Cite
    James Briggs (2023). langchain-docs-23-06-27 [Dataset]. https://huggingface.co/datasets/jamescalam/langchain-docs-23-06-27
    Dataset updated
    Jun 27, 2023
    Authors
    James Briggs
    Description

    jamescalam/langchain-docs-23-06-27 dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. Documentation-files

    • huggingface.co
    Updated Oct 29, 2023
    Cite
    Muhammad Waseem (2023). Documentation-files [Dataset]. https://huggingface.co/datasets/hwaseem04/Documentation-files
    Dataset updated
    Oct 29, 2023
    Authors
    Muhammad Waseem
    Description

    hwaseem04/Documentation-files dataset hosted on Hugging Face and contributed by the HF Datasets community
