100+ datasets found
  1. Huggingface Hub Permissible models and datasets

    • kaggle.com
    zip
    Updated Dec 26, 2023
    Cite
    Dheeraj M Pai (2023). Huggingface Hub Permissible models and datasets [Dataset]. https://www.kaggle.com/datasets/dheerajmpai/huggingface-hub-permissible-models-and-datasets
    Explore at:
    zip (34761279 bytes)
    Dataset updated
    Dec 26, 2023
    Authors
    Dheeraj M Pai
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Huggingface Hub: Models, Datasets, and Spaces

    Dataset Overview

    This comprehensive dataset contains detailed information about all the models, datasets, and spaces available on the Huggingface Hub. It is an essential resource for anyone looking to explore the extensive range of tools and datasets available for machine learning and AI research.

    Key Features

    • Comprehensive Data: Includes exhaustive details on all models, datasets, and spaces from the Huggingface Hub.
    • Permissible Models: A specialized subset is provided in a separate CSV file, focusing exclusively on models that are permissible for use.
    • Regularly Updated: The dataset is refreshed weekly to ensure the latest information is always available.

    Last Update

    • Date: December 26, 2023

    Update Frequency

    • Frequency: Weekly

    Dataset Contents

    1. Models: Detailed listings of all models available on Huggingface Hub.
    2. Datasets: Comprehensive information on datasets hosted on the Hub.
    3. Spaces: An overview of the different spaces and their functionalities.
    4. Permissible Models CSV: A smaller, curated list of models that are cleared for use.

    Usage

    This dataset is ideal for researchers, developers, and AI enthusiasts who are looking for a one-stop repository of models, datasets, and spaces from the Huggingface Hub. It provides a holistic view and simplifies the task of finding the right tools for various machine learning and AI projects.

    Note: This dataset is not officially affiliated with or endorsed by the Huggingface organization.

  2. wds_flickr8k

    • huggingface.co
    Updated Jan 5, 2024
    + more versions
    Cite
    clip-benchmark (2024). wds_flickr8k [Dataset]. https://huggingface.co/datasets/clip-benchmark/wds_flickr8k
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 5, 2024
    Dataset authored and provided by
    clip-benchmark
    Description

    clip-benchmark/wds_flickr8k dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. fineweb

    • huggingface.co
    Cite
    FineData, fineweb [Dataset]. http://doi.org/10.57967/hf/2493
    Explore at:
    Dataset authored and provided by
    FineData
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    🍷 FineWeb

    15 trillion tokens of the finest data the 🌐 web has to offer

      What is it?
    

    The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.

  4. huggingface-hub-classes-chunks

    • huggingface.co
    Updated Jan 18, 2025
    + more versions
    Cite
    Niels Rogge (2025). huggingface-hub-classes-chunks [Dataset]. https://huggingface.co/datasets/nielsr/huggingface-hub-classes-chunks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 18, 2025
    Authors
    Niels Rogge
    Description

    nielsr/huggingface-hub-classes-chunks dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Huggingface Hub 0.27.0

    • kaggle.com
    zip
    Updated Dec 26, 2024
    Cite
    Aishik Rakshit (2024). Huggingface Hub 0.27.0 [Dataset]. https://www.kaggle.com/datasets/aishikai/huggingface-hub-0-27-0
    Explore at:
    zip (438787 bytes)
    Dataset updated
    Dec 26, 2024
    Authors
    Aishik Rakshit
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Aishik Rakshit

    Released under Apache 2.0

    Contents

  6. Image Caption Models from HuggingFace Hub

    • kaggle.com
    zip
    Updated Feb 27, 2023
    Cite
    Tanrei(nama) (2023). Image Caption Models from HuggingFace Hub [Dataset]. https://www.kaggle.com/datasets/tanreinama/image-caption-models
    Explore at:
    zip (27655050862 bytes)
    Dataset updated
    Feb 27, 2023
    Authors
    Tanrei(nama)
    Description
  7. Hugging Face Models

    • kaggle.com
    zip
    Updated Nov 28, 2023
    Cite
    A T M Ragib Raihan (2023). Hugging Face Models [Dataset]. https://www.kaggle.com/datasets/atmragib/hugging-face-models/code
    Explore at:
    zip (13652285 bytes)
    Dataset updated
    Nov 28, 2023
    Authors
    A T M Ragib Raihan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context

    The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub.

    Data Source Link: huggingface.co/models

    Attribute Information

    Variable      Description
    model_id
    pipeline      There are 40 pipelines in total. To learn more, read: Hugging Face Pipeline
    downloads
    likes
    author_id
    author_name
    author_type   user or organization
    author_isPro  Paid user or organization
    lastModified  from 2014-08-10 to 2023-11-27

  8. dataset_cards_with_metadata

    • huggingface.co
    Updated Nov 25, 2025
    + more versions
    Cite
    Librarian Bots (2025). dataset_cards_with_metadata [Dataset]. https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Librarian Bots
    Description

    Dataset Card for Hugging Face Hub Dataset Cards

    This dataset consists of dataset cards for datasets hosted on the Hugging Face Hub. The dataset cards are created by the community and provide information about those datasets. This dataset is updated daily and includes publicly available datasets on the Hugging Face Hub. It is made available to help support users wanting to work with a large number of Dataset Cards from the Hub. We hope… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata.

  9. huggingface-hub-docs-chunks-test

    • huggingface.co
    Updated Jan 13, 2025
    + more versions
    Cite
    Niels Rogge (2025). huggingface-hub-docs-chunks-test [Dataset]. https://huggingface.co/datasets/nielsr/huggingface-hub-docs-chunks-test
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 13, 2025
    Authors
    Niels Rogge
    Description

    nielsr/huggingface-hub-docs-chunks-test dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. documentation-images

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Hugging Face (2025). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface/documentation-images
    Explore at:
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Hugging Face (https://huggingface.co/)
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains images used in the documentation of HuggingFace's libraries.

    HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.

  11. hub-stats

    • huggingface.co
    Updated Nov 24, 2025
    Cite
    Caleb Fahlgren (2025). hub-stats [Dataset]. https://huggingface.co/datasets/cfahlgren1/hub-stats
    Explore at:
    Dataset updated
    Nov 24, 2025
    Authors
    Caleb Fahlgren
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Changelog

    NEW Changes July 25th

    added baseModels field to models which shows the models that the user tagged as base models for that model

    Example: { "models": [ { "_id": "687de260234339fed21e768a", "id": "Qwen/Qwen3-235B-A22B-Instruct-2507" } ], "relation": "quantized" }
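
As a rough illustration of the baseModels field, the sketch below parses the example record shown above with Python's standard json module and pulls out the base model ids and the relation. The helper name `base_model_ids` is hypothetical and is not part of the hub-stats dataset or any library; how the field is actually stored in the dataset files is not confirmed here.

```python
import json

# The example record from the changelog above, as a JSON string.
EXAMPLE = """
{ "models": [ { "_id": "687de260234339fed21e768a",
                "id": "Qwen/Qwen3-235B-A22B-Instruct-2507" } ],
  "relation": "quantized" }
"""


def base_model_ids(record: dict) -> list:
    """Return the ids of all base models listed in a baseModels record.

    Hypothetical helper: tolerates a record with no "models" key.
    """
    return [m["id"] for m in record.get("models", [])]


record = json.loads(EXAMPLE)
ids = base_model_ids(record)
relation = record.get("relation")
print(ids, relation)
```

A record without a "models" key yields an empty list, which keeps the helper safe to apply to models that have no tagged base model.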

    NEW Changes July 9th

    Fixed issue with gguf column with integer overflow causing import pipeline to be broken over a few weeks ✅

    NEW Changes Feb 27th

    Added new fields on the models… See the full description on the dataset page: https://huggingface.co/datasets/cfahlgren1/hub-stats.

  12. hub-tldr-dataset-summaries-llama

    • huggingface.co
    Updated Feb 17, 2025
    + more versions
    Cite
    Daniel van Strien (2025). hub-tldr-dataset-summaries-llama [Dataset]. https://huggingface.co/datasets/davanstrien/hub-tldr-dataset-summaries-llama
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 17, 2025
    Authors
    Daniel van Strien
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset card for dataset-summaries-llama

    This dataset contains AI-generated summaries of dataset cards from the Hugging Face Hub, generated using meta-llama/Llama-3.3-70B-Instruct. It is designed to be used in combination with a similar dataset of model card summaries for initial supervised fine-tuning (SFT) of language models specialized in generating tl;dr summaries of dataset and model cards from the Hugging Face Hub. This dataset was made with Curator.

      Dataset… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/hub-tldr-dataset-summaries-llama.
    
  13. SocialCounterfactuals

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Intel (2024). SocialCounterfactuals [Dataset]. https://huggingface.co/datasets/Intel/SocialCounterfactuals
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    Intel (http://intel.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Intel/SocialCounterfactuals dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. hub_models_with_base_model_info

    • huggingface.co
    Updated Dec 3, 2023
    Cite
    Librarian Bots (2023). hub_models_with_base_model_info [Dataset]. https://huggingface.co/datasets/librarian-bots/hub_models_with_base_model_info
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 3, 2023
    Dataset authored and provided by
    Librarian Bots
    Description

    Dataset Card for Hugging Face Hub Models with Base Model Metadata

      Dataset Details
    

    This dataset contains a subset of possible metadata for models hosted on the Hugging Face Hub. All of these models contain base_model metadata i.e. information about the model used for fine-tuning. This data can be used for creating network graphs showing links between models on the Hub.
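
To make the network-graph idea concrete, here is a minimal sketch that turns (model, base_model) pairs into an adjacency mapping and walks it. The pairs, the repository names, and the helpers `build_graph` and `descendants` are all hypothetical placeholders; real rows would come from the dataset, whose exact column layout is not shown in this description.

```python
from collections import defaultdict

# Hypothetical (model, base_model) pairs, invented for illustration only.
PAIRS = [
    ("org-a/model-ft", "org-b/base"),
    ("org-c/model-ft2", "org-b/base"),
    ("org-d/model-ft3", "org-a/model-ft"),
]


def build_graph(pairs):
    """Map each base model to the sorted list of models fine-tuned from it."""
    children = defaultdict(list)
    for model, base in pairs:
        children[base].append(model)
    return {base: sorted(models) for base, models in children.items()}


def descendants(graph, root):
    """Collect every model reachable from root by following fine-tune links."""
    seen, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return sorted(seen)


graph = build_graph(PAIRS)
print(descendants(graph, "org-b/base"))
```

A plain dictionary is enough for small explorations; a dedicated graph library would be a reasonable swap for layout or visualization.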

      Dataset Description
    

    Curated by: [More Information Needed] Funded by [optional]: [More… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/hub_models_with_base_model_info.

  15. codeparrot

    • huggingface.co
    Updated Sep 1, 2021
    + more versions
    Cite
    Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 1, 2021
    Dataset authored and provided by
    Natural Language Processing with Transformers
    Description

    CodeParrot 🦜 Dataset

      What is it?
    

    This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying GitHub repository.

      Creation
    

    It was created with the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and is 180 GB in size (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.

  16. the_cauldron

    • huggingface.co
    Updated Apr 15, 2024
    + more versions
    Cite
    HuggingFaceM4 (2024). the_cauldron [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/the_cauldron
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 15, 2024
    Dataset authored and provided by
    HuggingFaceM4
    Description

    Dataset Card for The Cauldron

      Dataset description
    

    The Cauldron is part of the Idefics2 release. It is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-language model Idefics2.

      Load the dataset
    

    To load the dataset, install the datasets library with pip install datasets. Then run:

        from datasets import load_dataset
        ds = load_dataset("HuggingFaceM4/the_cauldron", "ai2d")

    to download and load the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron.

  17. webui-7k

    • huggingface.co
    Updated Nov 1, 2024
    + more versions
    Cite
    Big Lab (2024). webui-7k [Dataset]. https://huggingface.co/datasets/biglab/webui-7k
    Explore at:
    Dataset updated
    Nov 1, 2024
    Dataset authored and provided by
    Big Lab
    License

    https://choosealicense.com/licenses/other/

    Description

    This data accompanies the WebUI project (https://dl.acm.org/doi/abs/10.1145/3544548.3581158). For more information, check out the project website: https://uimodeling.github.io/

    To download this dataset, you need to install the huggingface-hub package:

        pip install huggingface-hub

    Then use snapshot_download:

        from huggingface_hub import snapshot_download
        snapshot_download(repo_id="biglab/webui-7k", repo_type="dataset")

    IMPORTANT

    Before downloading and using, please review the copyright info here:… See the full description on the dataset page: https://huggingface.co/datasets/biglab/webui-7k.

  18. pubmed-abstract

    • huggingface.co
    Cite
    Uiyun Kim, pubmed-abstract [Dataset]. https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract
    Explore at:
    Authors
    Uiyun Kim
    Description

    Dataset Summary

    A daily-updated dataset of PubMed abstracts, collected via PubMed’s API and published on Hugging Face Datasets. Each snapshot is versioned by date (e.g., 2025-03-28) so users can track historical changes or use a consistent snapshot for reproducibility.

    • Updated daily
    • Each version tagged by date
    • Abstract-only dataset (no full text)
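
For reproducibility, one common pattern is to pin the newest snapshot on or before a chosen date. The sketch below shows that selection step only, under the assumption that snapshot tags are plain YYYY-MM-DD strings as in the example above. The tag list and the helper `pick_snapshot` are hypothetical, and how the tags are actually exposed by the dataset is not stated in this description.

```python
from datetime import date

# Hypothetical snapshot tags in the YYYY-MM-DD form mentioned above.
TAGS = ["2025-03-26", "2025-03-27", "2025-03-28"]


def pick_snapshot(tags, not_after):
    """Return the newest tag dated on or before not_after, or None."""
    cutoff = date.fromisoformat(not_after)
    eligible = [t for t in tags if date.fromisoformat(t) <= cutoff]
    return max(eligible, key=date.fromisoformat) if eligible else None


pinned = pick_snapshot(TAGS, "2025-03-27")
print(pinned)
```

Recording the chosen tag alongside experiment results lets a later run request the same snapshot instead of whatever is current that day.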

      Dataset Structure
    

    Column    Type    Description
    pmid      string  Unique PubMed identifier
    abstract  string  Abstract text…

    See the full description on the dataset page: https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract.

  19. mini-imdb

    • huggingface.co
    Updated Sep 30, 2022
    Cite
    Daniel Vila (2022). mini-imdb [Dataset]. https://huggingface.co/datasets/dvilasuero/mini-imdb
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 30, 2022
    Authors
    Daniel Vila
    Description

    dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. dev-push-to-hub

    • huggingface.co
    Updated Aug 31, 2021
    Cite
    Ashim Mahara (2021). dev-push-to-hub [Dataset]. https://huggingface.co/datasets/ashim/dev-push-to-hub
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 31, 2021
    Authors
    Ashim Mahara
    Description

    ashim/dev-push-to-hub dataset hosted on Hugging Face and contributed by the HF Datasets community

