100+ datasets found

Huggingface Hub Permissible models and datasets
kaggle.com
zip
Updated Dec 26, 2023
Cite
Dheeraj M Pai (2023). Huggingface Hub Permissible models and datasets [Dataset]. https://www.kaggle.com/datasets/dheerajmpai/huggingface-hub-permissible-models-and-datasets
Explore at:
zip (34761279 bytes)
Dataset updated
Dec 26, 2023
Authors
Dheeraj M Pai
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Huggingface Hub: Models, Datasets, and Spaces

Dataset Overview

This comprehensive dataset contains detailed information about all the models, datasets, and spaces available on the Huggingface Hub. It is an essential resource for anyone looking to explore the extensive range of tools and datasets available for machine learning and AI research.

Key Features

Comprehensive Data: Includes exhaustive details on all models, datasets, and spaces from the Huggingface Hub.

Permissible Models: A specialized subset is provided in a separate CSV file, focusing exclusively on models that are permissible for use.

Regularly Updated: The dataset is refreshed weekly to ensure the latest information is always available.

Last Update

Date: December 26, 2023

Update Frequency

Frequency: Weekly

Dataset Contents

Models: Detailed listings of all models available on Huggingface Hub.

Datasets: Comprehensive information on datasets hosted on the Hub.

Spaces: An overview of the different spaces and their functionalities.

Permissible Models CSV: A smaller, curated list of models that are cleared for use.

Usage

This dataset is ideal for researchers, developers, and AI enthusiasts who are looking for a one-stop repository of models, datasets, and spaces from the Huggingface Hub. It provides a holistic view and simplifies the task of finding the right tools for various machine learning and AI projects.

Note: This dataset is not officially affiliated with or endorsed by the Huggingface organization.
wds_flickr8k
huggingface.co
Updated Jan 5, 2024
+ more versions
Cite
clip-benchmark (2024). wds_flickr8k [Dataset]. https://huggingface.co/datasets/clip-benchmark/wds_flickr8k
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 5, 2024
Dataset authored and provided by
clip-benchmark
Description
clip-benchmark/wds_flickr8k dataset hosted on Hugging Face and contributed by the HF Datasets community
fineweb
huggingface.co
Cite
FineData, fineweb [Dataset]. http://doi.org/10.57967/hf/2493
Explore at:
Unique identifier
https://doi.org/10.57967/hf/2493
Dataset authored and provided by
FineData
License
https://choosealicense.com/licenses/odc-by/
Description
🍷 FineWeb

15 trillion tokens of the finest data the 🌐 web has to offer

What is it?

The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
huggingface-hub-classes-chunks
huggingface.co
Updated Jan 18, 2025
+ more versions
Cite
Niels Rogge (2025). huggingface-hub-classes-chunks [Dataset]. https://huggingface.co/datasets/nielsr/huggingface-hub-classes-chunks
Explore at:
Croissant
Dataset updated
Jan 18, 2025
Authors
Niels Rogge
Description
nielsr/huggingface-hub-classes-chunks dataset hosted on Hugging Face and contributed by the HF Datasets community
Huggingface Hub 0.27.0
kaggle.com
zip
Updated Dec 26, 2024
Cite
Aishik Rakshit (2024). Huggingface Hub 0.27.0 [Dataset]. https://www.kaggle.com/datasets/aishikai/huggingface-hub-0-27-0
Explore at:
zip (438787 bytes)
Dataset updated
Dec 26, 2024
Authors
Aishik Rakshit
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Aishik Rakshit

Released under Apache 2.0

Contents
Image Caption Models from HuggingFace Hub
kaggle.com
zip
Updated Feb 27, 2023
Cite
Tanrei(nama) (2023). Image Caption Models from HuggingFace Hub [Dataset]. https://www.kaggle.com/datasets/tanreinama/image-caption-models
Explore at:
zip (27655050862 bytes)
Dataset updated
Feb 27, 2023
Authors
Tanrei(nama)
Description
blip-image-captioning-base https://huggingface.co/Salesforce/blip-image-captioning-base
blip-image-captioning-large https://huggingface.co/Salesforce/blip-image-captioning-large
blip2-flan-t5-xl https://huggingface.co/Salesforce/blip2-flan-t5-xl
blip2-opt-2.7b https://huggingface.co/Salesforce/blip2-opt-2.7b
git-base https://huggingface.co/microsoft/git-base
git-base-coco https://huggingface.co/microsoft/git-base-coco
git-large-coco https://huggingface.co/microsoft/git-large-coco
git-large-r https://huggingface.co/microsoft/git-large-r
image-caption-generator https://huggingface.co/bipin/image-caption-generator
image_caption https://huggingface.co/jaimin/image_caption
vit-gpt2-coco-en https://huggingface.co/ydshieh/vit-gpt2-coco-en
vit-gpt2-image-captioning https://huggingface.co/nlpconnect/vit-gpt2-image-captioning
vit-swin-base-224-gpt2-image-captioning https://huggingface.co/Abdou/vit-swin-base-224-gpt2-image-captioning

Hugging Face Models

kaggle.com

zip

Updated Nov 28, 2023

Cite

A T M Ragib Raihan (2023). Hugging Face Models [Dataset]. https://www.kaggle.com/datasets/atmragib/hugging-face-models/code

Explore at:

zip (13652285 bytes)

Dataset updated

Nov 28, 2023

Authors

A T M Ragib Raihan

License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Context

The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub.

Data Source Link: huggingface.co/models

Attribute Information

Variable        Description
model_id
pipeline        There are 40 pipelines in total. To learn more, read: Hugging Face Pipeline
downloads
likes
author_id
author_name
author_type     user or organization
author_isPro    Paid user or organization
lastModified    from 2014-08-10 to 2023-11-27
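
As a minimal sketch of how the attributes listed above might be used, the snippet below parses a few made-up rows with Python's csv module and ranks models by downloads within one pipeline. The sample rows and their values, as well as the helper name top_models, are hypothetical; the actual file name and exact formatting of the Kaggle CSV are not stated here, so treat the layout as an assumption.

```python
import csv
import io

# Hypothetical sample rows that reuse the column names from the attribute list.
SAMPLE_CSV = """model_id,pipeline,downloads,likes,author_id,author_name,author_type,author_isPro,lastModified
org-a/model-x,text-classification,1200,15,org-a,Org A,organization,False,2023-11-01
user-b/model-y,image-to-text,300,4,user-b,User B,user,True,2023-10-12
org-a/model-z,text-classification,4500,60,org-a,Org A,organization,False,2023-11-27
"""


def top_models(csv_text, pipeline, n=5):
    """Return the ids of the n most-downloaded models for one pipeline."""
    rows = csv.DictReader(io.StringIO(csv_text))
    matching = [row for row in rows if row["pipeline"] == pipeline]
    # CSV values are strings, so convert downloads before sorting.
    matching.sort(key=lambda row: int(row["downloads"]), reverse=True)
    return [row["model_id"] for row in matching[:n]]


print(top_models(SAMPLE_CSV, "text-classification"))
```

With the real file, the same function could be fed the text read from disk, provided the header row matches the variable names above.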

dataset_cards_with_metadata
huggingface.co
Updated Nov 25, 2025
+ more versions
Cite
Librarian Bots (2025). dataset_cards_with_metadata [Dataset]. https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata
Dataset updated
Nov 25, 2025
Dataset authored and provided by
Librarian Bots
Description
Dataset Card for Hugging Face Hub Dataset Cards

This dataset consists of dataset cards for datasets hosted on the Hugging Face Hub. The dataset cards are created by the community and provide information about datasets hosted on the Hugging Face Hub. This dataset is updated on a daily basis and includes publicly available datasets on the Hugging Face Hub. This dataset is made available to help support users wanting to work with a large number of Dataset Cards from the Hub. We hope… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata.
huggingface-hub-docs-chunks-test
huggingface.co
Updated Jan 13, 2025
+ more versions
Cite
Niels Rogge (2025). huggingface-hub-docs-chunks-test [Dataset]. https://huggingface.co/datasets/nielsr/huggingface-hub-docs-chunks-test
Explore at:
Croissant
Dataset updated
Jan 13, 2025
Authors
Niels Rogge
Description
nielsr/huggingface-hub-docs-chunks-test dataset hosted on Hugging Face and contributed by the HF Datasets community
documentation-images
huggingface.co
Updated Jun 1, 2025
Cite
Hugging Face (2025). documentation-images [Dataset]. https://huggingface.co/datasets/huggingface/documentation-images
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Hugging Face (https://huggingface.co/)
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset contains images used in the documentation of HuggingFace's libraries.

HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.
hub-stats
huggingface.co
Updated Nov 24, 2025
Cite
Caleb Fahlgren (2025). hub-stats [Dataset]. https://huggingface.co/datasets/cfahlgren1/hub-stats
Dataset updated
Nov 24, 2025
Authors
Caleb Fahlgren
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Changelog

NEW Changes July 25th

Added a baseModels field to models, which shows the models that the user tagged as base models for that model

Example: { "models": [ { "_id": "687de260234339fed21e768a", "id": "Qwen/Qwen3-235B-A22B-Instruct-2507" } ], "relation": "quantized" }
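
A minimal sketch of reading a value shaped like the example above, using only Python's json module. It assumes the baseModels value is available as a JSON string with exactly the "models" and "relation" keys shown; how the field is actually stored in the dataset files is not stated here, and the helper name base_model_ids is hypothetical.

```python
import json

# The example value from the changelog entry, as a JSON string (storage format is an assumption).
RAW = (
    '{ "models": [ { "_id": "687de260234339fed21e768a", '
    '"id": "Qwen/Qwen3-235B-A22B-Instruct-2507" } ], "relation": "quantized" }'
)


def base_model_ids(base_models_json):
    """Return (relation, list of base model ids) for one baseModels value."""
    value = json.loads(base_models_json)
    ids = [model["id"] for model in value.get("models", [])]
    return value.get("relation"), ids


relation, ids = base_model_ids(RAW)
print(relation, ids)
```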

NEW Changes July 9th

Fixed an integer overflow in the gguf column that had broken the import pipeline for a few weeks ✅

NEW Changes Feb 27th

Added new fields on the models… See the full description on the dataset page: https://huggingface.co/datasets/cfahlgren1/hub-stats.
hub-tldr-dataset-summaries-llama
huggingface.co
Updated Feb 17, 2025
+ more versions
Cite
Daniel van Strien (2025). hub-tldr-dataset-summaries-llama [Dataset]. https://huggingface.co/datasets/davanstrien/hub-tldr-dataset-summaries-llama
Explore at:
Croissant
Dataset updated
Feb 17, 2025
Authors
Daniel van Strien
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset card for dataset-summaries-llama

This dataset contains AI-generated summaries of dataset cards from the Hugging Face Hub, generated using meta-llama/Llama-3.3-70B-Instruct. It is designed to be used in combination with a similar dataset of model card summaries for initial supervised fine-tuning (SFT) of language models specialized in generating tl;dr summaries of dataset and model cards from the Hugging Face Hub. This dataset was made with Curator.

Dataset… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/hub-tldr-dataset-summaries-llama.
SocialCounterfactuals
huggingface.co
Updated Mar 28, 2024
Cite
Intel (2024). SocialCounterfactuals [Dataset]. https://huggingface.co/datasets/Intel/SocialCounterfactuals
Explore at:
Croissant
Dataset updated
Mar 28, 2024
Dataset authored and provided by
Intel (http://intel.com/)
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Intel/SocialCounterfactuals dataset hosted on Hugging Face and contributed by the HF Datasets community
hub_models_with_base_model_info
huggingface.co
Updated Dec 3, 2023
Cite
Librarian Bots (2023). hub_models_with_base_model_info [Dataset]. https://huggingface.co/datasets/librarian-bots/hub_models_with_base_model_info
Explore at:
Croissant
Dataset updated
Dec 3, 2023
Dataset authored and provided by
Librarian Bots
Description
Dataset Card for Hugging Face Hub Models with Base Model Metadata

Dataset Details

This dataset contains a subset of possible metadata for models hosted on the Hugging Face Hub. All of these models contain base_model metadata, i.e., information about the model used for fine-tuning. This data can be used for creating network graphs showing links between models on the Hub.
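
To illustrate the network-graph idea, here is a small sketch that turns (model, base model) pairs into a graph and walks it. The pairs, the model names in them, and the helper names build_graph and descendants are all hypothetical; real pairs would come from the base_model metadata in the dataset, whose exact column layout is not described here.

```python
from collections import defaultdict

# Hypothetical (model_id, base_model) pairs standing in for rows of the dataset.
PAIRS = [
    ("user-a/bert-finetuned-sst2", "bert-base-uncased"),
    ("user-b/bert-finetuned-ner", "bert-base-uncased"),
    ("user-c/sst2-distilled", "user-a/bert-finetuned-sst2"),
]


def build_graph(pairs):
    """Map each base model to the sorted list of models fine-tuned from it."""
    children = defaultdict(list)
    for model_id, base_model in pairs:
        children[base_model].append(model_id)
    return {base: sorted(models) for base, models in children.items()}


def descendants(graph, root):
    """Collect every model reachable from root by following fine-tuning links."""
    found, stack = [], [root]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


graph = build_graph(PAIRS)
print(descendants(graph, "bert-base-uncased"))
```

A plain dictionary keeps the sketch dependency-free; a graph library could be swapped in for layout and visualization.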

Dataset Description

Curated by: [More Information Needed] Funded by [optional]: [More… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/hub_models_with_base_model_info.
codeparrot
huggingface.co
Updated Sep 1, 2021
+ more versions
Cite
Natural Language Processing with Transformers (2021). codeparrot [Dataset]. https://huggingface.co/datasets/transformersbook/codeparrot
Explore at:
Croissant
Dataset updated
Sep 1, 2021
Dataset authored and provided by
Natural Language Processing with Transformers
Description
CodeParrot 🦜 Dataset

What is it?

This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying GitHub repository.

Creation

It was created with the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and totals 180 GB (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
the_cauldron
huggingface.co
Updated Apr 15, 2024
+ more versions
Cite
HuggingFaceM4 (2024). the_cauldron [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/the_cauldron
Explore at:
Croissant
Dataset updated
Apr 15, 2024
Dataset authored and provided by
HuggingFaceM4
Description
Dataset Card for The Cauldron

Dataset description

The Cauldron is part of the Idefics2 release. It is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-language model Idefics2.

Load the dataset

To load the dataset, install the datasets library with pip install datasets. Then:

from datasets import load_dataset
ds = load_dataset("HuggingFaceM4/the_cauldron", "ai2d")

to download and load the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron.
webui-7k
huggingface.co
Updated Nov 1, 2024
+ more versions
Cite
Big Lab (2024). webui-7k [Dataset]. https://huggingface.co/datasets/biglab/webui-7k
Dataset updated
Nov 1, 2024
Dataset authored and provided by
Big Lab
License
https://choosealicense.com/licenses/other/
Description
This data accompanies the WebUI project (https://dl.acm.org/doi/abs/10.1145/3544548.3581158). For more information, check out the project website: https://uimodeling.github.io/

To download this dataset, you need to install the huggingface-hub package:

pip install huggingface-hub

Use snapshot_download:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="biglab/webui-7k", repo_type="dataset")

IMPORTANT

Before downloading and using, please review the copyright info here:… See the full description on the dataset page: https://huggingface.co/datasets/biglab/webui-7k.
pubmed-abstract
huggingface.co
Cite
Uiyun Kim, pubmed-abstract [Dataset]. https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract
Authors
Uiyun Kim
Description
Dataset Summary

A daily-updated dataset of PubMed abstracts, collected via PubMed’s API and published on Hugging Face Datasets. Each snapshot is versioned by date (e.g., 2025-03-28) so users can track historical changes or use a consistent snapshot for reproducibility.

Updated daily
Each version tagged by date
Abstract-only dataset (no full text)

Dataset Structure

Column Type Description

pmid string Unique PubMed identifier

abstract string Abstract text… See the full description on the dataset page: https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract.
mini-imdb
huggingface.co
Updated Sep 30, 2022
Cite
Daniel Vila (2022). mini-imdb [Dataset]. https://huggingface.co/datasets/dvilasuero/mini-imdb
Explore at:
Croissant
Dataset updated
Sep 30, 2022
Authors
Daniel Vila
Description
dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community
dev-push-to-hub
huggingface.co
Updated Aug 31, 2021
Cite
Ashim Mahara (2021). dev-push-to-hub [Dataset]. https://huggingface.co/datasets/ashim/dev-push-to-hub
Explore at:
Croissant
Dataset updated
Aug 31, 2021
Authors
Ashim Mahara
Description
ashim/dev-push-to-hub dataset hosted on Hugging Face and contributed by the HF Datasets community

