MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed information about all the models, datasets, and spaces available on the Hugging Face Hub. It is a resource for anyone who wants to explore the range of tools and datasets available for machine learning and AI research.
It is aimed at researchers, developers, and AI enthusiasts looking for a single repository of models, datasets, and spaces from the Hugging Face Hub, and it simplifies finding the right tools for machine learning and AI projects.
Note: This dataset is not officially affiliated with or endorsed by Hugging Face.
clip-benchmark/wds_flickr8k dataset hosted on Hugging Face and contributed by the HF Datasets community
License: https://choosealicense.com/licenses/odc-by/
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
nielsr/huggingface-hub-classes-chunks dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Aishik Rakshit
Released under Apache 2.0
blip-image-captioning-base: https://huggingface.co/Salesforce/blip-image-captioning-base
blip-image-captioning-large: https://huggingface.co/Salesforce/blip-image-captioning-large
blip2-flan-t5-xl: https://huggingface.co/Salesforce/blip2-flan-t5-xl
blip2-opt-2.7b: https://huggingface.co/Salesforce/blip2-opt-2.7b
git-base: https://huggingface.co/microsoft/git-base
git-base-coco: https://huggingface.co/microsoft/git-base-coco
git-large-coco: https://huggingface.co/microsoft/git-large-coco
git-large-r: https://huggingface.co/microsoft/git-large-r
image-caption-generator: https://huggingface.co/bipin/image-caption-generator
image_caption: https://huggingface.co/jaimin/image_caption
vit-gpt2-coco-en: https://huggingface.co/ydshieh/vit-gpt2-coco-en
vit-gpt2-image-captioning: https://huggingface.co/nlpconnect/vit-gpt2-image-captioning
vit-swin-base-224-gpt2-image-captioning: https://huggingface.co/Abdou/vit-swin-base-224-gpt2-image-captioning
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub.
| Variable | Description |
|---|---|
| model_id | |
| pipeline | There are 40 pipelines in total. To learn more, read: Hugging Face Pipeline |
| downloads | |
| likes | |
| author_id | |
| author_name | |
| author_type | user or organization |
| author_isPro | Paid user or organization |
| lastModified | from 2014-08-10 to 2023-11-27 |
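The columns above lend themselves to simple filtering and ranking. The following is a minimal sketch only: the rows are made up for illustration, the column types are assumptions (the table leaves several descriptions blank), and `ModelRecord` and `top_models` are hypothetical names, not part of the dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelRecord:
    # Field names follow the table; the types are assumptions.
    model_id: str
    pipeline: str
    downloads: int
    likes: int
    author_name: str
    author_type: str  # "user" or "organization"
    author_isPro: bool
    lastModified: date


# Made-up rows for illustration only; they are not taken from the dataset.
ROWS = [
    ModelRecord("example-org/sentiment-base", "text-classification",
                1200, 15, "example-org", "organization", True, date(2023, 5, 1)),
    ModelRecord("example-user/sentiment-small", "text-classification",
                300, 4, "example-user", "user", False, date(2022, 11, 12)),
    ModelRecord("example-org/captioner", "image-to-text",
                5000, 80, "example-org", "organization", True, date(2023, 10, 3)),
    ModelRecord("example-user/sentiment-large", "text-classification",
                2500, 40, "example-user", "user", True, date(2023, 11, 20)),
]


def top_models(rows, pipeline, author_type=None, limit=5):
    """Return up to `limit` rows for one pipeline, most downloaded first."""
    matches = [
        r for r in rows
        if r.pipeline == pipeline
        and (author_type is None or r.author_type == author_type)
    ]
    return sorted(matches, key=lambda r: r.downloads, reverse=True)[:limit]


for record in top_models(ROWS, "text-classification"):
    print(record.model_id, record.downloads)
```

Passing `author_type="user"` narrows the same ranking to models published by individual users.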
Dataset Card for Hugging Face Hub Dataset Cards
This dataset consists of dataset cards for datasets hosted on the Hugging Face Hub. The cards are created by the community and describe those datasets. The dataset is updated daily and includes publicly available datasets on the Hub. It is made available to support users who want to work with a large number of dataset cards from the Hub. We hope… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/dataset_cards_with_metadata.
nielsr/huggingface-hub-docs-chunks-test dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains images used in the documentation of Hugging Face's libraries.
HF Team: Please make sure you optimize the assets before uploading them. My favorite tool for this is https://tinypng.com/.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Changelog
NEW Changes July 25th
Added a baseModels field to models, listing the models that the user tagged as base models for that model
Example: { "models": [ { "_id": "687de260234339fed21e768a", "id": "Qwen/Qwen3-235B-A22B-Instruct-2507" } ], "relation": "quantized" }
NEW Changes July 9th
Fixed an integer overflow in the gguf column that had broken the import pipeline for a few weeks ✅
NEW Changes Feb 27th
Added new fields on the models… See the full description on the dataset page: https://huggingface.co/datasets/cfahlgren1/hub-stats.
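The baseModels example in the July 25th entry can be unpacked with a few lines of Python. This is a minimal sketch that assumes every baseModels value has the same shape as that one example (a "models" list plus a "relation" string); `base_model_links` is a hypothetical helper, not part of the dataset or any library.

```python
import json

# The example value from the changelog entry, as a JSON string.
EXAMPLE = (
    '{"models": [{"_id": "687de260234339fed21e768a", '
    '"id": "Qwen/Qwen3-235B-A22B-Instruct-2507"}], "relation": "quantized"}'
)


def base_model_links(base_models):
    """Return (relation, base model id) pairs from one baseModels value.

    Assumes the structure shown in the changelog example; missing or
    empty values produce an empty list.
    """
    if not base_models:
        return []
    relation = base_models.get("relation")
    return [
        (relation, model["id"])
        for model in base_models.get("models", [])
        if "id" in model
    ]


links = base_model_links(json.loads(EXAMPLE))
print(links)  # [('quantized', 'Qwen/Qwen3-235B-A22B-Instruct-2507')]
```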
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset card for dataset-summaries-llama
This dataset contains AI-generated summaries of dataset cards from the Hugging Face Hub, generated using meta-llama/Llama-3.3-70B-Instruct. It is designed to be used in combination with a similar dataset of model card summaries for initial supervised fine-tuning (SFT) of language models specialized in generating tl;dr summaries of dataset and model cards from the Hugging Face Hub. This dataset was made with Curator.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/davanstrien/hub-tldr-dataset-summaries-llama.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Intel/SocialCounterfactuals dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Hugging Face Hub Models with Base Model Metadata
Dataset Details
This dataset contains a subset of the possible metadata for models hosted on the Hugging Face Hub. All of these models have base_model metadata, i.e., information about the model used for fine-tuning. This data can be used to create network graphs showing links between models on the Hub.
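As a rough illustration of the network-graph idea, the sketch below builds a base-model-to-derived-model adjacency map and walks it. The (model, base model) pairs are invented for the example, and `build_graph` and `descendants` are hypothetical helpers; how the pairs are read out of the dataset's columns is not shown, since the source does not describe the schema.

```python
from collections import defaultdict

# Hypothetical (model_id, base_model) pairs; not taken from the dataset.
PAIRS = [
    ("example-org/model-a-finetuned", "example-org/model-a"),
    ("example-user/model-a-finetuned-v2", "example-org/model-a-finetuned"),
    ("example-user/model-a-quantized", "example-org/model-a"),
    ("example-org/model-b-finetuned", "example-org/model-b"),
]


def build_graph(pairs):
    """Map each base model to the list of models that declare it."""
    children = defaultdict(list)
    for model_id, base_model in pairs:
        children[base_model].append(model_id)
    return dict(children)


def descendants(graph, root):
    """Return every model reachable from `root` through base_model links."""
    found = []
    visited = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in graph.get(node, []):
            if child not in visited:  # guard against cycles in the metadata
                visited.add(child)
                found.append(child)
                stack.append(child)
    return found


graph = build_graph(PAIRS)
print(sorted(descendants(graph, "example-org/model-a")))
```

A plain dictionary keeps the sketch dependency-free; a graph library would be a reasonable substitute for layout or visualization.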
Dataset Description
Curated by: [More Information Needed] Funded by [optional]: [More… See the full description on the dataset page: https://huggingface.co/datasets/librarian-bots/hub_models_with_base_model_info.
CodeParrot 🦜 Dataset
What is it?
This is the full CodeParrot dataset. It contains Python files used to train the code generation model in Chapter 10: Training Transformers from Scratch in the NLP with Transformers book. You can find the full code in the accompanying GitHub repository.
Creation
It was created with the GitHub dataset available via Google's BigQuery. It contains approximately 22 million Python files and takes up 180 GB (50 GB compressed). The… See the full description on the dataset page: https://huggingface.co/datasets/transformersbook/codeparrot.
Dataset Card for The Cauldron
Dataset description
The Cauldron is part of the Idefics2 release. It is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-language model Idefics2.
Load the dataset
To load the dataset, install the datasets library with pip install datasets. Then, run
    from datasets import load_dataset
    ds = load_dataset("HuggingFaceM4/the_cauldron", "ai2d")
to download and load the… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceM4/the_cauldron.
License: https://choosealicense.com/licenses/other/
This data accompanies the WebUI project (https://dl.acm.org/doi/abs/10.1145/3544548.3581158). For more information, check out the project website: https://uimodeling.github.io/
To download this dataset, you need to install the huggingface-hub package:
    pip install huggingface-hub
Use snapshot_download:
    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="biglab/webui-7k", repo_type="dataset")
IMPORTANT
Before downloading and using, please review the copyright info here:… See the full description on the dataset page: https://huggingface.co/datasets/biglab/webui-7k.
Dataset Summary
A daily-updated dataset of PubMed abstracts, collected via PubMed’s API and published on Hugging Face Datasets. Each snapshot is versioned by date (e.g., 2025-03-28) so users can track historical changes or use a consistent snapshot for reproducibility.
- Updated daily
- Each version tagged by date
- Abstract-only dataset (no full text)
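For reproducibility, one common pattern is to pin the most recent snapshot on or before a chosen date. The sketch below assumes version tags are ISO dates like the 2025-03-28 example; the tag list is made up, `latest_snapshot` is a hypothetical helper, and how a chosen tag is passed to a loader is left out because the source does not say.

```python
from datetime import date


def latest_snapshot(tags, on_or_before):
    """Return the most recent date tag not after `on_or_before`, or None."""
    cutoff = date.fromisoformat(on_or_before)
    eligible = [tag for tag in tags if date.fromisoformat(tag) <= cutoff]
    return max(eligible, key=date.fromisoformat) if eligible else None


# Made-up tags in the date format the summary describes.
TAGS = ["2025-03-26", "2025-03-27", "2025-03-28"]

print(latest_snapshot(TAGS, "2025-03-27"))
```

Parsing the tags as dates rather than comparing raw strings makes a malformed tag fail loudly instead of sorting silently into the wrong place.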
Dataset Structure
| Column | Type | Description |
|---|---|---|
| pmid | string | Unique PubMed identifier |
| abstract | string | Abstract text… |
See the full description on the dataset page: https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract.
dvilasuero/mini-imdb dataset hosted on Hugging Face and contributed by the HF Datasets community
ashim/dev-push-to-hub dataset hosted on Hugging Face and contributed by the HF Datasets community