Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This dataset contains the 20 trending repositories of each type (models, datasets, and Spaces) on Hugging Face, recorded every day. Each type can be loaded from its own dataset config.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
Not relevant.
Dataset Structure
Data Instances
The dataset contains three configurations: models: the history of trending models on Hugging… See the full description on the dataset page: https://huggingface.co/datasets/severo/trending-repos.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features available to every repository on the Hugging Face Hub.
| Variable | Description |
|---|---|
| model_id | Unique identifier of the model on the Hub |
| pipeline | Pipeline (task) tag; there are 40 pipelines in total. To learn more, read: Hugging Face Pipeline |
| downloads | Number of downloads |
| likes | Number of likes |
| author_id | Identifier of the model's author |
| author_name | Display name of the author |
| author_type | user or organization |
| author_isPro | Whether the user or organization has a paid (Pro) subscription |
| lastModified | Last modification date, from 2014-08-10 to 2023-11-27 |
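As a rough illustration of how rows with this schema can be consumed, the snippet below builds one invented sample record matching the column names and applies a hypothetical filter; neither the values nor the filter come from the dataset itself.

```python
# Hypothetical sample row matching the schema above (all values invented):
row = {
    "model_id": "google/mobilebert-uncased",
    "pipeline": "fill-mask",
    "downloads": 120_000,
    "likes": 85,
    "author_id": "google",
    "author_name": "Google",
    "author_type": "organization",
    "author_isPro": False,
    "lastModified": "2023-11-27",
}

def is_popular_org_model(r, min_downloads=10_000):
    """Keep only organization-owned models above a download threshold."""
    return r["author_type"] == "organization" and r["downloads"] >= min_downloads

print(is_popular_org_model(row))  # True for the sample row above
```

The same predicate can be passed to a dataframe filter or a `filter()` call once the rows are actually loaded.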
This dataset contains different variants of the RoBERTa and XLM-RoBERTa models by Meta AI available on Hugging Face's model repository.
Packaging the models as a dataset makes loading the weights significantly faster, since you can attach a Kaggle dataset directly to the notebook rather than downloading the weights every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that they can be used in competitions that require internet access to be "off".
For more information on usage, visit the RoBERTa Hugging Face docs and the XLM-RoBERTa Hugging Face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining

# Path to the attached Kaggle dataset; each model variant lives in its own subfolder.
MODEL_DIR = "/kaggle/input/huggingface-roberta/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
Acknowledgements
All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
jsulz/hub-repo-stats dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This comprehensive dataset contains detailed information about all the models, datasets, and spaces available on the Huggingface Hub. It is an essential resource for anyone looking to explore the extensive range of tools and datasets available for machine learning and AI research.
This dataset is ideal for researchers, developers, and AI enthusiasts who are looking for a one-stop repository of models, datasets, and spaces from the Huggingface Hub. It provides a holistic view and simplifies the task of finding the right tools for various machine learning and AI projects.
Note: This dataset is not officially affiliated with or endorsed by the Huggingface organization.
This repository contains the mapping from integer IDs to actual label names (in HuggingFace Transformers typically called id2label) for several datasets. Current datasets include:
- ImageNet-1k
- ImageNet-22k (also called ImageNet-21k as there are 21,843 classes)
- COCO detection 2017
- COCO panoptic 2017
- ADE20k (actually, the MIT Scene Parsing benchmark, which is a subset of ADE20k)
- Cityscapes
- VQAv2
- Kinetics-700
- RVL-CDIP
- PASCAL VOC
- Kinetics-400
- ...
You can read in a label file as follows (using… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/label-files.
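The description above is truncated, so as a minimal sketch of the usual pattern: a label file is a JSON object whose keys are string ids that are typically cast back to integers. The file name `imagenet-1k-id2label.json` and the `hf_hub_download` call shown in the comment are assumptions based on common usage of this repo; here the download is replaced by an inline JSON sample so the key cast is visible.

```python
import json

# In practice you would first fetch the file, e.g. (assumed file name):
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download("huggingface/label-files",
#                          "imagenet-1k-id2label.json", repo_type="dataset")
#   raw = open(path).read()
raw = '{"0": "tench", "1": "goldfish"}'  # inline sample of the JSON shape

# JSON object keys are always strings, so cast them back to integer ids:
id2label = {int(k): v for k, v in json.loads(raw).items()}
label2id = {v: k for k, v in id2label.items()}
print(id2label[0])  # tench
```

The resulting `id2label` / `label2id` dicts are the same shape Transformers model configs expect.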
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Dataset containing metadata for all the publicly uploaded models (10,000+) available on the HuggingFace model hub. Data was collected between 15 and 20 June 2021.
The dataset was generated using the huggingface_hub APIs provided by the Hugging Face team.
This is my first dataset upload on Kaggle. I hope you like it. :)
RTL-Repo Benchmark
This repository contains the data for the RTL-Repo benchmark introduced in the paper RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects.
👋 Overview
RTL-Repo is a benchmark for evaluating LLMs' effectiveness in generating Verilog code autocompletions within large, complex codebases. It assesses the model's ability to understand and remember the entire Verilog repository context and generate new code that is correct, relevant… See the full description on the dataset page: https://huggingface.co/datasets/ahmedallam/RTL-Repo.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data used in the paper titled "On the Suitability of Hugging Face Hub for Empirical Studies". For RQ1 we share the survey responses and the interview transcription, while for RQ2 we share the link to the repository where the data is hosted.
This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.
Packaging the models as a dataset makes loading the weights significantly faster, since you can attach a Kaggle dataset directly to the notebook rather than downloading the weights every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that they can be used in competitions that require internet access to be "off".
For more information on usage, visit the MobileBERT Hugging Face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining

# Path to the attached Kaggle dataset; here it points directly at the model folder.
MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)
Acknowledgements
All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
FineWeb 2 is the second iteration of the popular 🍷 FineWeb dataset, bringing high-quality pretraining data to over 1000 🗣️ languages. For the actual data, please see the HuggingFace repository.
The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments.
In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX, or HPLT) while being substantially larger, and in some cases it even outperforms datasets specifically curated for a single one of these languages, on our diverse set of carefully selected evaluation tasks: FineTasks.
The dataset is also listed on Hugging Face; see the official HF page.
"My focus is on sharing this valuable open-source dataset to help AI and ML practitioners easily find resources on Kaggle."
Detailed information about FineWeb 2 is listed in the README.md file below ↓
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
KTDA-Datasets
This dataset card aims to describe the datasets used in KTDA.
Install
pip install huggingface-hub
Usage
huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include grass.zip
huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include cloud.zip
unzip grass.zip -d grass
unzip cloud.zip -d l8_biome… See the full description on the dataset page: https://huggingface.co/datasets/XavierJiezou/ktda-datasets.
sonyashijin/rtl-repo-curated dataset hosted on Hugging Face and contributed by the HF Datasets community
tdross/test-repo dataset hosted on Hugging Face and contributed by the HF Datasets community
samfred2/my-target-repo dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Guide: How to share your data on the BoAmps repository
This guide explains step by step how to share BoAmps format reports on this public Hugging Face repository.
Prerequisites
Before starting, make sure you have:
- A Hugging Face account
- The files you want to upload
Method 1: Hugging Face Web Interface
Log in to Hugging Face
Go to the boamps dataset
Navigate to the files: Click on "Files and versions" then on the "data" folder
Click on "Contribute" then… See the full description on the dataset page: https://huggingface.co/datasets/boavizta/open_data_boamps.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
tstone87/repo dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dear researchers and engineers: you are accessing a dataset that would cost millions of dollars to build and took an enormous amount of effort to negotiate favorable terms for its use. Your support, by liking the repositories and upvoting the collection, costs nothing but gives us valuable motivation to continue our contributions to the community. We reserve the right not to approve the request if you don't support our efforts. Thank you very much for your collaboration!
HISTAI Dataset
HISTAI is a… See the full description on the dataset page: https://huggingface.co/datasets/histai/HISTAI-metadata.
cityTS/gxucpc-repo-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community