45 datasets found
  1. Hugging Face Models
     Listings of public machine learning model repository metadata on Hugging Face

    • kaggle.com
    zip
    Updated Nov 28, 2023
    Cite
    A T M Ragib Raihan (2023). Hugging Face Models [Dataset]. https://www.kaggle.com/datasets/atmragib/hugging-face-models/code
    Explore at:
    Available download formats: zip (13652285 bytes)
    Dataset updated
    Nov 28, 2023
    Authors
    A T M Ragib Raihan
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context

    The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub.

    Data Source Link: huggingface.co/models

    Attribute Information

    Variable | Description
    model_id |
    pipeline | There are 40 pipelines in total. To learn more, read: Hugging Face Pipeline
    downloads |
    likes |
    author_id |
    author_name |
    author_type | user or organization
    author_isPro | Paid user or organization
    lastModified | from 2014-08-10 to 2023-11-27
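
    A quick exploration sketch in pandas, assuming the zip has been extracted to a CSV named hugging_face_models.csv (the file name is an assumption; check the archive contents). Column names follow the attribute table above:

    import pandas as pd

    # Hypothetical file name; inspect the downloaded zip for the actual CSV.
    df = pd.read_csv("hugging_face_models.csv")

    # Most common of the 40 pipeline types.
    print(df["pipeline"].value_counts().head(10))

    # Most downloaded models.
    print(df.sort_values("downloads", ascending=False)[["model_id", "downloads", "likes"]].head())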
  2. ktda-datasets

    • huggingface.co
    Updated Dec 8, 2024
    Cite
    XavierJiezou (2024). ktda-datasets [Dataset]. https://huggingface.co/datasets/XavierJiezou/ktda-datasets
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 8, 2024
    Authors
    XavierJiezou
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    KTDA-Datasets

    This dataset card describes the datasets used in KTDA.

      Install
    

    pip install huggingface-hub

      Usage
    

    Step 1: Download datasets

    huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include grass.zip
    huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include cloud.zip

    Step 2: Extract datasets

    unzip grass.zip -d grass
    unzip cloud.zip -d l8_biome
    … See the full description on the dataset page: https://huggingface.co/datasets/XavierJiezou/ktda-datasets.
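
    An equivalent download in Python, as a minimal sketch using huggingface_hub (the repo ID and the file names grass.zip and cloud.zip come from the commands above):

    from huggingface_hub import hf_hub_download

    # Download each archive from the dataset repo into ./data
    for filename in ["grass.zip", "cloud.zip"]:
        hf_hub_download(
            repo_id="XavierJiezou/ktda-datasets",
            repo_type="dataset",
            filename=filename,
            local_dir="data",
        )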

  3. Huggingface Modelhub

    • kaggle.com
    zip
    Updated Jun 19, 2021
    Cite
    Kartik Godawat (2021). Huggingface Modelhub [Dataset]. https://www.kaggle.com/crazydiv/huggingface-modelhub
    Explore at:
    Available download formats: zip (2274876 bytes)
    Dataset updated
    Jun 19, 2021
    Authors
    Kartik Godawat
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Dataset containing metadata for all publicly uploaded models (10,000+) available on the HuggingFace model hub. Data was collected between June 15 and June 20, 2021.

    The dataset was generated using the huggingface_hub API provided by the Hugging Face team.

    Update v3:

    • Added Downloads last month metric
    • Added library name

    Contents:

    • huggingface_models.csv: Primary file containing metadata such as model name, tags, last modified date, and filenames
    • huggingface_modelcard_readme.csv: Detailed file containing README.md contents (in markdown format) where available for a model. The modelId column joins the two files together (see the join sketch below).

    huggingface_models.csv

    • modelId: ID of the model as present on the HF website
    • lastModified: Time when the model was last modified
    • tags: Tags associated with the model (provided by the maintainer)
    • pipeline_tag: If present, denotes which pipeline this model can be used with
    • files: List of available files in the model repo
    • publishedBy: Custom column derived from modelId, specifying who published the model
    • downloads_last_month: Number of times the model was downloaded in the last month
    • library: Name of the library the model belongs to, e.g. transformers, spacy, timm

    huggingface_modelcard_readme.csv

    • modelId: ID of the model as available on the HF website
    • modelCard: README contents of a model (referred to as a model card in the HuggingFace ecosystem). It contains useful information on how the model was trained, benchmarks, and author notes.

    Inspiration

    The idea of analyzing publicly available models on HuggingFace struck me while attending a live session of the amazing transformers course by @LysandreJik. Soon after, I tweeted the team and asked for permission to create such a dataset. Special shoutout to @osanseviero for encouraging me and pointing me in the right direction.
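
    A minimal sketch of the modelId join described above, assuming both CSVs have been extracted to the working directory:

    import pandas as pd

    models = pd.read_csv("huggingface_models.csv")
    readmes = pd.read_csv("huggingface_modelcard_readme.csv")

    # modelId is the shared key; a left join keeps models without a README.
    merged = models.merge(readmes, on="modelId", how="left")
    print(merged[["modelId", "pipeline_tag", "downloads_last_month"]].head())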

    This is my first dataset upload on Kaggle. I hope you like it. :)

  4. repo_names

    • huggingface.co
    Updated Jul 26, 2023
    Cite
    git2vec (2023). repo_names [Dataset]. https://huggingface.co/datasets/git2vec/repo_names
    Explore at:
    Dataset updated
    Jul 26, 2023
    Dataset authored and provided by
    git2vec
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    This dataset tracks repository name changes over time. Each row represents a unique combination of repository ID and name, with the timestamp of when that name was first observed. Since GitHub allows repository renaming while preserving the internal repository ID, this dataset enables tracking the full naming history of any repository.

      Schema
    

    Column | Type | Description
    repo_id | int64 | GitHub's internal repository identifier
    repo_name | string | Repository… See the full description on the dataset page: https://huggingface.co/datasets/git2vec/repo_names.
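
    A loading sketch with the datasets library (the split name "train" is an assumption; check the repo's configuration):

    from datasets import load_dataset

    ds = load_dataset("git2vec/repo_names", split="train")  # "train" split is assumed
    print(ds.features)  # expect repo_id (int64) and repo_name (string), per the schema above
    print(ds[0])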

  5. google/flan-t5-large

    • kaggle.com
    zip
    Updated Jul 14, 2023
    + more versions
    Cite
    d0rj_ (2023). google/flan-t5-large [Dataset]. https://www.kaggle.com/datasets/d0rj3228/googleflan-t5-large
    Explore at:
    Available download formats: zip (23751646406 bytes)
    Dataset updated
    Jul 14, 2023
    Authors
    d0rj_
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Info

    Source repo is google/flan-t5-large.

    Usage

    1. Add the dataset to a Kaggle notebook;
    2. Import the pretrained model from the folder:
    from transformers import AutoTokenizer, AutoModel
    
    
    model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
    tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
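    # Since FLAN-T5 is a sequence-to-sequence model, generation needs a seq2seq head.
    # A hedged sketch (AutoModelForSeq2SeqLM instead of AutoModel; same local path):
    from transformers import AutoModelForSeq2SeqLM

    seq2seq = AutoModelForSeq2SeqLM.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
    inputs = tokenizer('Translate to German: How are you?', return_tensors='pt')
    outputs = seq2seq.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))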
    
    
  6. Huggingface BERT

    • kaggle.com
    zip
    Updated Jun 21, 2025
    Cite
    xhlulu (2025). Huggingface BERT [Dataset]. https://www.kaggle.com/xhlulu/huggingface-bert
    Explore at:
    Available download formats: zip (25978385354 bytes)
    Dataset updated
    Jun 21, 2025
    Authors
    xhlulu
    Description

    This dataset contains many popular BERT weights retrieved directly from Hugging Face's model repository and hosted on Kaggle. It is automatically updated every month to ensure that the latest version is available to the user. By making it a dataset, it is significantly faster to load the weights, since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook.

    The banner was adapted from figures by Jimmy Lin (tweet; slide) released under CC BY 4.0. BERT has an Apache 2.0 license according to the model repository.

    Quick Start

    To use this dataset, simply attach it to your notebook and specify the path to the dataset. For example:

    from transformers import AutoTokenizer, AutoModelForMaskedLM
    
    MODEL_DIR = "/kaggle/input/huggingface-bert/"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "bert-large-uncased")
    model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR + "bert-large-uncased")
    

    Acknowledgements

    All the copyrights and IP relating to BERT belong to the original authors (Devlin et al., 2019) and Google. All copyrights relating to the transformers library belong to Hugging Face. The banner image was created thanks to Jimmy Lin, so any modification of this figure should mention the original author and respect the conditions of the license; all copyrights related to the images belong to him.

    Some of the models are community created or trained. Please reach out directly to the authors if you have questions regarding licenses and usage.

  7. github-r-repos

    • huggingface.co
    Updated Jun 6, 2023
    Cite
    Daniel Falbel (2023). github-r-repos [Dataset]. https://huggingface.co/datasets/dfalbel/github-r-repos
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2023
    Authors
    Daniel Falbel
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    GitHub R repositories dataset

    R source files from GitHub. This dataset has been created using the public GitHub datasets from Google BigQuery. This is the actual query that has been used to export the data:

    EXPORT DATA OPTIONS (
      uri = 'gs://your-bucket/gh-r/*.parquet',
      format = 'PARQUET') as (
      select f.id, f.repo_name, f.path, c.content, c.size
      from (
        SELECT distinct id, repo_name, path
        FROM bigquery-public-data.github_repos.files
        where ends_with(path…

    See the full description on the dataset page: https://huggingface.co/datasets/dfalbel/github-r-repos.
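
    A loading sketch with the datasets library, assuming the exported parquet files are exposed as a standard "train" split:

    from datasets import load_dataset

    # Stream rows to avoid downloading all R source files at once.
    ds = load_dataset("dfalbel/github-r-repos", split="train", streaming=True)
    for row in ds.take(3):
        print(row["repo_name"], row["path"])  # columns per the export query above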

  8. HF FineWeb 2 Dataset

    • kaggle.com
    zip
    Updated Jan 28, 2025
    Cite
    Umer Haddii (2025). HF FineWeb 2 Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/fineweb-2-dataset
    Explore at:
    Available download formats: zip (1224570 bytes)
    Dataset updated
    Jan 28, 2025
    Authors
    Umer Haddii
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context

    FineWeb 2 is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages. For the actual data, please see the HuggingFace repository.


    The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments.

    In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular multilingual pretraining datasets (such as CC-100, mC4, CulturaX or HPLT) while being substantially larger, and in some cases it even performs better than datasets specifically curated for a single one of these languages, on our diverse set of carefully selected evaluation tasks: FineTasks.

    The dataset is also listed on Hugging Face; here is the official HF page.

    "My focus is on sharing this valuable open-source dataset to help AI and ML practitioners easily find resources on Kaggle."

    Detailed information about FineWeb 2 is listed in the README.md file below ↓

    Acknowledgement

    Hugging Face FW

  9. TF-ID-arxiv-papers

    • huggingface.co
    Updated Jul 11, 2024
    Cite
    Yifei Hu (2024). TF-ID-arxiv-papers [Dataset]. https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2024
    Authors
    Yifei Hu
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TF-ID arXiv papers dataset

    This is the dataset for finetuning the TF-ID models. It contains about 4,600 images (academic paper pages) with bounding boxes of tables and figures in COCO format. The papers are selected from Hugging Face Daily Papers, covering mostly AI/ML/DL-related topics. You can use this dataset to reproduce all TF-ID models. All bounding boxes were annotated manually by Yifei Hu.

      Project Repo
    

    github.com/ai8hyf/TF-ID

      Variants
    

    Unzip the… See the full description on the dataset page: https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers.
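
    Since the usage section above is truncated, here is a hedged sketch for fetching the dataset files with huggingface_hub before unzipping:

    from huggingface_hub import snapshot_download

    # Download all files in the dataset repo to a local folder.
    local_dir = snapshot_download(
        repo_id="yifeihu/TF-ID-arxiv-papers",
        repo_type="dataset",
    )
    print(local_dir)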

  10. XAMI-dataset

    • huggingface.co
    Updated Aug 26, 2024
    Cite
    Elisabeta-Iulia Dima (2024). XAMI-dataset [Dataset]. https://huggingface.co/datasets/iulia-elisa/XAMI-dataset
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 26, 2024
    Authors
    Elisabeta-Iulia Dima
    Description

    XAMI: XMM-Newton optical Artefact Mapping for astronomical Instance segmentation

    The Dataset

    Check the XAMI model and the XAMI dataset on Github.

      Downloading the dataset
    

    Using a Python script:

    from huggingface_hub import hf_hub_download

    dataset_name = 'xami_dataset'  # the dataset name on Hugging Face
    images_dir = '.'  # the output directory for the dataset images

    hf_hub_download(
        repo_id="iulia-elisa/XAMI-dataset",  # the Hugging Face repo ID
        repo_type='dataset'…

    See the full description on the dataset page: https://huggingface.co/datasets/iulia-elisa/XAMI-dataset.
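
    The call above is truncated on the source page; a complete hedged version might look like this (the filename is hypothetical, derived from the dataset_name variable; list the repo files to confirm):

    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id="iulia-elisa/XAMI-dataset",
        repo_type="dataset",
        filename="xami_dataset.zip",  # hypothetical file name
        local_dir=".",
    )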

  11. disease_data

    • huggingface.co
    Updated Mar 19, 2025
    Cite
    Tushar Milind Bansod (2025). disease_data [Dataset]. https://huggingface.co/datasets/Tusharbansod108/disease_data
    Explore at:
    Dataset updated
    Mar 19, 2025
    Authors
    Tushar Milind Bansod
    Description

    from huggingface_hub import HfApi

    api = HfApi()
    api.upload_file(
        path_or_fileobj=r"C:\Users\tusha\Desktop\New folder",  # Replace with your file path (raw string avoids the invalid \U escape)
        path_in_repo="data.csv",
        repo_id="Tusharbansod108/disease_data",  # Replace with your repo ID
        repo_type="dataset",
    )

  12. cross-encoder/nli-distilroberta-base-v2

    • kaggle.com
    zip
    Updated Jul 20, 2021
    Cite
    Ehsan (2021). cross-encoder/nli-distilroberta-base-v2 [Dataset]. https://www.kaggle.com/safavieh/crossencodernlidistilrobertabasev2
    Explore at:
    Available download formats: zip (305434985 bytes)
    Dataset updated
    Jul 20, 2021
    Authors
    Ehsan
    Description

    Context

    This is a pretrained transformer that is available in the transformers module from Hugging Face:

    https://huggingface.co/cross-encoder/nli-distilroberta-base

    The files in this repository were uploaded from the developers' website:

    https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/nli-distilroberta-base-v2.zip

    Read the README.md file in the Hugging Face repo for more info: https://huggingface.co/cross-encoder/nli-distilroberta-base/blob/main/README.md

    Also, take a look at the sentence-transformers documentation for more models and usage: https://www.sbert.net/docs/pretrained_models.html

    Usage

    The model files are located in the 0_Transformer folder.

    Example:

    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification",
        model="../input/crossencodernlidistilrobertabasev2/0_Transformer",
    )
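
    A short usage sketch (the input text and labels are illustrative):

    result = classifier(
        "The new GPU doubles training throughput.",
        candidate_labels=["technology", "sports", "politics"],  # illustrative labels
    )
    print(result["labels"][0], result["scores"][0])  # top predicted label and its score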
    
  13. maniskill_assets

    • huggingface.co
    Updated Oct 6, 2025
    Cite
    RLinf (2025). maniskill_assets [Dataset]. https://huggingface.co/datasets/RLinf/maniskill_assets
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    RLinf
    License

    ODC-By: https://choosealicense.com/licenses/odc-by/

    Description

    Asset Download

    The assets need to be placed into RLinf's ManiSkill environment folder with the name assets.

    uv pip install huggingface_hub  # if you don't have it
    cd

    You can also use git to clone the repository:

    cd
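
    The commands above are truncated on the source page. As a hedged alternative, the repo can be fetched with huggingface_hub and placed under an assets folder (the destination path inside RLinf is an assumption based on the note above):

    from huggingface_hub import snapshot_download

    # Download the asset repo into ./assets (move into RLinf's ManiSkill
    # environment folder as required).
    snapshot_download(
        repo_id="RLinf/maniskill_assets",
        repo_type="dataset",
        local_dir="assets",
    )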

      License
    

    Our assets are attributed to… See the full description on the dataset page: https://huggingface.co/datasets/RLinf/maniskill_assets.

  14. GLiNER Github Repo

    • kaggle.com
    zip
    Updated Oct 26, 2025
    Cite
    Darien Schettler (2025). GLiNER Github Repo [Dataset]. https://www.kaggle.com/dschettler8845/gliner-github-repo
    Explore at:
    Available download formats: zip (545226 bytes)
    Dataset updated
    Oct 26, 2025
    Authors
    Darien Schettler
    Description

    GLiNER : Generalist and Lightweight model for Named Entity Recognition

    GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and Large Language Models (LLMs) that, despite their flexibility, are costly and large for resource-constrained scenarios.

    Demo Image

    Models Status

    📢 Updates

    • 📝 Finetuning notebook is available: examples/finetune.ipynb
    • 🗂 Training dataset preprocessing scripts are now available in the data/ directory, covering both Pile-NER 📚 and NuNER 📘 datasets.

    Available Models on Hugging Face

    To Release

    • [ ] ⏳ GLiNER-Multiv2
    • [ ] ⏳ GLiNER-Sup (trained on a mixture of NER datasets)

    Area of improvements / research

    • [ ] Allow longer context (e.g. train with long-context transformers such as Longformer, LED, etc.)
    • [ ] Use a bi-encoder (entity encoder and span encoder), allowing precomputation of entity embeddings
    • [ ] Add a filtering mechanism to reduce the number of spans before final classification, to save memory and computation when the number of entity types is large
    • [ ] Improve understanding of more detailed prompts/instructions, e.g. "Find the first name of the person in the text"
    • [ ] Better loss function: for instance, use Focal Loss (see this paper) instead of BCE to handle class imbalance, as some entity types are more frequent than others
    • [ ] Improve multi-lingual capabilities: train on more languages, and use multi-lingual training data
    • [ ] Decoding: allow a span to have multiple labels, e.g. "Cristiano Ronaldo" is both a "person" and a "football player"
    • [ ] Dynamic thresholding (in model.predict_entities(text, labels, threshold=0.5)): allow the model to predict more or fewer entities depending on the context. Currently, the model tends to predict fewer entities where the entity type or the domain is not well represented in the training data.
    • [ ] Train with EMAs (Exponential Moving Averages) or merge multiple checkpoints to improve model robustness (see this paper)
    • [ ] Extend the model to relation extraction, which needs a dataset with relation annotations; see our preliminary work, ATG.

    Installation

    To use this model, you must install the GLiNER Python library: !pip install gliner

    Usage

    Once you've downloaded the GLiNER library, you can import the GLiNER class. You can then load this model using GLiNER.from_pretrained and predict entities with predict_entities.

    from gliner import GLiNER
    
    model = GLiNER.from_pretrained("urchade/gliner_base")
    
    text = """
    Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards,[note 3] a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player. He has won 33 trophies in his career, including seven league titles, five UEFA Champions Leagues, the UEFA European Championship and the UEFA Nations League. Ronaldo holds the records for most appearances (183), goals (140) and assists (42) in the Champions League, goals in the European Championship (14), international goals (128) and international appearances (205). He is one of the few players to have made over 1,200 professional career appearances, the most by an outfield player, and has scored over 850 offici...
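    """  # closing quotes added here; the example text above is truncated on the source page

    # A hedged continuation following the usage pattern described above
    # (the label set is illustrative):
    labels = ["person", "award", "date", "competitions", "teams"]
    entities = model.predict_entities(text, labels, threshold=0.5)
    for entity in entities:
        print(entity["text"], "=>", entity["label"])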
    
  15. cumcm_test

    • huggingface.co
    Cite
    sxj1024, cumcm_test [Dataset]. https://huggingface.co/datasets/sxj1024/cumcm_test
    Explore at:
    Authors
    sxj1024
    Description

    Download Dataset

    from datasets import load_dataset

    # 1. Specify the dataset's "repository ID".
    #    Replace "your-username/your-dataset-name" with the actual ID of the dataset you want to download.
    repo_id = "sxj1024/cumcm_test"

    # 2. Call load_dataset().
    #    This will automatically download the data from the Hub (if not cached locally),
    #    then load it into memory (or in streaming mode).
    dataset = load_dataset(repo_id)

    # 3. View and use the dataset.
    print(dataset)

    You can access… See the full description on the dataset page: https://huggingface.co/datasets/sxj1024/cumcm_test.

  16. RemBERT PyTorch

    • kaggle.com
    zip
    Updated Aug 24, 2021
    Cite
    Nicholas Broad (2021). RemBERT PyTorch [Dataset]. https://www.kaggle.com/nbroad/remBERT-pt
    Explore at:
    Available download formats: zip (2143586380 bytes)
    Dataset updated
    Aug 24, 2021
    Authors
    Nicholas Broad
    Description

    REQUIRES transformers>=4.10.0

    Use this dataset and run !pip install -U --no-build-isolation --no-deps ../input/transformers-master/ -qq, or run !pip install -U transformers

    RemBERT (for classification)

    Pretrained RemBERT model on 110 languages using a masked language modeling (MLM) objective. It was introduced in the paper Rethinking embedding coupling in pre-trained language models. A direct export of the model checkpoint was first made available in this repository. This version of the checkpoint is lightweight since it is meant to be finetuned for classification and excludes the output embedding weights.

    Model description

    RemBERT's main difference with mBERT is that the input and output embeddings are not tied. Instead, RemBERT uses small input embeddings and larger output embeddings. This makes the model more efficient since the output embeddings are discarded during fine-tuning. It is also more accurate, especially when reinvesting the input embeddings' parameters into the core model, as is done on RemBERT.

    Intended uses & limitations

    You should fine-tune this model for your downstream task. It is meant to be a general-purpose model, similar to mBERT. In our paper, we successfully applied this model to tasks such as classification, question answering, NER, and POS-tagging. For tasks such as text generation, you should look at models like GPT-2.

    Training data

    The RemBERT model was pretrained on multilingual Wikipedia data over 110 languages. The full language list is in this repository:

    https://huggingface.co/google/rembert
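
    A minimal classification-setup sketch, loading from the HF hub ID above (in a Kaggle notebook, substitute the attached dataset path; num_labels is illustrative):

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Requires transformers>=4.10.0, as noted above.
    tokenizer = AutoTokenizer.from_pretrained("google/rembert")
    model = AutoModelForSequenceClassification.from_pretrained("google/rembert", num_labels=2)

    inputs = tokenizer("RemBERT decouples input and output embeddings.", return_tensors="pt")
    logits = model(**inputs).logits  # untrained head; fine-tune before use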

  17. CHARM

    • huggingface.co
    Updated Sep 26, 2025
    Cite
    Yuze He (2025). CHARM [Dataset]. https://huggingface.co/datasets/hyz317/CHARM
    Explore at:
    Dataset updated
    Sep 26, 2025
    Authors
    Yuze He
    License

    GPL-3.0: https://choosealicense.com/licenses/gpl-3.0/

    Description

    CHARM

    📃 Paper • 💻 [Github Repo] • 🌐 Project Page
    

    This repository contains the test dataset presented in the paper CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling. CHARM is a novel parametric representation and generative framework for anime hairstyle modeling.

      Usage
    

    You can download the files directly from this repository or use the huggingface_hub library:

    from huggingface_hub import hf_hub_download, list_repo_files

    Get list… See the full description on the dataset page: https://huggingface.co/datasets/hyz317/CHARM.
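
    The snippet above is truncated; a hedged sketch of listing the repo files with the imported helpers:

    from huggingface_hub import list_repo_files

    # List all files in the dataset repo before downloading specific ones.
    files = list_repo_files("hyz317/CHARM", repo_type="dataset")
    print(files)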

  18. Arxiver Dataset

    • kaggle.com
    • huggingface.co
    zip
    Updated Nov 4, 2024
    Cite
    Saumya Gupta (2024). Arxiver Dataset [Dataset]. https://www.kaggle.com/datasets/saumyagupta2025/arxiver-dataset/data
    Explore at:
    Available download formats: zip (873656728 bytes)
    Dataset updated
    Nov 4, 2024
    Authors
    Saumya Gupta
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Arxiver Dataset

    Arxiver consists of 63,357 arXiv papers converted to multi-markdown (.mmd) format. Our dataset includes original arXiv article IDs, titles, abstracts, authors, publication dates, URLs and corresponding markdown files published between January 2023 and October 2023.

    We hope our dataset will be useful for various applications such as semantic search, domain-specific language modeling, question answering, and summarization.

    Curation

    The Arxiver dataset is created using a neural OCR - Nougat. After OCR processing, we apply custom text processing steps to refine the data. This includes extracting author information, removing reference sections, and performing additional cleaning and formatting. Please refer to our GitHub repo for details.

    References

    The original articles are maintained by arXiv and copyrighted to the original authors; please refer to the arXiv license information page for details. We release our dataset with a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0) license. If you use this dataset in your research or project, please cite it as follows:

    @misc{acar_arxiver2024,
     author = {Alican Acar, Alara Dirik, Muhammet Hatipoglu},
     title = {ArXiver},
     year = {2024},
     publisher = {Hugging Face},
     howpublished = {\url{https://huggingface.co/datasets/neuralwork/arxiver}}
    }
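
    A loading sketch with the datasets library, using the repo ID from the citation (the column names follow the description above but are assumptions; check the dataset card):

    from datasets import load_dataset

    ds = load_dataset("neuralwork/arxiver", split="train")
    row = ds[0]
    print(row["title"])          # column names assumed from the description above
    print(row["abstract"][:200])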
    
  19. repo_branches

    • huggingface.co
    Cite
    git2vec, repo_branches [Dataset]. https://huggingface.co/datasets/git2vec/repo_branches
    Explore at:
    Dataset authored and provided by
    git2vec
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    This dataset captures branch creation history across all GitHub repositories. Each row represents a unique combination of repository ID and branch name, with the timestamp of the first observed push to that branch. This enables analysis of branching strategies, feature branch lifecycles, and development workflow patterns.

      Schema
    

    Column | Type | Description
    repo_id | int64 | GitHub's internal repository identifier
    branch_name | string | Name of the branch… See the full description on the dataset page: https://huggingface.co/datasets/git2vec/repo_branches.

  20. bsd100-set5-set14

    • huggingface.co
    Updated Jul 28, 2025
    Cite
    keanteng (2025). bsd100-set5-set14 [Dataset]. https://huggingface.co/datasets/keanteng/bsd100-set5-set14
    Explore at:
    Dataset updated
    Jul 28, 2025
    Authors
    keanteng
    License

    AGPL-3.0: https://choosealicense.com/licenses/agpl-3.0/

    Description

    This repo contains BSD100, Set5 and Set14 for a super-resolution evaluation study. To access a zipped file:

    from huggingface_hub import hf_hub_download

    # Replace with the actual repository ID and filename
    repo_id = "keanteng/bsd100-set5-set14"
    filename = "BSD100.zip"  # or Set5.zip and Set14.zip

    # repo_type="dataset" is required when downloading from a dataset repo
    local_filepath = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
    print(f"File downloaded to: {local_filepath}")
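
    A follow-up sketch for extracting the downloaded archive (the destination folder is illustrative):

    import zipfile

    # Extract the archive into a local folder for evaluation.
    with zipfile.ZipFile(local_filepath) as zf:
        zf.extractall("BSD100")  # illustrative destination folder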
