100+ datasets found
  1. Hugging Face Models Dataset

    • kaggle.com
    zip
    Updated Feb 19, 2023
    Cite
    Yasir Raza (2023). Hugging Face Models Dataset [Dataset]. https://www.kaggle.com/datasets/yasirabdaali/hugging-face-models-dataset
    Explore at:
    Available download formats: zip (980916 bytes)
    Dataset updated
    Feb 19, 2023
    Authors
    Yasir Raza
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Hugging Face

    Hugging Face, Inc. is an American company that develops tools for building applications using machine learning. It is most notable for its Transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets.

    This dataset contains data on 16k models available on huggingface.co, with the following features for each model: 1. model URL 2. model title 3. downloads and likes 4. updated

  2. data

    • huggingface.co
    Updated Jul 27, 2025
    + more versions
    Cite
    Kaggle MAP (2025). data [Dataset]. https://huggingface.co/datasets/kaggle-map/data
    Explore at:
    Dataset updated
    Jul 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kaggle MAP
    Description

    kaggle-map/data dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. huggingface-deberta-v3-variants

    • kaggle.com
    zip
    Updated Nov 30, 2022
    Cite
    Wilmer E. Henao (2022). huggingface-deberta-v3-variants [Dataset]. https://www.kaggle.com/datasets/verracodeguacas/huggingfacedebertav3variants
    Explore at:
    Available download formats: zip (11818652754 bytes)
    Dataset updated
    Nov 30, 2022
    Authors
    Wilmer E. Henao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains language models released by Microsoft and fetched from Hugging Face 💪

    You will find the standard models and also some models that I have found useful in different NLP competitions. Feel free to run a standard model first and then a derived model to create an ensemble.

    Includes standard models:

    • deberta-v3-base
    • deberta-v3-large
    • deberta-v3-small
    • deberta-v3-xsmall
    • mdeberta-v3-base (multi-language)

    and a couple of fine-tuned models:

    • deberta-v3-base-squad2
    • khalidalt-deberta-v3-mnli
    • yevheniimaslov-deberta-v3-base-cola
    • deberta-v3-large-ontonotes5

  4. huggingface_hub

    • kaggle.com
    zip
    Updated Nov 4, 2024
    Cite
    Abhishek Thakur (2024). huggingface_hub [Dataset]. https://www.kaggle.com/datasets/abhishek/huggingface-hub
    Explore at:
    Available download formats: zip (4315332 bytes)
    Dataset updated
    Nov 4, 2024
    Authors
    Abhishek Thakur
    Description

    Dataset

    This dataset was created by Abhishek Thakur

    Contents

  5. kaggle-beans

    • huggingface.co
    Updated Sep 25, 2022
    Cite
    merve (2022). kaggle-beans [Dataset]. https://huggingface.co/datasets/merve/kaggle-beans
    Explore at:
    Dataset updated
    Sep 25, 2022
    Authors
    merve
    Description

    merve/kaggle-beans dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. issues-kaggle-notebooks

    • huggingface.co
    Updated Aug 12, 2025
    Cite
    Hugging Face Smol Models Research (2025). issues-kaggle-notebooks [Dataset]. https://huggingface.co/datasets/HuggingFaceTB/issues-kaggle-notebooks
    Explore at:
    Dataset updated
    Aug 12, 2025
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face Smol Models Research
    Description

    GitHub Issues & Kaggle Notebooks

    GitHub Issues & Kaggle Notebooks is a collection of two code datasets intended for language model training, sourced from GitHub issues and from notebooks on the Kaggle platform. These datasets are a modified part of the StarCoder2 model training corpus, specifically the bigcode/StarCoder2-Extras dataset. We reformat the samples to remove StarCoder2's special tokens and use natural text to delimit comments in issues and display… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/issues-kaggle-notebooks.

  7. hugging face optimum

    • kaggle.com
    zip
    Updated Oct 30, 2025
    Cite
    Nicholas Broad (2025). hugging face optimum [Dataset]. https://www.kaggle.com/datasets/nbroad/hugging-face-optimum
    Explore at:
    Available download formats: zip (247588 bytes)
    Dataset updated
    Oct 30, 2025
    Authors
    Nicholas Broad
    Description

    Hugging Face Optimum

    🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to train and run models on targeted hardware.

    The AI ecosystem evolves quickly and more and more specialized hardware along with their own optimizations are emerging every day. As such, Optimum enables users to efficiently use any of these platforms with the same ease inherent to transformers.

    Photo by Anton Filatov on Unsplash

  8. 🤗 Complete HuggingFace model catalogue

    • kaggle.com
    zip
    Updated Apr 17, 2024
    Cite
    BwandoWando (2024). 🤗 Complete HuggingFace model catalogue [Dataset]. https://www.kaggle.com/datasets/bwandowando/huggingface-co-model-catalogue
    Explore at:
    Available download formats: zip (14057489 bytes)
    Dataset updated
    Apr 17, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Hugging Face, Inc. is a French-American company and open-source community that develops tools and resources to build, deploy, and train machine learning models.

    I've always been intrigued by how many models they have and how much these are used, which can be roughly measured by model downloads and likes.

    This dataset contains information scraped from their models page.

    Note

    Header image generated using Microsoft Bing Image Creator with this prompt

    Huggingface logo in cute ghost costume doing trick or treat holding a pumpkin of machine learning models while Yan Lecunn and Andrew Ng are looking

  9. Kaggle-Titanic

    • huggingface.co
    Updated Nov 2, 2023
    Cite
    Ensalada (2023). Kaggle-Titanic [Dataset]. https://huggingface.co/datasets/Tomate/Kaggle-Titanic
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 2, 2023
    Authors
    Ensalada
    Description

    Tomate/Kaggle-Titanic dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. kaggle-comp-daigt

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    Yunij Karki (2024). kaggle-comp-daigt [Dataset]. https://huggingface.co/datasets/Yunij/kaggle-comp-daigt
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2024
    Authors
    Yunij Karki
    Description

    Yunij/kaggle-comp-daigt dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Huggingface Modelhub

    • kaggle.com
    zip
    Updated Jun 19, 2021
    Cite
    Kartik Godawat (2021). Huggingface Modelhub [Dataset]. https://www.kaggle.com/crazydiv/huggingface-modelhub
    Explore at:
    Available download formats: zip (2274876 bytes)
    Dataset updated
    Jun 19, 2021
    Authors
    Kartik Godawat
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset containing metadata for all the publicly uploaded models (10,000+) available on the Hugging Face model hub. Data was collected between 15 and 20 June 2021.

    The dataset was generated using the huggingface_hub APIs provided by the Hugging Face team.

    Update v3:

    • Added Downloads last month metric
    • Added library name

    Contents:

    • huggingface_models.csv: Primary file containing metadata such as model name, tags, last modified time, and filenames
    • huggingface_modelcard_readme.csv: Detailed file containing the README.md contents, if available, for a particular model. Content is in Markdown format. The modelId column joins the two files.

    huggingface_models.csv

    • modelId: ID of the model as present on the HF website
    • lastModified: Time when this model was last modified
    • tags: Tags associated with the model (provided by the maintainer)
    • pipeline_tag: If present, denotes which pipeline this model can be used with
    • files: List of available files in the model repo
    • publishedBy: Custom column derived from modelId, specifying who published this model
    • downloads_last_month: Number of times the model was downloaded in the last month
    • library: Name of the library the model belongs to, e.g. transformers, spacy, timm

    huggingface_modelcard_readme.csv

    • modelId: ID of the model as available on the HF website
    • modelCard: README contents of a model (referred to as the model card in the Hugging Face ecosystem). It contains useful information on how the model was trained, benchmarks, and author notes.

    Inspiration

    The idea of analyzing publicly available models on Hugging Face struck me while I was attending a live session of the amazing transformers course by @LysandreJik. Soon after, I tweeted the team and asked for permission to create such a dataset. Special shoutout to @osanseviero for encouraging me and pointing me in the right direction.

    This is my first dataset upload on Kaggle. I hope you like it. :)
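
The description of this dataset says the modelId column joins huggingface_models.csv and huggingface_modelcard_readme.csv. A minimal sketch of that join, using only the Python standard library, might look like the following. The sample rows and the helper name join_on_model_id are hypothetical; only the column names come from the description.

```python
import csv
import io

# Hypothetical sample rows standing in for the two real files.
# Only the column names are taken from the dataset description.
MODELS_CSV = """modelId,pipeline_tag,downloads_last_month,library
org-a/model-one,fill-mask,1200,transformers
org-b/model-two,text-classification,45,transformers
"""

READMES_CSV = """modelId,modelCard
org-a/model-one,# Model one
"""


def join_on_model_id(models_file, readmes_file):
    """Left-join model metadata with README contents on the modelId column."""
    # Index the README file by modelId for constant-time lookups.
    cards = {row["modelId"]: row["modelCard"] for row in csv.DictReader(readmes_file)}
    joined = []
    for row in csv.DictReader(models_file):
        # Models without a README keep a modelCard of None.
        row["modelCard"] = cards.get(row["modelId"])
        joined.append(row)
    return joined


rows = join_on_model_id(io.StringIO(MODELS_CSV), io.StringIO(READMES_CSV))
for row in rows:
    print(row["modelId"], row["downloads_last_month"], row["modelCard"] is not None)
```

To run this against the real files, the two io.StringIO objects could be replaced with file handles opened with newline="" and an explicit encoding, since README contents span multiple lines.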

  12. Huggingface RoBERTa

    • kaggle.com
    zip
    Updated Aug 4, 2023
    Cite
    Darius Singh (2023). Huggingface RoBERTa [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-roberta
    Explore at:
    Available download formats: zip (34531447596 bytes)
    Dataset updated
    Aug 4, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the RoBERTa and XLM-RoBERTa model by Meta AI available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

    from transformers import AutoTokenizer, AutoModelForPreTraining

    # Directory where Kaggle mounts the attached dataset
    MODEL_DIR = "/kaggle/input/huggingface-roberta/"

    # Load the tokenizer and weights from the local folder instead of downloading them
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
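
Because the notebook may run with internet access off, a missing or mistyped dataset path only fails once loading starts. The sketch below checks the path up front. It assumes the attached folder follows the usual Transformers layout, in which a saved model directory contains a config.json file; the helper name resolve_model_dir is hypothetical and is not part of this dataset or of the transformers library.

```python
from pathlib import Path


def resolve_model_dir(dataset_dir, variant=""):
    """Return the local model directory as a string, or raise if it looks incomplete."""
    model_dir = Path(dataset_dir) / variant
    if not model_dir.is_dir():
        raise FileNotFoundError(f"{model_dir} not found; is the dataset attached to the notebook?")
    # Assumption: a directory written by save_pretrained contains config.json.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"{model_dir} has no config.json")
    return str(model_dir)


# Example usage inside a Kaggle notebook (not executed here):
# path = resolve_model_dir("/kaggle/input/huggingface-roberta", "roberta-base")
# tokenizer = AutoTokenizer.from_pretrained(path)
```

The returned string can be passed to from_pretrained in place of the concatenated path.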
    

    Acknowledgements

    All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  13. Hugging Face Getting Started With Embeddings

    • kaggle.com
    zip
    Updated Oct 8, 2023
    Cite
    Nestor Mamani (2023). Hugging Face Getting Started With Embeddings [Dataset]. https://www.kaggle.com/datasets/nestorxyz/hugging-face-getting-started-with-embeddings
    Explore at:
    Available download formats: zip (5996 bytes)
    Dataset updated
    Oct 8, 2023
    Authors
    Nestor Mamani
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Nestor Mamani

    Released under Database: Open Database, Contents: Database Contents

    Contents

  14. Data from: hugging face datasets

    • kaggle.com
    zip
    Updated Nov 3, 2025
    Cite
    Nicholas Broad (2025). hugging face datasets [Dataset]. https://www.kaggle.com/nbroad/hf-ds
    Explore at:
    Available download formats: zip (70163997 bytes)
    Dataset updated
    Nov 3, 2025
    Authors
    Nicholas Broad
    Description

    This is the latest version of Hugging Face datasets to be used in offline notebooks on Kaggle. It is automatically updated every week.

    Docs are here

    Installation Instructions

    !pip install datasets --no-index --find-links=file:///kaggle/input/hf-ds -U -q
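
Outside a notebook cell, where the leading ! is not available, the same offline install can be driven from Python. The sketch below only builds the argument list; the function name offline_pip_command is hypothetical, while the pip flags are the ones shown in the command above.

```python
import sys


def offline_pip_command(package, wheel_dir):
    """Build a pip command that installs from local wheels only."""
    return [
        sys.executable, "-m", "pip", "install", package,
        "--no-index",                        # never contact a package index
        f"--find-links=file://{wheel_dir}",  # look for wheels in the attached dataset
        "-U", "-q",
    ]


cmd = offline_pip_command("datasets", "/kaggle/input/hf-ds")
print(" ".join(cmd[1:]))
```

Passing cmd to subprocess.run(cmd, check=True) would run the install and raise an error if pip fails.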

  15. google/flan-t5-large

    • kaggle.com
    zip
    Updated Jul 14, 2023
    + more versions
    Cite
    d0rj_ (2023). google/flan-t5-large [Dataset]. https://www.kaggle.com/datasets/d0rj3228/googleflan-t5-large
    Explore at:
    Available download formats: zip (23751646406 bytes)
    Dataset updated
    Jul 14, 2023
    Authors
    d0rj_
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Info

    Source repo is google/flan-t5-large.

    Usage

    1. Add the dataset to your Kaggle notebook.
    2. Load the pretrained model and tokenizer from the dataset folder:
    from transformers import AutoTokenizer, AutoModel
    
    
    model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
    tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
    
    
  16. test-dataset-kaggle

    • huggingface.co
    Updated Feb 15, 2024
    Cite
    Gholamreza Dar (2024). test-dataset-kaggle [Dataset]. https://huggingface.co/datasets/Gholamreza/test-dataset-kaggle
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 15, 2024
    Authors
    Gholamreza Dar
    Description

    Gholamreza/test-dataset-kaggle dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. huggingface_datasets

    • kaggle.com
    zip
    Updated Aug 14, 2021
    + more versions
    Cite
    Oleksandr Sirenko (2021). huggingface_datasets [Dataset]. https://www.kaggle.com/oleksandrsirenko/huggingface-datasets
    Explore at:
    Available download formats: zip (52448373 bytes)
    Dataset updated
    Aug 14, 2021
    Authors
    Oleksandr Sirenko
    Description

    Dataset

    This dataset was created by Oleksandr Sirenko

    Contents

    It contains the following files:

  18. Huggingface Google MobileBERT

    • kaggle.com
    zip
    Updated Jul 26, 2023
    Cite
    Darius Singh (2023). Huggingface Google MobileBERT [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-google-mobilebert
    Explore at:
    Available download formats: zip (875319161 bytes)
    Dataset updated
    Jul 26, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the mobilebert hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

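    from transformers import AutoTokenizer, AutoModelForPreTraining

    # Directory where Kaggle mounts the attached dataset
    MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"

    # Load the tokenizer and weights from the local folder instead of downloading them
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)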
    

    Acknowledgements

    All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  19. Labelled Corpus - Political Bias (Hugging Face)

    • kaggle.com
    zip
    Updated May 8, 2024
    Cite
    Suraj Karakulath (2024). Labelled Corpus - Political Bias (Hugging Face) [Dataset]. https://www.kaggle.com/datasets/surajkarakulath/labelled-corpus-political-bias-hugging-face
    Explore at:
    Available download formats: zip (50133530 bytes)
    Dataset updated
    May 8, 2024
    Authors
    Suraj Karakulath
    Description

    This is a labeled corpus dataset of article text with corresponding political bias obtained from Huggingface. It contains 17,362 articles labeled left, right, or center by the editors of allsides.com. Articles were manually annotated by news editors who were attempting to select representative articles from the left, right and center of each article topic.

  20. huggingface-datasets

    • kaggle.com
    zip
    Updated Apr 20, 2022
    Cite
    miwojc (2022). huggingface-datasets [Dataset]. https://www.kaggle.com/datasets/miwojc/huggingface-datasets
    Explore at:
    Available download formats: zip (51528758 bytes)
    Dataset updated
    Apr 20, 2022
    Authors
    miwojc
    Description

    Dataset

    This dataset was created by miwojc

    Contents
