100+ datasets found
  1. trending-repos

    • huggingface.co
    Updated Feb 1, 2024
    Cite
    Sylvain Lesage (2024). trending-repos [Dataset]. https://huggingface.co/datasets/severo/trending-repos
    Explore at:
    Dataset updated
    Feb 1, 2024
    Authors
    Sylvain Lesage
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Dataset Name

      Dataset Summary
    

    This dataset contains the 20 trending repositories of each type (models, datasets, and spaces) on Hugging Face, recorded every day. Each type can be loaded from its own dataset config.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    Not relevant.

      Dataset Structure
    
    
    
    
    
      Data Instances
    

    The dataset contains three configurations: models: the history of trending models on Hugging… See the full description on the dataset page: https://huggingface.co/datasets/severo/trending-repos.
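Because each repository type lives in its own config, loading a single type could look like the sketch below. It assumes the `datasets` library is installed and that the configs are named `models`, `datasets`, and `spaces`; only `models` is confirmed by the truncated description above, and the `load_trending` helper is hypothetical.

```python
# Assumed config names; only "models" appears in the description above.
ASSUMED_CONFIGS = ("models", "datasets", "spaces")


def load_trending(config="models", repo_id="severo/trending-repos"):
    """Load one configuration of the trending-repos dataset (hypothetical helper)."""
    if config not in ASSUMED_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {ASSUMED_CONFIGS}")
    # Imported lazily so the check above works without the library or network access.
    from datasets import load_dataset

    return load_dataset(repo_id, config)
```

Calling `load_trending("models")` would download that config from the Hub, so it needs network access.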

  2. Hugging Face Models

    • kaggle.com
    zip
    Updated Nov 28, 2023
    Cite
    A T M Ragib Raihan (2023). Hugging Face Models [Dataset]. https://www.kaggle.com/datasets/atmragib/hugging-face-models/code
    Explore at:
    zip (13652285 bytes)
    Dataset updated
    Nov 28, 2023
    Authors
    A T M Ragib Raihan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context

    The Hugging Face Hub hosts many models for a variety of machine learning tasks. Models are stored in repositories, so they benefit from all the features possessed by every repo on the Hugging Face Hub.

    Data Source Link: huggingface.co/models

    Attribute Information

    Variable: Description
    • model_id
    • pipeline: There are 40 pipelines in total. To learn more, read: Hugging Face Pipeline
    • downloads
    • likes
    • author_id
    • author_name
    • author_type: user or organization
    • author_isPro: Paid user or organization
    • lastModified: from 2014-08-10 to 2023-11-27
  3. Huggingface RoBERTa

    • kaggle.com
    zip
    Updated Aug 4, 2023
    Cite
    Darius Singh (2023). Huggingface RoBERTa [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-roberta
    Explore at:
    zip (34531447596 bytes)
    Dataset updated
    Aug 4, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the RoBERTa and XLM-RoBERTa model by Meta AI available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

    from transformers import AutoTokenizer, AutoModelForPreTraining

    # Path where Kaggle mounts the attached dataset
    MODEL_DIR = "/kaggle/input/huggingface-roberta/"

    # Load the tokenizer and weights from the local directory (no download needed)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")

    Acknowledgements

    All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  4. hub-repo-stats

    • huggingface.co
    Updated Oct 3, 2024
    Cite
    Jared Sulzdorf (2024). hub-repo-stats [Dataset]. https://huggingface.co/datasets/jsulz/hub-repo-stats
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 3, 2024
    Authors
    Jared Sulzdorf
    Description

    jsulz/hub-repo-stats dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Huggingface Hub Permissible models and datasets

    • kaggle.com
    zip
    Updated Dec 26, 2023
    Cite
    Dheeraj M Pai (2023). Huggingface Hub Permissible models and datasets [Dataset]. https://www.kaggle.com/datasets/dheerajmpai/huggingface-hub-permissible-models-and-datasets
    Explore at:
    zip (34761279 bytes)
    Dataset updated
    Dec 26, 2023
    Authors
    Dheeraj M Pai
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Huggingface Hub: Models, Datasets, and Spaces

    Dataset Overview

    This comprehensive dataset contains detailed information about all the models, datasets, and spaces available on the Huggingface Hub. It is an essential resource for anyone looking to explore the extensive range of tools and datasets available for machine learning and AI research.

    Key Features

    • Comprehensive Data: Includes exhaustive details on all models, datasets, and spaces from the Huggingface Hub.
    • Permissible Models: A specialized subset is provided in a separate CSV file, focusing exclusively on models that are permissible for use.
    • Regularly Updated: The dataset is refreshed weekly to ensure the latest information is always available.

    Last Update

    • Date: December 26, 2023

    Update Frequency

    • Frequency: Weekly

    Dataset Contents

    1. Models: Detailed listings of all models available on Huggingface Hub.
    2. Datasets: Comprehensive information on datasets hosted on the Hub.
    3. Spaces: An overview of the different spaces and their functionalities.
    4. Permissible Models CSV: A smaller, curated list of models that are cleared for use.

    Usage

    This dataset is ideal for researchers, developers, and AI enthusiasts who are looking for a one-stop repository of models, datasets, and spaces from the Huggingface Hub. It provides a holistic view and simplifies the task of finding the right tools for various machine learning and AI projects.

    Note: This dataset is not officially affiliated with or endorsed by the Huggingface organization.

  6. Data from: label-files

    • huggingface.co
    Updated Dec 23, 2021
    Cite
    Hugging Face (2021). label-files [Dataset]. https://huggingface.co/datasets/huggingface/label-files
    Explore at:
    Dataset updated
    Dec 23, 2021
    Dataset authored and provided by
    Hugging Face (https://huggingface.co/)
    Description

    This repository contains the mapping from integer ids to actual label names (in HuggingFace Transformers typically called id2label) for several datasets. Current datasets include:

    • ImageNet-1k
    • ImageNet-22k (also called ImageNet-21k as there are 21,843 classes)
    • COCO detection 2017
    • COCO panoptic 2017
    • ADE20k (actually, the MIT Scene Parsing benchmark, which is a subset of ADE20k)
    • Cityscapes
    • VQAv2
    • Kinetics-700
    • RVL-CDIP
    • PASCAL VOC
    • Kinetics-400
    • ...

    You can read in a label file as follows (using… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/label-files.
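The snippet in the description is cut off, so the following is only a sketch of how an id2label file might be read, not the dataset card's own code. It assumes the label files are JSON objects keyed by stringified integer ids; `load_id2label` and `fetch_label_file` are hypothetical helper names, and any filename passed in (for example `imagenet-1k-id2label.json`) is an assumption to check against the repository's file list.

```python
import json


def load_id2label(path):
    """Read a label file and return a dict mapping integer ids to label names."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so convert them back to integers.
    return {int(k): v for k, v in raw.items()}


def fetch_label_file(filename, repo_id="huggingface/label-files"):
    """Download one label file from the Hub and return its local path."""
    # Lazy import: huggingface_hub is only needed when actually downloading.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
```

With network access, `load_id2label(fetch_label_file(<filename>))` would chain the two steps.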

  7. Huggingface Modelhub

    • kaggle.com
    zip
    Updated Jun 19, 2021
    Cite
    Kartik Godawat (2021). Huggingface Modelhub [Dataset]. https://www.kaggle.com/crazydiv/huggingface-modelhub
    Explore at:
    zip (2274876 bytes)
    Dataset updated
    Jun 19, 2021
    Authors
    Kartik Godawat
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Dataset containing metadata for all the publicly uploaded models (10,000+) available on the HuggingFace model hub. Data was collected between 15th and 20th June 2021.

    The dataset was generated using the huggingface_hub APIs provided by the Hugging Face team.

    Update v3:

    • Added Downloads last month metric
    • Added library name

    Contents:

    • huggingface_models.csv: Primary file, which contains metadata such as model name, tags, last modified time, and filenames
    • huggingface_modelcard_readme.csv: Detailed file containing the README.md contents, if available, for a particular model. Content is in markdown format. The modelId column joins both files together.

    huggingface_models.csv

    • modelId: ID of the model as present on the HF website
    • lastModified: Time when this model was last modified
    • tags: Tags associated with the model (provided by the maintainer)
    • pipeline_tag: If it exists, denotes which pipeline this model could be used with
    • files: List of available files in the model repo
    • publishedBy: Custom column derived from modelId, specifying who published this model
    • downloads_last_month: Number of times the model has been downloaded in the last month
    • library: Name of the library the model belongs to, e.g. transformers, spacy, timm

    huggingface_modelcard_readme.csv

    • modelId: ID of the model as available on the HF website
    • modelCard: Readme contents of a model (referred to as modelCard in the HuggingFace ecosystem). It contains useful information on how the model was trained, benchmarks, and author notes.

    Inspiration

    The idea of analyzing publicly available models on HuggingFace struck me while I was attending a live session of the amazing transformers course by @LysandreJik. Soon after, I tweeted the team and asked for permission to create such a dataset. Special shoutout to @osanseviero for encouraging and pointing me in the right direction.

    This is my first dataset upload on Kaggle. I hope you like it. :)
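Since the modelId column links huggingface_models.csv and huggingface_modelcard_readme.csv, the two files can be combined with a simple left join. The sketch below uses only the standard library and tiny made-up rows in place of the real files; `join_on_model_id` is a hypothetical helper, and the sample values are illustrative only.

```python
import csv
import io


def join_on_model_id(models_rows, readme_rows):
    """Attach the modelCard text to each models row that has a README entry."""
    cards = {row["modelId"]: row["modelCard"] for row in readme_rows}
    # Left join: keep every model, with None when no README is available.
    return [{**row, "modelCard": cards.get(row["modelId"])} for row in models_rows]


# Tiny made-up samples standing in for the two CSV files.
MODELS_CSV = "modelId,pipeline_tag\norg-a/model-x,fill-mask\nuser-b/model-y,\n"
README_CSV = "modelId,modelCard\norg-a/model-x,# Model X\n"

models = list(csv.DictReader(io.StringIO(MODELS_CSV)))
readmes = list(csv.DictReader(io.StringIO(README_CSV)))
joined = join_on_model_id(models, readmes)
```

For the real files, opening them with `open(path, newline="", encoding="utf-8")` and passing the handle to `csv.DictReader` lets the csv module handle README text that spans multiple lines.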

  8. RTL-Repo

    • huggingface.co
    Updated Oct 30, 2023
    Cite
    Ahmed Allam (2023). RTL-Repo [Dataset]. https://huggingface.co/datasets/ahmedallam/RTL-Repo
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 30, 2023
    Authors
    Ahmed Allam
    Description

    RTL-Repo Benchmark

    This repository contains the data for the RTL-Repo benchmark introduced in the paper RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects.

      👋 Overview
    

    RTL-Repo is a benchmark for evaluating LLMs' effectiveness in generating Verilog code autocompletions within large, complex codebases. It assesses the model's ability to understand and remember the entire Verilog repository context and generate new code that is correct, relevant… See the full description on the dataset page: https://huggingface.co/datasets/ahmedallam/RTL-Repo.

  9. On the Suitability of Hugging Face Hub for Empirical Studies

    • zenodo.org
    • recerca.uoc.edu
    Updated Apr 26, 2024
    Cite
    Adem Ait Fonollà; Javier Luis Cánovas Izquierdo; Jordi Cabot (2024). On the Suitability of Hugging Face Hub for Empirical Studies [Dataset]. http://doi.org/10.5281/zenodo.11072131
    Explore at:
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Adem Ait Fonollà; Javier Luis Cánovas Izquierdo; Jordi Cabot
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains the data used in the paper titled "On the Suitability of Hugging Face Hub for Empirical Studies". For RQ1 we share the survey responses and the interview transcriptions, while for RQ2 we share the link to the repository where the data is hosted.

    • For RQ1, the survey responses are in an Excel file titled "Survey Responses Public.xlsx". The transcriptions of each interview are in a Word file titled "Transcription-intvw-slot-N.docx".
    • For RQ2, we collected the data of the HFCommunity release of October 2023. It can be found on its website. We also share the DOI of the dump.
  10. Huggingface Google MobileBERT

    • kaggle.com
    zip
    Updated Jul 26, 2023
    Cite
    Darius Singh (2023). Huggingface Google MobileBERT [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-google-mobilebert
    Explore at:
    zip (875319161 bytes)
    Dataset updated
    Jul 26, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the mobilebert hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

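    from transformers import AutoTokenizer, AutoModelForPreTraining

    # Path where Kaggle mounts the attached dataset
    MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"

    # Load the tokenizer and weights from the local directory (no download needed)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)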

    Acknowledgements

    All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  11. HF FineWeb 2 Dataset

    • kaggle.com
    zip
    Updated Jan 28, 2025
    Cite
    Umer Haddii (2025). HF FineWeb 2 Dataset [Dataset]. https://www.kaggle.com/datasets/umerhaddii/fineweb-2-dataset
    Explore at:
    zip (1224570 bytes)
    Dataset updated
    Jan 28, 2025
    Authors
    Umer Haddii
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context

    FineWeb 2 is the second iteration of the popular 🍷 FineWeb dataset, bringing high quality pretraining data to over 1000 🗣️ languages. For the actual data, please see the HuggingFace repository.

    The 🥂 FineWeb2 dataset is fully reproducible, available under the permissive ODC-By 1.0 license and extensively validated through hundreds of ablation experiments.

    In particular, on the set of 9 diverse languages we used to guide our processing decisions, 🥂 FineWeb2 outperforms other popular pretraining datasets covering multiple languages (such as CC-100, mC4, CulturaX or HPLT, while being substantially larger) and, in some cases, even performs better than some datasets specifically curated for a single one of these languages, in our diverse set of carefully selected evaluation tasks: FineTasks.

    The dataset is also listed on Hugging Face; see the official HF page.

    "My focus is on sharing this valuable open-source dataset to help AI and ML practitioners easily find resources on Kaggle."

    Detailed information about FW2 is given in the README.md file below.

    Acknowledgement

    Hugging Face FW

  12. ktda-datasets

    • huggingface.co
    Updated Dec 8, 2024
    Cite
    XavierJiezou (2024). ktda-datasets [Dataset]. https://huggingface.co/datasets/XavierJiezou/ktda-datasets
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 8, 2024
    Authors
    XavierJiezou
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    KTDA-Datasets

    This dataset card aims to describe the datasets used in the KTDA.

      Install
    

    pip install huggingface-hub

      Usage
    

    Step 1: Download datasets

    huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include grass.zip
    huggingface-cli download --repo-type dataset XavierJiezou/ktda-datasets --local-dir data --include cloud.zip

    Step 2: Extract datasets

    unzip grass.zip -d grass
    unzip cloud.zip -d l8_biome… See the full description on the dataset page: https://huggingface.co/datasets/XavierJiezou/ktda-datasets.
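The two steps above can also be scripted from Python. This is a rough sketch rather than the dataset authors' tooling: `download_archive` and `extract_archive` are hypothetical helpers, the download relies on `hf_hub_download` from the `huggingface_hub` package installed earlier, and the target directory for cloud.zip is left to the caller because it is truncated in the description.

```python
import zipfile
from pathlib import Path


def download_archive(filename, repo_id="XavierJiezou/ktda-datasets", local_dir="data"):
    """Python counterpart of the huggingface-cli download command (needs network)."""
    # Lazy import: huggingface_hub is only needed when actually downloading.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id, filename=filename, repo_type="dataset", local_dir=local_dir
    )


def extract_archive(zip_path, target_dir):
    """Equivalent of `unzip <zip_path> -d <target_dir>`; returns the extracted top-level names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())
```

For example, `extract_archive(download_archive("grass.zip"), "grass")` would mirror the grass.zip commands shown in the steps.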

  13. rtl-repo-curated

    • huggingface.co
    Updated May 30, 2024
    Cite
    Sonya Jin (2024). rtl-repo-curated [Dataset]. https://huggingface.co/datasets/sonyashijin/rtl-repo-curated
    Explore at:
    Dataset updated
    May 30, 2024
    Authors
    Sonya Jin
    Description

    sonyashijin/rtl-repo-curated dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. test-repo

    • huggingface.co
    Updated Jun 15, 2024
    Cite
    Tyler Ross (2024). test-repo [Dataset]. https://huggingface.co/datasets/tdross/test-repo
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 15, 2024
    Authors
    Tyler Ross
    Description

    tdross/test-repo dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. test-repo

    • huggingface.co
    Cite
    Adam Nguyen, test-repo [Dataset]. https://huggingface.co/datasets/Adanato/test-repo
    Explore at:
    Authors
    Adam Nguyen
    Description

    Adanato/test-repo dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. my-target-repo

    • huggingface.co
    Updated Dec 1, 2025
    Cite
    Fred Elias (2025). my-target-repo [Dataset]. https://huggingface.co/datasets/samfred2/my-target-repo
    Explore at:
    Dataset updated
    Dec 1, 2025
    Authors
    Fred Elias
    Description

    samfred2/my-target-repo dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. open_data_boamps

    • huggingface.co
    Updated Jul 15, 2025
    Cite
    BoAmps (2025). open_data_boamps [Dataset]. https://huggingface.co/datasets/boavizta/open_data_boamps
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    BoAmps
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Guide: How to share your data on the BoAmps repository

    This guide explains step by step how to share BoAmps format reports on this public Hugging Face repository.

      Prerequisites
    

    Before starting, make sure you have:

    • A Hugging Face account
    • The files you want to upload

      Method 1: Hugging Face Web Interface
    

    Log in to Hugging Face

    Go to the boamps dataset

    Navigate to the files: Click on "Files and versions" then on the "data" folder

    Click on "Contribute" then… See the full description on the dataset page: https://huggingface.co/datasets/boavizta/open_data_boamps.

  18. repo

    • huggingface.co
    Updated Feb 2, 2025
    Cite
    Thomas Stone (2025). repo [Dataset]. https://huggingface.co/datasets/tstone87/repo
    Explore at:
    Dataset updated
    Feb 2, 2025
    Authors
    Thomas Stone
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    tstone87/repo dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. HISTAI-metadata

    • huggingface.co
    Updated Jun 2, 2025
    + more versions
    Cite
    HistAI (2025). HISTAI-metadata [Dataset]. https://huggingface.co/datasets/histai/HISTAI-metadata
    Explore at:
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    HistAI
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Dear researchers and engineers, you're accessing a dataset that would cost millions of dollars to build and took millions of nerves to negotiate favorable terms for its use. Your support, by liking the repositories and upvoting the collection, costs nothing but gives us valuable motivation to continue our contributions to the community. We reserve the right not to approve the request if you don't support our efforts. Thank you very much for collaboration!

      HISTAI Dataset
    

    HISTAI is a… See the full description on the dataset page: https://huggingface.co/datasets/histai/HISTAI-metadata.

  20. gxucpc-repo-v1

    • huggingface.co
    Cite
    CHengtao Sun, gxucpc-repo-v1 [Dataset]. https://huggingface.co/datasets/cityTS/gxucpc-repo-v1
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    CHengtao Sun
    Description

    cityTS/gxucpc-repo-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community

Cite
Sylvain Lesage (2024). trending-repos [Dataset]. https://huggingface.co/datasets/severo/trending-repos

trending-repos

severo/trending-repos

Trending repositories on Hugging Face

Explore at:
107 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 1, 2024
Authors
Sylvain Lesage
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

