https://creativecommons.org/publicdomain/zero/1.0/
Hugging Face, Inc. is an American company that develops tools for building applications using machine learning. It is most notable for its Transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets.
This dataset contains data on 16k models available on huggingface.co. It includes the following features for each model:
1. model URL
2. model title
3. downloads and likes
4. updated
kaggle-map/data dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains language models released by Microsoft and fetched from Hugging Face.
You will find the standard models and also some models that I have found to be useful in different NLP competitions. Feel free to run a standard model first and then a derived (fine-tuned) model, and combine the two to create an ensemble.
Includes standard models:
- deberta-v3-base
- deberta-v3-large
- deberta-v3-small
- deberta-v3-xsmall
- mdeberta-v3-base (multi-language)

and a couple of fine-tuned models:
- deberta-v3-base-squad2
- khalidalt-deberta-v3-mnli
- yevheniimaslov-deberta-v3-base-cola
- deberta-v3-large-ontonotes5
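The ensemble idea mentioned in this description can be sketched as plain probability averaging. This is only a minimal illustration, not the dataset author's method: the function name `average_ensemble` and the example probabilities are hypothetical, and in practice each list would come from a separately fine-tuned model such as deberta-v3-base and deberta-v3-large.

```python
from typing import List, Sequence


def average_ensemble(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by several models for one example."""
    if not per_model_probs:
        raise ValueError("need predictions from at least one model")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must predict the same number of classes")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


# Hypothetical class probabilities for one example from two models.
base_probs = [0.75, 0.25]
large_probs = [0.25, 0.75]
print(average_ensemble([base_probs, large_probs]))
```

Averaging is the simplest choice; weighted averages or rank averaging are common alternatives when one model is clearly stronger on validation data.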
This dataset was created by Abhishek Thakur
merve/kaggle-beans dataset hosted on Hugging Face and contributed by the HF Datasets community
GitHub Issues & Kaggle Notebooks
Description
GitHub Issues & Kaggle Notebooks is a collection of two code datasets intended for language model training, sourced from GitHub issues and from notebooks on the Kaggle platform. These datasets are a modified part of the StarCoder2 model training corpus, specifically the bigcode/StarCoder2-Extras dataset. We reformat the samples to remove StarCoder2's special tokens and use natural text to delimit comments in issues and display… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceTB/issues-kaggle-notebooks.
Hugging Face Optimum
🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to train and run models on targeted hardware.
The AI ecosystem evolves quickly, and more and more specialized hardware, along with its own optimizations, emerges every day. As such, Optimum enables users to efficiently use any of these platforms with the same ease inherent to transformers.
Photo by Anton Filatov on Unsplash
https://creativecommons.org/publicdomain/zero/1.0/
Hugging Face, Inc. is a French-American company and open-source community that develops tools and resources to build, deploy, and train machine learning models.
I've always been curious about how many models they host and how much those models are used, which can be roughly measured by model downloads and likes.
This dataset contains scraped information from their models page.
Header image generated using Microsoft Bing Image Creator with this prompt:
Huggingface logo in cute ghost costume doing trick or treat holding a pumpkin of machine learning models while Yan Lecunn and Andrew Ng are looking
Tomate/Kaggle-Titanic dataset hosted on Hugging Face and contributed by the HF Datasets community
Yunij/kaggle-comp-daigt dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains metadata for all publicly uploaded models (10,000+) available on the Hugging Face model hub. Data was collected between 15 and 20 June 2021.
The dataset was generated using the huggingface_hub APIs provided by the Hugging Face team.
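A rough sketch of how such metadata might be collected with `huggingface_hub` is shown here. It is not the dataset author's script: the helper names `model_to_row` and `fetch_model_rows` and the choice of fields are assumptions for illustration, and the attribute names on the returned model objects can differ between library versions, which is why they are read defensively with `getattr`.

```python
from typing import Any, Dict, List


def model_to_row(model: Any) -> Dict[str, Any]:
    """Flatten one model-info object into a plain dict (field choice is illustrative)."""
    return {
        "model_id": getattr(model, "id", None),
        "downloads": getattr(model, "downloads", None),
        "likes": getattr(model, "likes", None),
        "pipeline_tag": getattr(model, "pipeline_tag", None),
    }


def fetch_model_rows(limit: int = 20) -> List[Dict[str, Any]]:
    """Query the Hub for up to `limit` models; requires network access."""
    # Imported lazily so model_to_row stays usable without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    return [model_to_row(m) for m in api.list_models(limit=limit)]
```

Calling `fetch_model_rows(limit=5)` in a notebook with internet access should return a small list of dictionaries that can be passed to `pandas.DataFrame` or written out with the `csv` module.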
This is my first dataset upload on Kaggle. I hope you like it. :)
This dataset contains different variants of the RoBERTa and XLM-RoBERTa models by Meta AI available on Hugging Face's model repository.
Packaging the models as a Kaggle dataset makes loading the weights significantly faster, since you can attach the dataset directly to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models from a dataset is that they can be used in competitions that require internet access to be "off".
For more information on usage, see the RoBERTa and XLM-RoBERTa pages in the Hugging Face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining

MODEL_DIR = "/kaggle/input/huggingface-roberta/"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
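Before loading, it can help to check which model variants are actually present in the attached dataset. The sketch below assumes each variant lives in its own subfolder containing a `config.json` file, as written by `save_pretrained`; the folder layout of this particular dataset is an assumption, and `find_model_dirs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_model_dirs(root: str) -> List[str]:
    """Return the names of subdirectories of `root` that contain a config.json file."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        child.name
        for child in root_path.iterdir()
        if child.is_dir() and (child / "config.json").is_file()
    )


# Outside Kaggle this path will not exist, and the result is an empty list.
print(find_model_dirs("/kaggle/input/huggingface-roberta/"))
```

Any name returned can be appended to `MODEL_DIR` and passed to `from_pretrained` in the same way as `"roberta-base"`.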
Acknowledgements
All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Nestor Mamani
Released under Database: Open Database, Contents: Database Contents
This is the latest version of Hugging Face datasets to be used in offline notebooks on Kaggle. It is automatically updated every week.
!pip install datasets --no-index --find-links=file:///kaggle/input/hf-ds -U -q
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Source repo is google/flan-t5-large.
from transformers import AutoTokenizer, AutoModel
model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
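Note that `AutoModel` loads the bare encoder-decoder without a language-modeling head, which is suited to extracting hidden states rather than generating text. For text generation, a sketch along the following lines may be more useful. It assumes the same local folder holds the full checkpoint; `generate_text` is a hypothetical helper name, and the default of 32 new tokens is arbitrary.

```python
from pathlib import Path


def generate_text(model_dir: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Generate a completion for `prompt` with a locally stored seq2seq model."""
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")

    # Imported lazily: transformers is heavy and only needed once the path is valid.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In a Kaggle notebook with the dataset attached, this could be called as `generate_text('/kaggle/input/googleflan-t5-large/flan-t5-large', 'Translate to German: Hello')`.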
Gholamreza/test-dataset-kaggle dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Oleksandr Sirenko
It contains the following files:
This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.
Packaging the models as a Kaggle dataset makes loading the weights significantly faster, since you can attach the dataset directly to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models from a dataset is that they can be used in competitions that require internet access to be "off".
For more information on usage, see the MobileBERT page in the Hugging Face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining

MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)
Acknowledgements
All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
This is a labeled corpus dataset of article text with corresponding political bias obtained from Huggingface. It contains 17,362 articles labeled left, right, or center by the editors of allsides.com. Articles were manually annotated by news editors who were attempting to select representative articles from the left, right and center of each article topic.
This dataset was created by miwojc