99 datasets found

h
IQA-PyTorch-Datasets
huggingface.co
Updated Feb 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chaofeng Chen (2024). IQA-PyTorch-Datasets [Dataset]. https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets
Explore at:
Dataset updated
Feb 18, 2024
Authors
Chaofeng Chen
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Description

This is the dataset repository used in the pyiqa toolbox. Please refer to Awesome Image Quality Assessment for details of each dataset Example commandline script with huggingface-cli: huggingface-cli download chaofengc/IQA-PyTorch-Datasets live.tgz --local-dir ./datasets --repo-type dataset cd datasets tar -xzvf live.tgz

Disclaimer for This Dataset Collection

This collection of datasets is compiled and maintained for academic, research, and educational… See the full description on the dataset page: https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets.
Accelerate by HuggingFace (for offline usage)
kaggle.com
Updated Apr 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shreyansh Singh (2021). Accelerate by HuggingFace (for offline usage) [Dataset]. https://www.kaggle.com/shreyansh2626/accelerate-by-huggingface-for-offline-usage/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 16, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shreyansh Singh
Description
Context

Accelerate is a Python library that allows running raw PyTorch training scripts on any kind of device very easily. It allows easy integration into your code. More details are here - https://huggingface.co/blog/accelerate-library

Pretrained BERT Models for PyTorch

kaggle.com

Updated May 28, 2019

Facebook

Twitter

Click to copy link

Link copied

Cite

[he.ai]soulmachine (2019). Pretrained BERT Models for PyTorch [Dataset]. https://www.kaggle.com/datasets/soulmachine/pretrained-bert-models-for-pytorch

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 28, 2019

Dataset provided by

Kagglehttp://kaggle.com/

Authors

[he.ai]soulmachine

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Pretrained BERT models for pytorch-pretrained-bert

Details: Files' URLs are found from its source code.

https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L39 :

PRETRAINED_MODEL_ARCHIVE_MAP = {
  'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
  'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz",
  'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz",
  'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz",
  'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz",
  'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz",
  'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
}

https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization.py#L29 :

PRETRAINED_VOCAB_ARCHIVE_MAP = {
  'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
  'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
  'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
  'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
  'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt",
  'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt",
  'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt",
}

E
Data from: PyTorch model for Slovenian Named Entity Recognition SloNER 1.0
live.european-language-grid.eu
Updated Jan 26, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). PyTorch model for Slovenian Named Entity Recognition SloNER 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/20980
Explore at:
Dataset updated
Jan 26, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The SloNER is a model for Slovenian Named Entity Recognition. It is is a PyTorch neural network model, intended for usage with the HuggingFace transformers library (https://github.com/huggingface/transformers).

The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747).The source code of the model is available on GitHub repository https://github.com/clarinsi/SloNER.
pretrained transformers
kaggle.com
Updated Jul 23, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nikita Kozodoi (2021). pretrained transformers [Dataset]. https://www.kaggle.com/datasets/kozodoi/transformers/versions/13
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 23, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Nikita Kozodoi
Description
To import pretrained transformer weights, simply specify the folder path in the corresponding function: model_path = '../input/transformers/roberta-base/' model = AutoModel.from_pretrained(model_path)

The dataset includes the following weights, configs and tokenizers: - bert-base-uncased - bert-large-uncased - distilroberta-base - distilbert-base-uncased - funnel-transformer-small - funnel-transformer-large - roberta-base - roberta-large - t5-base - t5-large - xlnet-base-cased - xlnet-large-cased - albert-large-v2

Further information: - All weights are downloaded from Huggingface Model Hub - Source: https://huggingface.co/models - License: Apache License 2.0
PyTorch 1.12.1 + CUDA 11.6 + HuggingFace
kaggle.com
Updated Feb 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Johannes (2023). PyTorch 1.12.1 + CUDA 11.6 + HuggingFace [Dataset]. https://www.kaggle.com/datasets/ecoue123/pytorchhuggingface-wheels-cuda-116
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 2, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Johannes
Description
!python -m pip install --upgrade /kaggle/input/pytorchhuggingface-wheels-cuda-116/*.whl
pytorch-image-models-dependents
huggingface.co
Updated Jun 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hugging Face OSS Metrics (2023). pytorch-image-models-dependents [Dataset]. https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 16, 2023
Dataset provided by
Hugging Facehttps://huggingface.co/
Authors
Hugging Face OSS Metrics
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
pytorch-image-models metrics

This dataset contains metrics about the huggingface/pytorch-image-models package. Number of repositories in the dataset: 3615 Number of packages in the dataset: 89

Package dependents

This contains the data available in the used-by tab on GitHub.

Package & Repository star count

This section shows the package and repository star count, individually.

Package Repository

There are 18 packages that have more than 1000… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents.
h
pytorch
huggingface.co
Updated Jul 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Siqi Guo (2025). pytorch [Dataset]. https://huggingface.co/datasets/siqi00/pytorch
Explore at:
Dataset updated
Jul 17, 2025
Authors
Siqi Guo
Description
siqi00/pytorch dataset hosted on Hugging Face and contributed by the HF Datasets community
h
dped-pytorch
huggingface.co
Updated Jun 5, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ivan (2025). dped-pytorch [Dataset]. https://huggingface.co/datasets/i44p/dped-pytorch
Explore at:
Dataset updated
Jun 5, 2025
Authors
Ivan
Description
i44p/dped-pytorch dataset hosted on Hugging Face and contributed by the HF Datasets community
h
KernelBook
huggingface.co
Updated Mar 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GPU MODE (2025). KernelBook [Dataset]. https://huggingface.co/datasets/GPUMODE/KernelBook
Explore at:
Dataset updated
Mar 25, 2025
Dataset authored and provided by
GPU MODE
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Overview

dataset_permissive{.json/.parquet} is a curated collection of pairs of pytorch programs and equivalent triton code (generated by torch inductor) which can be used to train models to translate pytorch code to triton code. The triton code was generated using PyTorch 2.5.0 so for best results during evaluation / running the triton code we recommend using that version of pytorch.

Dataset Creation

The dataset was created through the following process:

Repository… See the full description on the dataset page: https://huggingface.co/datasets/GPUMODE/KernelBook.
h
pytorch-repo-code
huggingface.co
Updated Oct 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kye Gomez (2023). pytorch-repo-code [Dataset]. https://huggingface.co/datasets/kye/pytorch-repo-code
Explore at:
Dataset updated
Oct 20, 2023
Authors
Kye Gomez
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
kye/pytorch-repo-code dataset hosted on Hugging Face and contributed by the HF Datasets community
E
Data from: Pretrained models for recognising sex education concepts SemSEX...
live.european-language-grid.eu
Updated Nov 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). Pretrained models for recognising sex education concepts SemSEX 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/23041
Explore at:
Dataset updated
Nov 10, 2023
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models, intended for usage with the HuggingFace transformers library (https://github.com/huggingface/transformers).

The models are based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397) and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code of the model and example usage is available in GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded using the AutoModelForSequenceClassification.from_pretrained() and the AutoTokenizer.from_pretrained() functions from the transformers library. An example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py.

The corpus on which these models have been trained is available at http://hdl.handle.net/11356/1895.
Pretrained RoBERTa weights for PyTorch
kaggle.com
Updated Jun 5, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yu-ray Li (2020). Pretrained RoBERTa weights for PyTorch [Dataset]. https://www.kaggle.com/radream/pretrained-roberta-pytorch/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 5, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yu-ray Li
Description
Pretrained RoBERTa weights, taken from https://github.com/huggingface/transformers
h
pytorch-Qwen-7B
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
H, pytorch-Qwen-7B [Dataset]. https://huggingface.co/datasets/Crayon2023/pytorch-Qwen-7B
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Authors
H
Description
Crayon2023/pytorch-Qwen-7B dataset hosted on Hugging Face and contributed by the HF Datasets community
E
RobeCzech Base
live.european-language-grid.eu
Updated May 24, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). RobeCzech Base [Dataset]. https://live.european-language-grid.eu/catalogue/ld/18246
Explore at:
Dataset updated
May 24, 2021
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses current state of the art in all five evaluated NLP tasks and reaches state-of-theart results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
ESM2-huggingface-model
kaggle.com
Updated Aug 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BIT_Guber (2023). ESM2-huggingface-model [Dataset]. https://www.kaggle.com/bitguber/esm2-huggingface-model/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 8, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
BIT_Guber
Description
ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest.

Several ESM-2 checkpoints are available in the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train:
T5_base_pytorch
kaggle.com
Updated Apr 19, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maitreya Patel (2020). T5_base_pytorch [Dataset]. https://www.kaggle.com/maitreyajp/t5basepytorch/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 19, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Maitreya Patel
Description
Context

This dataset provides model, config and spiece files of T5-base for Pytorch. This can be used for loading pre-trained model and modified sentence-piece tokenizer.

Content

config.json - model configuration pytorch_model.bin - pre-trained model spiece.model - vocabulary

Here, spiece.model file can be used for separate tokenizer. For example, in https://www.kaggle.com/c/tweet-sentiment-extraction competition if one requires to get offsets then s/he will not able able to use huggingface inbuilt tokenizer directly. Hence, one can use it as described in https://www.kaggle.com/abhishek/sentencepiece-tokenizer-with-offsets.

Acknowledgements

All files are taken from huggingface or generated using it. Also, @abhishek thank you so much for sharing such a useful information.
h
public
huggingface.co
Updated Mar 13, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
pytorch connectomics (2024). public [Dataset]. https://huggingface.co/datasets/pytc/public
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 13, 2024
Dataset authored and provided by
pytorch connectomics
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
pytc/public dataset hosted on Hugging Face and contributed by the HF Datasets community
h
pytorch-forum-topics-complete-v2
huggingface.co
Updated Jul 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Amit Prakash (2025). pytorch-forum-topics-complete-v2 [Dataset]. https://huggingface.co/datasets/AmitPrakash/pytorch-forum-topics-complete-v2
Explore at:
Dataset updated
Jul 24, 2025
Authors
Amit Prakash
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
PyTorch Forum Topics Dataset

This dataset contains topic metadata scraped from the PyTorch Community Forum. It includes comprehensive information about forum topics that can be used for various NLP tasks related to PyTorch and deep learning discussions.

Dataset Structure

Each record in the dataset contains the following fields:

id: Unique topic identifier title: Topic title slug: URL-friendly version of the title posts_count: Number of posts in the topic reply_count:… See the full description on the dataset page: https://huggingface.co/datasets/AmitPrakash/pytorch-forum-topics-complete-v2.
e
Example (synthetic) images - Dataset - B2FIND
b2find.eudat.eu
Updated Apr 30, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Example (synthetic) images - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/ee28704f-2926-54b3-bf93-751d2546dc68
Explore at:
Dataset updated
Apr 30, 2024
Description
ModelA Hugging Face Unconditional image generation Diffusion Model was used for training. [1] Unconditional image generation models are not conditioned on text or images during training. They only generate images that resemble the training data distribution. The model usually starts with a seed that generates a random noise vector. The model will then use this vector to create an output image similar to the images used to train the model. The training script initializes a UNet2DModel and uses it to train the model. [2] The training loop adds noise to the images, predicts the noise residual, calculates the loss, saves checkpoints at specified steps, and saves the generated models.Training DatasetThe RANZCR CLiP dataset was used to train the model. [3] This dataset has been created by The Royal Australian and New Zealand College of Radiologists (RANZCR) which is a not-for-profit professional organisation for clinical radiologists and radiation oncologists. The dataset has been labelled with a set of definitions to ensure consistency with labelling. The normal category includes lines that were appropriately positioned and did not require repositioning. The borderline category includes lines that would ideally require some repositioning but would in most cases still function adequately in their current position. The abnormal category included lines that required immediate repositioning. 30000 images were used during training. All training images were 512x512 in size. Computational Information Training has been conducted using RTX 6000 cards with 24GB of graphics memory. A checkpoint was created after each epoch was saved with 220 checkpoints being generated so far. Each checkpoint takes up 1GB space in memory. Generating each epoch takes around 6 hours. Machine learning libraries such as TensorFlow, PyTorch, or scikit-learn are used to run the training, along with additional libraries for data preprocessing, visualization, or deployment.Referenceshttps://huggingface.co/docs/diffusers/en/training/unconditional_training#unconditional-image-generationhttps://github.com/huggingface/diffusers/blob/096f84b05f9514fae9f185cbec0a4d38fbad9919/examples/unconditional_image_generation/train_unconditional.py#L356https://www.kaggle.com/competitions/ranzcr-clip-catheter-line-classification/data

Facebook

Twitter

Click to copy link

Link copied

Cite

Chaofeng Chen (2024). IQA-PyTorch-Datasets [Dataset]. https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets

IQA-PyTorch-Datasets

chaofengc/IQA-PyTorch-Datasets

Explore at:

Dataset updated

Feb 18, 2024

Authors

Chaofeng Chen

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Description

This is the dataset repository used in the pyiqa toolbox. Please refer to Awesome Image Quality Assessment for details of each dataset Example commandline script with huggingface-cli: huggingface-cli download chaofengc/IQA-PyTorch-Datasets live.tgz --local-dir ./datasets --repo-type dataset cd datasets tar -xzvf live.tgz

  Disclaimer for This Dataset Collection

This collection of datasets is compiled and maintained for academic, research, and educational… See the full description on the dataset page: https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets.

Clear search

Close search

Google apps

Main menu

IQA-PyTorch-Datasets

Accelerate by HuggingFace (for offline usage)

Context

Pretrained BERT Models for PyTorch

Data from: PyTorch model for Slovenian Named Entity Recognition SloNER 1.0

pretrained transformers

PyTorch 1.12.1 + CUDA 11.6 + HuggingFace

pytorch-image-models-dependents

pytorch

dped-pytorch

KernelBook

pytorch-repo-code

Data from: Pretrained models for recognising sex education concepts SemSEX...

Pretrained RoBERTa weights for PyTorch

pytorch-Qwen-7B

RobeCzech Base

ESM2-huggingface-model

T5_base_pytorch

Context

Content

Acknowledgements

public

pytorch-forum-topics-complete-v2

Example (synthetic) images - Dataset - B2FIND

IQA-PyTorch-Datasets

chaofengc/IQA-PyTorch-Datasets