100+ datasets found
  1. IQA-PyTorch-Datasets

    • huggingface.co
    Updated Feb 18, 2024
    Cite
    Chaofeng Chen (2024). IQA-PyTorch-Datasets [Dataset]. https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets
    Explore at:
    Dataset updated
    Feb 18, 2024
    Authors
    Chaofeng Chen
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This is the dataset repository used in the pyiqa toolbox. Please refer to Awesome Image Quality Assessment for details of each dataset. Example command-line script with huggingface-cli:

    huggingface-cli download chaofengc/IQA-PyTorch-Datasets live.tgz --local-dir ./datasets --repo-type dataset
    cd datasets
    tar -xzvf live.tgz
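
    A hedged sketch of how the extracted data is typically consumed with the pyiqa toolbox; the metric name and image path below are illustrative choices, not taken from the dataset page:

    import pyiqa

    # create a no-reference quality metric; 'niqe' is just an example name
    metric = pyiqa.create_metric('niqe')
    score = metric('./datasets/live/example.bmp')  # hypothetical image path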

      Disclaimer for This Dataset Collection
    

    This collection of datasets is compiled and maintained for academic, research, and educational… See the full description on the dataset page: https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets.

  2. Accelerate by HuggingFace (for offline usage)

    • kaggle.com
    Updated Apr 16, 2021
    Cite
    Shreyansh Singh (2021). Accelerate by HuggingFace (for offline usage) [Dataset]. https://www.kaggle.com/shreyansh2626/accelerate-by-huggingface-for-offline-usage/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shreyansh Singh
    Description

    Context

    Accelerate is a Python library that makes it easy to run raw PyTorch training scripts on any kind of device, integrating into existing code with only minor changes. More details are here - https://huggingface.co/blog/accelerate-library
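
    A minimal sketch of that integration pattern (illustrative, not taken from the dataset page; model, optimizer, and dataloader are assumed to be defined elsewhere, and the .loss attribute assumes a HuggingFace-style model output):

    from accelerate import Accelerator

    accelerator = Accelerator()
    # wrap the usual PyTorch objects; Accelerate handles device placement
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(**batch).loss
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()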

  3. pretrained transformers

    • kaggle.com
    Updated Jul 23, 2021
    Cite
    Nikita Kozodoi (2021). pretrained transformers [Dataset]. https://www.kaggle.com/datasets/kozodoi/transformers/versions/13
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nikita Kozodoi
    Description

    To import pretrained transformer weights, simply specify the folder path in the corresponding function:

    from transformers import AutoModel

    model_path = '../input/transformers/roberta-base/'
    model = AutoModel.from_pretrained(model_path)

    The dataset includes the following weights, configs and tokenizers:

    • bert-base-uncased
    • bert-large-uncased
    • distilroberta-base
    • distilbert-base-uncased
    • funnel-transformer-small
    • funnel-transformer-large
    • roberta-base
    • roberta-large
    • t5-base
    • t5-large
    • xlnet-base-cased
    • xlnet-large-cased
    • albert-large-v2

    Further information:

    • All weights are downloaded from the Huggingface Model Hub
    • Source: https://huggingface.co/models
    • License: Apache License 2.0
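
    The matching tokenizer can be loaded from the same offline folder; this is the standard transformers companion call, sketched here rather than quoted from the dataset page:

    from transformers import AutoTokenizer

    # same offline folder as the model weights above
    tokenizer = AutoTokenizer.from_pretrained('../input/transformers/roberta-base/')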

  4. Data from: PyTorch model for Slovenian Named Entity Recognition SloNER 1.0

    • live.european-language-grid.eu
    Updated Jan 26, 2023
    Cite
    (2023). PyTorch model for Slovenian Named Entity Recognition SloNER 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/20980
    Explore at:
    Dataset updated
    Jan 26, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The SloNER is a model for Slovenian Named Entity Recognition. It is a PyTorch neural network model, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).

    The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747). The source code of the model is available in the GitHub repository https://github.com/clarinsi/SloNER.
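
    A hedged sketch of loading such a checkpoint with transformers; the local path is hypothetical, and AutoModelForTokenClassification is the usual class for NER checkpoints (the SloNER repository documents the exact procedure):

    from transformers import AutoTokenizer, AutoModelForTokenClassification

    model_path = './sloner-model'  # hypothetical local checkpoint directory
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)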

  5. pytorch-image-models-dependents

    • huggingface.co
    Updated Jun 16, 2023
    Cite
    Hugging Face OSS Metrics (2023). pytorch-image-models-dependents [Dataset]. https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    pytorch-image-models metrics

    This dataset contains metrics about the huggingface/pytorch-image-models package.

    Number of repositories in the dataset: 3615
    Number of packages in the dataset: 89

      Package dependents
    

    This contains the data available in the used-by tab on GitHub.

      Package & Repository star count
    

    This section shows the package and repository star count, individually.


    There are 18 packages that have more than 1000… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/pytorch-image-models-dependents.

  6. Pretrained BERT Models for PyTorch

    • kaggle.com
    Updated May 28, 2019
    Cite
    [he.ai]soulmachine (2019). Pretrained BERT Models for PyTorch [Dataset]. https://www.kaggle.com/datasets/soulmachine/pretrained-bert-models-for-pytorch
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    [he.ai]soulmachine
    License

    CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Pretrained BERT models for pytorch-pretrained-bert

    Details: the files' URLs below are taken from the library's source code.

    https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/modeling.py#L39 :

    PRETRAINED_MODEL_ARCHIVE_MAP = {
      'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
      'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz",
      'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz",
      'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz",
      'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz",
      'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz",
      'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
    }
    

    https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/pytorch_pretrained_bert/tokenization.py#L29 :

    PRETRAINED_VOCAB_ARCHIVE_MAP = {
      'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
      'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",
      'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt",
      'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt",
      'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt",
      'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt",
      'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt",
    }
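
    With these files stored locally, the library can be pointed at them instead of the S3 URLs. A sketch under the assumption that the Kaggle dataset is mounted at ../input/pretrained-bert-models-for-pytorch/ (the exact folder layout may differ):

    from pytorch_pretrained_bert import BertModel, BertTokenizer

    base = '../input/pretrained-bert-models-for-pytorch/'
    # from_pretrained accepts a local vocab file or model directory as well as a model name
    tokenizer = BertTokenizer.from_pretrained(base + 'bert-base-uncased-vocab.txt')
    model = BertModel.from_pretrained(base + 'bert-base-uncased/')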
    
  7. PyTorch 1.12.1 + CUDA 11.6 + HuggingFace

    • kaggle.com
    Updated Feb 2, 2023
    Cite
    Johannes (2023). PyTorch 1.12.1 + CUDA 11.6 + HuggingFace [Dataset]. https://www.kaggle.com/datasets/ecoue123/pytorchhuggingface-wheels-cuda-116
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Johannes
    Description

    !python -m pip install --upgrade /kaggle/input/pytorchhuggingface-wheels-cuda-116/*.whl

  8. pytorch

    • huggingface.co
    Updated Jul 17, 2025
    Cite
    Siqi Guo (2025). pytorch [Dataset]. https://huggingface.co/datasets/siqi00/pytorch
    Explore at:
    Dataset updated
    Jul 17, 2025
    Authors
    Siqi Guo
    Description

    siqi00/pytorch dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. dped-pytorch

    • huggingface.co
    Updated Jun 5, 2025
    Cite
    Ivan (2025). dped-pytorch [Dataset]. https://huggingface.co/datasets/i44p/dped-pytorch
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Ivan
    Description

    i44p/dped-pytorch dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. Data from: Pretrained models for recognising sex education concepts SemSEX...

    • live.european-language-grid.eu
    Updated Nov 10, 2023
    Cite
    (2023). Pretrained models for recognising sex education concepts SemSEX 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/23041
    Explore at:
    Dataset updated
    Nov 10, 2023
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pretrained language models for detecting and classifying the presence of sex education concepts in Slovene curriculum documents. The models are PyTorch neural network models, intended for use with the HuggingFace transformers library (https://github.com/huggingface/transformers).

    The models are based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397) and on the CroSloEngual BERT model (http://hdl.handle.net/11356/1330). The source code of the models and example usage is available in the GitHub repository https://github.com/TimotejK/SemSex. The models and tokenizers can be loaded using the AutoModelForSequenceClassification.from_pretrained() and AutoTokenizer.from_pretrained() functions from the transformers library. An example of such usage is available at https://github.com/TimotejK/SemSex/blob/main/Concept%20detection/Classifiers/full_pipeline.py.
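
    A minimal sketch of that loading pattern; the checkpoint directory name is hypothetical, and the linked full_pipeline.py shows the actual usage:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_dir = './semsex-sloberta'  # hypothetical local checkpoint directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)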

    The corpus on which these models have been trained is available at http://hdl.handle.net/11356/1895.

  11. RobeCzech Base

    • live.european-language-grid.eu
    Updated May 24, 2021
    Cite
    (2021). RobeCzech Base [Dataset]. https://live.european-language-grid.eu/catalogue/ld/18246
    Explore at:
    Dataset updated
    May 24, 2021
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    RobeCzech is a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the current state of the art in all five evaluated NLP tasks, and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base, both for PyTorch and TensorFlow.
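
    Since the model is published on the Hugging Face Hub under ufal/robeczech-base, loading the PyTorch weights is the standard transformers call (a sketch, not taken from the dataset page):

    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained('ufal/robeczech-base')
    model = AutoModel.from_pretrained('ufal/robeczech-base')  # PyTorch weights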

  12. Pretrained RoBERTa weights for PyTorch

    • kaggle.com
    Updated Jun 5, 2020
    Cite
    Yu-ray Li (2020). Pretrained RoBERTa weights for PyTorch [Dataset]. https://www.kaggle.com/radream/pretrained-roberta-pytorch/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 5, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yu-ray Li
    Description

    Pretrained RoBERTa weights, taken from https://github.com/huggingface/transformers

  13. pytorch-repo-code

    • huggingface.co
    Updated Oct 20, 2023
    Cite
    Kye Gomez (2023). pytorch-repo-code [Dataset]. https://huggingface.co/datasets/kye/pytorch-repo-code
    Explore at:
    Dataset updated
    Oct 20, 2023
    Authors
    Kye Gomez
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    kye/pytorch-repo-code dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. KernelBook

    • huggingface.co
    Updated Mar 25, 2025
    Cite
    GPU MODE (2025). KernelBook [Dataset]. https://huggingface.co/datasets/GPUMODE/KernelBook
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset authored and provided by
    GPU MODE
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    dataset_permissive{.json/.parquet} is a curated collection of pairs of PyTorch programs and equivalent Triton code (generated by torch inductor) which can be used to train models to translate PyTorch code to Triton code. The Triton code was generated using PyTorch 2.5.0, so for best results when evaluating or running the Triton code we recommend using that version of PyTorch.
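
    A hedged sketch of pulling the data with the datasets library; the repo id comes from this page, but the 'train' split name is an assumption:

    from datasets import load_dataset

    ds = load_dataset('GPUMODE/KernelBook', split='train')  # split name assumed
    print(ds[0])  # a PyTorch program paired with its Triton translation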

      Dataset Creation
    

    The dataset was created through the following process:

    Repository… See the full description on the dataset page: https://huggingface.co/datasets/GPUMODE/KernelBook.

  15. pytorch-Qwen-7B

    • huggingface.co
    Cite
    H, pytorch-Qwen-7B [Dataset]. https://huggingface.co/datasets/Crayon2023/pytorch-Qwen-7B
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    H
    Description

    Crayon2023/pytorch-Qwen-7B dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. LitLat BERT - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Jul 22, 2024
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Dataset updated
    Jul 22, 2024
    Description

    Trilingual BERT-like (Bidirectional Encoder Representations from Transformers) model, trained on Lithuanian, Latvian, and English data. It is a state-of-the-art tool that represents words/tokens as contextually dependent word embeddings and is used for various NLP classification tasks by fine-tuning the model end-to-end. LitLat BERT is distributed as neural network weights and configuration files in PyTorch format (i.e. to be used with the PyTorch library). The corpora used for training the model have 4.07 billion tokens in total, of which 2.32 billion are English, 1.21 billion are Lithuanian and 0.53 billion are Latvian. LitLat BERT is based on the XLM-RoBERTa model and comes in two versions, one for usage with the transformers library (https://github.com/huggingface/transformers), and one for usage with the fairseq library (https://github.com/pytorch/fairseq). More information is in the readme.txt.
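
    For the transformers-format release, loading follows the usual pattern (a sketch; the local directory name is hypothetical and the readme.txt is authoritative):

    from transformers import AutoTokenizer, AutoModel

    model_dir = './litlat-bert'  # hypothetical local directory with the transformers release
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)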

  17. bert large whole word masking

    • kaggle.com
    Updated Jun 11, 2019
    Cite
    bilal2vec (2019). bert large whole word masking [Dataset]. https://www.kaggle.com/bkkaggle/bert-large-whole-word-masking/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 11, 2019
    Dataset provided by
    Kaggle
    Authors
    bilal2vec
    Description

    Context

    Google released a new iteration of bert (https://github.com/google-research/bert)

    Content

    Pretrained cased and uncased models compatible with https://github.com/huggingface/pytorch-pretrained-BERT

    converted using: https://www.kaggle.com/bkkaggle/converting-whole-word-masking-bert-to-pytorch

    Acknowledgements

    Pretrained models from: https://github.com/google-research/bert
    Converted with https://github.com/huggingface/pytorch-pretrained-BERT

  18. pytorch-forum-topics-complete-v2

    • huggingface.co
    Updated Jul 24, 2025
    Cite
    Amit Prakash (2025). pytorch-forum-topics-complete-v2 [Dataset]. https://huggingface.co/datasets/AmitPrakash/pytorch-forum-topics-complete-v2
    Explore at:
    Dataset updated
    Jul 24, 2025
    Authors
    Amit Prakash
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    PyTorch Forum Topics Dataset

    This dataset contains topic metadata scraped from the PyTorch Community Forum. It includes comprehensive information about forum topics that can be used for various NLP tasks related to PyTorch and deep learning discussions.

      Dataset Structure
    

    Each record in the dataset contains the following fields:

    • id: Unique topic identifier
    • title: Topic title
    • slug: URL-friendly version of the title
    • posts_count: Number of posts in the topic
    • reply_count: …

    See the full description on the dataset page: https://huggingface.co/datasets/AmitPrakash/pytorch-forum-topics-complete-v2.
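
    A hedged sketch of loading the topics with the datasets library; the repo id comes from this page, while the split name is an assumption:

    from datasets import load_dataset

    ds = load_dataset('AmitPrakash/pytorch-forum-topics-complete-v2')
    print(ds['train'][0]['title'])  # 'train' split name assumed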

  19. ESM2-huggingface-model

    • kaggle.com
    Updated Aug 8, 2023
    Cite
    BIT_Guber (2023). ESM2-huggingface-model [Dataset]. https://www.kaggle.com/bitguber/esm2-huggingface-model/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BIT_Guber
    Description

    ESM-2 is a state-of-the-art protein model trained on a masked language modelling objective. It is suitable for fine-tuning on a wide range of tasks that take protein sequences as input. For detailed information on the model architecture and training data, please refer to the accompanying paper. You may also be interested in some demo notebooks (PyTorch, TensorFlow) which demonstrate how to fine-tune ESM-2 models on your tasks of interest.

    Several ESM-2 checkpoints are available in the Hub with varying sizes. Larger sizes generally have somewhat better accuracy, but require much more memory and time to train:
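
    A hedged sketch of loading one of the public checkpoints for fine-tuning; the checkpoint id facebook/esm2_t12_35M_UR50D and num_labels=2 are illustrative choices, not taken from this page:

    from transformers import AutoTokenizer, EsmForSequenceClassification

    checkpoint = 'facebook/esm2_t12_35M_UR50D'  # illustrative; larger checkpoints exist
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmForSequenceClassification.from_pretrained(checkpoint, num_labels=2)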

  20. Complete code and datasets for "ESNLIR: A Spanish Multi-Genre Dataset with...

    • zenodo.org
    bin, pdf, zip
    Updated Mar 13, 2025
    Cite
    Johan David Rodriguez Portela; Rubén Francisco Manrique Piramanrique; Nicolás Perez Terán (2025). Complete code and datasets for "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships" [Dataset]. http://doi.org/10.5281/zenodo.15002575
    Explore at:
    Available download formats: bin, zip, pdf
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Arxiv
    Authors
    Johan David Rodriguez Portela; Rubén Francisco Manrique Piramanrique; Nicolás Perez Terán
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships

    This is the complete code, model and datasets for the paper ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships.

    Installation

    This repository is a poetry project, which means that it can be installed easily by executing the following command from a shell in the repository folder:

    poetry install

    As this repository is script based, the README.md file contains all the commands executed to generate the dataset and train models.

    ----------------------------------------------------------------------------------------------

    Core code

    The core code used for all the experiments is in the folder auto-nli, and all the calls to the core code, with the requested parameters, are found in README.md

    ----------------------------------------------------------------------------------------------

    Parameters

    All the parameters to create datasets and train models with the core code are found in the folder parameters.

    ----------------------------------------------------------------------------------------------

    Models

    Model types

    For BERT-based models, all in PyTorch, there are two types of models from Hugging Face that were used for training; they are also required to load a dataset, because of the tokenizer:

    Model folder

    The model folder contains all the trained models for the paper. There are three types of models:

    • baseline: An XGBoost model that can be loaded with pickle.
    • roberta: BERTIN-based models in PyTorch. You can load them from the model path.
    • xlmroberta: XLM-RoBERTa-based models in PyTorch. You can load them from the model path.

    Models with the suffix _annot are models trained with the premise (first sentence) only. Apart from the PyTorch model folder, each model result folder (ex: ) contains the test results for the test set and the stress test sets (ex: )

    Load model

    Models are found in the folder model, and all of them are PyTorch models which can be loaded with the huggingface interface:

    from transformers import AutoModel

    # the concrete checkpoint path is truncated in the source; substitute the folder of the model to load
    model = AutoModel.from_pretrained('<path/to/model/folder>')

    ----------------------------------------------------------------------------------------------

    Dataset

    labeled_final_dataset.jsonl

    This file is included outside the ZIP containing all other files. It contains the final test dataset with 974 examples, selected because their human majority label matches the original linking-phrase label.

    Other datasets:

    The datasets can be found in the folder data that is divided in the following folders:

    base_dataset

    The splits to train, validate and test the models.

    splits_data

    Splits of train-val-test extracted for each corpus. They are used to generate base_dataset.

    sentence_data

    Pairs of sentences found in each corpus. They are used to generate splits_data.

    Dataset dictionary

    This repository contains the splits that resulted from the research project "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships". All the splits are in JSONL format and have the same fields per example:

    • sentence_1: First sentence of the pair.
    • sentence_2: Second sentence of the pair.
    • connector: Linking phrase used to extract pair.
    • connector_type: NLI label, one of "contrasting", "entailment", "reasoning" or "neutral"
    • extraction_strategy: "linking_phrase" for "contrasting", "entailment" and "reasoning"; "none" for "neutral".
    • distance: How many sentences before the connector sentence_1 appears
    • sentence_1_position: Sentence number of sentence_1 in the source document
    • sentence_1_paragraph: Paragraph number of sentence_1 in the source document
    • sentence_2_position: Sentence number of sentence_2 in the source document
    • sentence_2_paragraph: Paragraph number of sentence_2 in the source document
    • id: Unique identifier for the example
    • dataset: Source corpus of the pair. Metadata of corpus, including source can be found in dataset_metadata.xlsx.
    • genre: Writing genre of the dataset.
    • domain: Domain genre of the dataset.

    Example:

    {"sentence_1":"sefior Bcajavides no es moderado, tampoco lo convertirse e\u00f1 declarada divergencia de miras polileido en griego","sentence_2":"era mayor claricomentarios, as\u00ed de los peri\u00f3dicos como de los homes dado \u00e1 la voluntad de los hombres, sin que sobreticas","connector":"por consiguiente,","connector_type":"reasoning","extraction_strategy":"linking_phrase","distance":1.0,"sentence_1_paragraph":4,"sentence_1_position":86,"sentence_2_paragraph":4,"sentence_2_position":87,"id":"esnews_spanish_pd_news_531537","dataset":"esnews_spanish_pd_news","genre":"news","domain":"spanish_public_domain_news"}

    Dataset load

    To load a dataset/split as a PyTorch object used to train, validate, and test models, you must use the custom dataset class:

    import os
    from auto_nli.model.bert_based.dataset import BERTDataset

    # the argument values are elided in the source; fill in your own
    dataset = BERTDataset(
        os.path.join(dataset_folder, ...),  # path to the split file
        max_len=...,        # maximum token length
        model_type=...,     # which model family / tokenizer to use
        only_premise=...,   # True for the premise-only (_annot) setting
        max_samples=...,    # optional cap on the number of examples
    )

    ----------------------------------------------------------------------------------------------

    Notebooks

    The folder notebooks contains a collection of jupyter notebooks used to preprocess datasets and visualize results.
