12 datasets found
  1. torch_transforms

    • kaggle.com
    zip
    Updated Feb 21, 2023
    Cite
    Uttam Mittal (2023). torch_transforms [Dataset]. https://www.kaggle.com/datasets/uttammittal02/torch-transforms
    Available download formats: zip (47128 bytes)
    Dataset updated
    Feb 21, 2023
    Authors
    Uttam Mittal
    Description

    Dataset

    This dataset was created by Uttam Mittal. It packages a utility script to transform torch tensors.

  2. Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2022
    Cite
    Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    University of Genoa, Italy
    University of Cagliari, Italy
    Authors
    Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

    We release our dataset as a set of folders indicating the patch target label (e.g., banana), each containing 1000 subfolders as the ImageNet output classes.

    An example of how to use the dataset is shown below.

    # code for testing robustness of a model

    import os.path

    import torch.utils.data
    from torchvision import datasets, transforms, models


    class ImageFolderWithEmptyDirs(datasets.ImageFolder):
      """
      This is required for handling empty folders from the ImageFolder class.
      """

      def find_classes(self, directory):
        classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
        if not classes:
          raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
        class_to_idx = {cls_name: i for i, cls_name in enumerate(classes) if
                len(os.listdir(os.path.join(directory, cls_name))) > 0}
        return classes, class_to_idx


    # extract and unzip the dataset, then write the top folder here
    dataset_folder = 'data/ImageNet-Patch'

    available_labels = {
      487: 'cellular telephone',
      513: 'cornet',
      546: 'electric guitar',
      585: 'hair spray',
      804: 'soap dispenser',
      806: 'sock',
      878: 'typewriter keyboard',
      923: 'plate',
      954: 'banana',
      968: 'cup'
    }

    # select the folder with a specific target
    target_label = 954

    dataset_folder = os.path.join(dataset_folder, str(target_label))
    normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                      std=[0.229, 0.224, 0.225])
    transforms = transforms.Compose([
      transforms.ToTensor(),
      normalizer
    ])

    dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=transforms)
    model = models.resnet50(pretrained=True)
    loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
    model.eval()

    batches = 10
    correct, attack_success, total = 0, 0, 0
    for batch_idx, (images, labels) in enumerate(loader):
      if batch_idx == batches:
        break
      pred = model(images).argmax(dim=1)
      correct += (pred == labels).sum()
      attack_success += sum(pred == target_label)
      total += pred.shape[0]

    accuracy = correct / total
    attack_sr = attack_success / total

    print("Robust Accuracy: ", accuracy)
    print("Attack Success: ", attack_sr)

  3. imagenet1k_dcae-f64-latents

    • huggingface.co
    Updated Mar 9, 2025
    Cite
    Sway (2025). imagenet1k_dcae-f64-latents [Dataset]. https://huggingface.co/datasets/SwayStar123/imagenet1k_dcae-f64-latents
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 9, 2025
    Authors
    Sway
    Description

    Example usage (note that you will have to use a shape-batching dataset when training in batches):

    from datasets import load_dataset
    from diffusers import AutoencoderDC
    import torch
    import torchvision.transforms as transforms
    from PIL import Image

    ds = load_dataset("SwayStar123/imagenet1k_dcae-f64-latents_train")

    with torch.inference_mode():
      device = "cuda"
      ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers",
                         cache_dir="ae", torch_dtype=torch.bfloat16).to(device).eval()

    … See the full description on the dataset page: https://huggingface.co/datasets/SwayStar123/imagenet1k_dcae-f64-latents.
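
    The snippet above is truncated on this page. As a rough illustration only, decoding one stored latent back to pixel space might look like the sketch below; the column name "latent" and the use of the standard diffusers decode convention (an output object with a .sample attribute) are assumptions, so check the dataset page for the actual schema.

    import torch
    from datasets import load_dataset
    from diffusers import AutoencoderDC

    device = "cuda"
    ae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f64c128-mix-1.0-diffusers",
                       torch_dtype=torch.bfloat16).to(device).eval()

    ds = load_dataset("SwayStar123/imagenet1k_dcae-f64-latents_train")
    split = next(iter(ds))   # whichever split the repo exposes
    row = ds[split][0]       # one record

    with torch.inference_mode():
      # "latent" is an assumed field name; inspect row.keys() for the real one.
      z = torch.tensor(row["latent"], dtype=torch.bfloat16, device=device).unsqueeze(0)
      img = ae.decode(z).sample                  # assumed diffusers convention: DecoderOutput.sample
      img = (img.float().clamp(-1, 1) + 1) / 2   # map roughly to [0, 1] for viewing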

  4. heptapod_dataset

    • kaggle.com
    zip
    Updated Jun 28, 2025
    Cite
    Matheus Latorre Cavini (2025). heptapod_dataset [Dataset]. https://www.kaggle.com/datasets/matheuslatorrecavini/heptapod-dataset/discussion
    Available download formats: zip (10262538 bytes)
    Dataset updated
    Jun 28, 2025
    Authors
    Matheus Latorre Cavini
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset consists of 4900 images of logograms from the Heptapod B language, at a resolution of 224x224, together with English captions describing their meaning. There are 49 unique logograms and 100 variations (rotation, scaling, translation) of each of them.

    Original source of the data: Wolfram Research GitHub Repository. Distributed under Creative Commons Attribution-NonCommercial 4.0 International License.

    The dataset was augmented by merging morphemes of the logograms and by applying geometric transformations to create variations of each image.

    The captions.txt file provides a caption for each unique logogram and can be interpreted as follows:

    • "000.png | Abbot is dead" is the caption for images 0000.png to 0099.png
    • "001.png | Abbot" is the caption for images 0100.png to 0199.png
    • "002.png | Abbot chooses save humanity" is the caption for images 0200.png to 0299.png
    • And so on

    Suggested loading for PyTorch:

    from PIL import Image
    import torch
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms
    import os
    
    class TextToImageDataset(Dataset):
      def __init__(self, image_dir, captions_file, transform=None):
        self.image_dir = image_dir # Path for the images on the dataset
        self.transform = transform
        self.pairs = [] # Array to store (image, sentence) pairs
    
        with open(captions_file, "r") as f:
          for line in f:
            idx, caption = line.strip().split("|")
            idx = idx.strip().split(".")[0]
            caption = caption.strip()
            for i in range(100):
              img_file = f"{(int(idx)*100 + i):04d}.png" # Get the image number by doing idx*100 + i 
              self.pairs.append((caption, img_file))   # Apply the same caption for every variation of the same logogram
    
      def __len__(self):
        return len(self.pairs)
    
      def __getitem__(self, idx):
        text, img_file = self.pairs[idx]
        image = Image.open(os.path.join(self.image_dir, img_file)).convert("RGB")
        if self.transform:
          image = self.transform(image)
        return text, image #item = (text, image)
    
    transform = transforms.Compose([
      transforms.Resize((224, 224)),
      transforms.ToTensor()
    ])
    
    base_dir = "/kaggle/input/heptapod-dataset/dataset/"
    
    dataset = TextToImageDataset(image_dir=base_dir + "images", captions_file=base_dir + "captions.txt", transform=transform)
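
    A short usage sketch (the DataLoader wrapper and batch size are illustrative additions, not part of the original description):

    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    for texts, images in loader:
      # texts is a list of caption strings, images has shape (batch, 3, 224, 224)
      print(len(texts), images.shape)
      break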
    
  5. cifar_10_in_tensor

    • kaggle.com
    zip
    Updated Oct 28, 2022
    Cite
    KKaiWWang (2022). cifar_10_in_tensor [Dataset]. https://www.kaggle.com/datasets/kkaiwwang/cifar-10-in-tensor
    Available download formats: zip (1454680895 bytes)
    Dataset updated
    Oct 28, 2022
    Authors
    KKaiWWang
    Description

    The CIFAR-10 dataset in PyTorch tensor format.

    You can directly use torch.load('---File_Path---') to load data.

    The dataset is split into three parts: train_X, train_y, and test_X. train_X contains 50,000 images and test_X contains 300,000 images; train_X has shape (50000, 3, 32, 32), train_y has shape (50000,), and test_X has shape (300000, 3, 32, 32).

    Tip: if you want to use data augmentation, there is no need to convert these tensors back to images; torchvision transforms (or a Compose of transforms) can be applied directly to tensors. A sketch follows below.
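
    A minimal sketch, assuming the parts are stored as train_X.pt and train_y.pt under the dataset's input directory (the file names and path are guesses; adjust to the actual files):

    import torch
    from torchvision import transforms

    # File names and path below are assumptions; check the dataset's file listing.
    train_X = torch.load('/kaggle/input/cifar-10-in-tensor/train_X.pt')   # (50000, 3, 32, 32)
    train_y = torch.load('/kaggle/input/cifar-10-in-tensor/train_y.pt')   # (50000,)

    # torchvision transforms can be applied to (batched) tensors directly
    augment = transforms.Compose([
      transforms.RandomHorizontalFlip(),
      transforms.RandomCrop(32, padding=4),
    ])

    batch = augment(train_X[:128].float() / 255)   # assuming uint8 pixels; scale to [0, 1]
    print(batch.shape)                             # torch.Size([128, 3, 32, 32])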

  6. OGBN-Proteins (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBN-Proteins (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-proteins
    Available download formats: zip (677947148 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description

    OGBN-Proteins

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-proteins

    Usage in Python

    import os.path as osp
    import datatable as dt
    import pandas as pd
    import torch
    import torch_geometric.transforms as T
    from ogb.nodeproppred import PygNodePropPredDataset
    # helper for heterogeneous node splits used in get_idx_split below
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    
    class PygOgbnProteins(PygNodePropPredDataset):
      def __init__(self, meta_csv = None):
        root, name, transform = '/kaggle/input', 'ogbn-proteins', T.ToSparseTensor()
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        if osp.isfile(osp.join(path, 'split_dict.pt')):
          return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
          train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
          for nodetype in train_idx_dict.keys():
            train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
            valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
            test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
          return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
          train_idx = dt.fread(osp.join(path, 'train.csv'), header = None).to_numpy().T[0]
          train_idx = torch.from_numpy(train_idx).to(torch.long)
          valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = None).to_numpy().T[0]
          valid_idx = torch.from_numpy(valid_idx).to(torch.long)
          test_idx = dt.fread(osp.join(path, 'test.csv'), header = None).to_numpy().T[0]
          test_idx = torch.from_numpy(test_idx).to(torch.long)
          return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}
    
    dataset = PygOgbnProteins()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-proteins dataset is an undirected, weighted, and typed (according to species) graph. Nodes represent proteins, and edges indicate different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology [1,2]. All edges come with 8-dimensional features, where each dimension represents the strength of a single association type and takes values between 0 and 1 (the larger the value is, the stronger the association is). The proteins come from 8 species.

    Prediction task: The task is to predict the presence of protein functions in a multi-label binary classification setup, where there are 112 kinds of labels to predict in total. The performance is measured by the average of ROC-AUC scores across the 112 tasks.

    Dataset splitting: The authors split the protein nodes into training/validation/test sets according to the species which the proteins come from. This enables the evaluation of the generalization performance of the model across different species.

    Note: for undirected graphs, the loaded graph will contain twice the listed number of edges, because both directions of each edge are added automatically.

    Summary

    Package    | #Nodes  | #Edges     | Split Type | Task Type                         | Metric
    ogb>=1.1.1 | 132,534 | 39,561,252 | Species    | Multi-label binary classification | ROC-AUC

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.
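
    Continuing from the snippet above, a minimal sketch of the evaluation step (the random scores are placeholders for a model's predictions):

    import torch
    from ogb.nodeproppred import Evaluator

    evaluator = Evaluator(name='ogbn-proteins')

    y_true = graph.y[valid_idx]          # (num_valid_nodes, 112) binary labels
    y_pred = torch.rand(y_true.shape)    # placeholder scores; use your model's outputs here

    print(evaluator.eval({'y_true': y_true, 'y_pred': y_pred}))   # {'rocauc': ...}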

    References

    [1] Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
    [2] Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

  7. OGBN-ArXiv (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBN-ArXiv (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbn-arxiv
    Available download formats: zip (169289809 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    License

    Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    OGBN-ArXiv

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv

    Usage in Python

    import os.path as osp
    import pandas as pd
    import datatable as dt
    import torch
    import torch_geometric.transforms as T
    from ogb.nodeproppred import PygNodePropPredDataset
    
    class PygOgbnArxiv(PygNodePropPredDataset):
      def __init__(self):
        root, name, transform = '/kaggle/input', 'ogbn-arxiv', T.ToSparseTensor()
        master = pd.read_csv(osp.join(root, name, 'ogbn-master.csv'), index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)
      def get_idx_split(self):
        split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        train_idx = dt.fread(osp.join(path, 'train.csv'), header = False).to_numpy().T[0]
        train_idx = torch.from_numpy(train_idx).to(torch.long)
        valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = False).to_numpy().T[0]
        valid_idx = torch.from_numpy(valid_idx).to(torch.long)
        test_idx = dt.fread(osp.join(path, 'test.csv'), header = False).to_numpy().T[0]
        test_idx = torch.from_numpy(test_idx).to(torch.long)
        return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}
    
    dataset = PygOgbnArxiv()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-arxiv dataset is a directed graph, representing the citation network between all Computer Science (CS) arXiv papers indexed by MAG [1]. Each node is an arXiv paper and each directed edge indicates that one paper cites another one. Each paper comes with a 128-dimensional feature vector obtained by averaging the embeddings of words in its title and abstract. The embeddings of individual words are computed by running the skip-gram model [2] over the MAG corpus. The authors also provide the mapping from MAG paper IDs into the raw texts of titles and abstracts here. In addition, all papers are also associated with the year that the corresponding paper was published.

    Prediction task: The task is to predict the 40 subject areas of arXiv CS papers, e.g., cs.AI, cs.LG, and cs.OS, which are manually determined (i.e., labeled) by the paper’s authors and arXiv moderators. With the volume of scientific publications doubling every 12 years over the past century, it is practically important to automatically classify each publication’s areas and topics. Formally, the task is to predict the primary categories of the arXiv papers, which is formulated as a 40-class classification problem.

    Dataset splitting: The authors consider a realistic data split based on the publication dates of the papers. The general setting is that ML models are trained on existing papers and then used to predict the subject areas of newly published papers, which supports their direct application to real-world scenarios, such as helping arXiv moderators. Specifically, the authors propose to train on papers published until 2017, validate on those published in 2018, and test on those published since 2019.

    Summary

    Package    | #Nodes  | #Edges    | Split Type | Task Type                  | Metric
    ogb>=1.1.1 | 169,343 | 1,166,243 | Time       | Multi-class classification | Accuracy

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.
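
    As above, a minimal evaluation sketch with the OGB Evaluator (the random class indices are placeholders for a model's argmax predictions):

    import torch
    from ogb.nodeproppred import Evaluator

    evaluator = Evaluator(name='ogbn-arxiv')

    y_true = graph.y[valid_idx]                    # (num_valid_nodes, 1) class indices
    y_pred = torch.randint(0, 40, y_true.shape)    # placeholder; use the model's argmax output, kept 2-D

    print(evaluator.eval({'y_true': y_true, 'y_pred': y_pred}))   # {'acc': ...}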

    References

    [1] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1):396–413, 2020.
    [2] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, pp. 3111–3119, 2013.
    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. Any questions, problems or issues, please contact the original authors at their website or their GitHub repo.

  8. G2Net Q Transform (69x65)

    • kaggle.com
    zip
    Updated Mar 4, 2022
    Cite
    Robbie Beane (2022). G2Net Q Transform (69x65) [Dataset]. https://www.kaggle.com/datasets/drbeane/g2net-q-transform-69x65
    Available download formats: zip (38637696329 bytes)
    Dataset updated
    Mar 4, 2022
    Authors
    Robbie Beane
    License

    CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/

    Description
    # Q-transform preprocessing (CQT1992v2 comes from nnAudio; older versions expose it
    # in nnAudio.Spectrogram, newer ones in nnAudio.features)
    import numpy as np
    import torch
    from nnAudio.Spectrogram import CQT1992v2

    Q_TRANSFORM = CQT1992v2(sr=2048, fmin=20, fmax=1024, hop_length=64)

    def q_trans(x):
      # x holds the three detector time series of one sample
      for i in range(3):
        waves = x[i] / np.max(x[i])
        waves = torch.from_numpy(waves).float()
        image = Q_TRANSFORM(waves)
        array = np.array(image)

        if i == 0:
          X = np.repeat(array, 3, axis=0)
        else:
          X[i, :, :] = array

      X = np.swapaxes(X, 0, 1)
      X = np.swapaxes(X, 1, 2)
      return X

    x = np.load(path)   # path to one of the original G2Net .npy files
    X = q_trans(x)
    
  9. torchaudio

    • kaggle.com
    zip
    Updated Aug 6, 2020
    Cite
    HyeongChan Kim (2020). torchaudio [Dataset]. https://www.kaggle.com/datasets/kozistr/torchaudio/suggestions
    Available download formats: zip (9903613 bytes)
    Dataset updated
    Aug 6, 2020
    Authors
    HyeongChan Kim
    Description

    Context

    torchaudio pip package (for offline kernel)

    Currently (2020/08/06), the default torch version on the Kaggle kernel is 1.5.0, so torchaudio 0.5.0 works without upgrading torch. A possible offline install command is sketched below.
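
    A possible way to install it in an offline kernel; the exact wheel filename is an assumption, so list the dataset directory first to find the real one:

    # Install torchaudio from this dataset's local files (no internet access needed).
    !ls /kaggle/input/torchaudio
    # The wheel filename below is a guess; replace it with the file shown by the listing.
    !pip install /kaggle/input/torchaudio/torchaudio-0.5.0-cp37-cp37m-linux_x86_64.whl --no-deps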

    torchaudio

  10. COCO2017 Image Caption Train

    • kaggle.com
    zip
    Updated May 30, 2024
    Cite
    Seungjun Lee (2024). COCO2017 Image Caption Train [Dataset]. https://www.kaggle.com/datasets/seungjunleeofficial/coco2017-image-caption-train
    Available download formats: zip (19236355851 bytes)
    Dataset updated
    May 30, 2024
    Authors
    Seungjun Lee
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains only the COCO 2017 train images (118K images) and a caption annotation JSON file, designed to fit within Google Colab's available disk space of approximately 50GB when connected to a GPU runtime.

    If you're using PyTorch on Google Colab, you can easily utilize this dataset as follows:

    Manually downloading and uploading the file to Colab can be time-consuming. Therefore, it's more efficient to download this data directly into Google Colab. Please ensure you have first added your Kaggle key to Google Colab. You can find more details on this process here

    from google.colab import userdata   # Colab user secrets, needed for userdata.get below
    import os
    import torch
    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Kaggle credentials stored as Colab secrets
    os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
    os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
    
    # Download the Dataset and unzip it
    !kaggle datasets download -d seungjunleeofficial/coco2017-image-caption-train
    !mkdir "/content/Dataset"
    !unzip "coco2017-image-caption-train" -d "/content/Dataset"
    
    
    # load the dataset
    cap = dset.CocoCaptions(root = '/content/Dataset/COCO2017 Image Captioning Train/train2017',
                annFile = '/content/Dataset/COCO2017 Image Captioning Train/captions_train2017.json',
                transform=transforms.PILToTensor())
    

    You can then use the dataset in the following way:

    print(f"Number of samples: {len(cap)}")
    img, target = cap[3]
    print(img.shape)
    print(target)
    # Output example: torch.Size([3, 425, 640])
    # ['A zebra grazing on lush green grass in a field.', 'Zebra reaching its head down to ground where grass is.', 
    # 'The zebra is eating grass in the sun.', 'A lone zebra grazing in some green grass.', 
    # 'A Zebra grazing on grass in a green open field.']
    
  11. PyTorch 1.0.0 Pretrained Image Models

    • kaggle.com
    zip
    Updated Jan 18, 2019
    Cite
    Benjamin Minixhofer (2019). PyTorch 1.0.0 Pretrained Image Models [Dataset]. https://www.kaggle.com/bminixhofer/pytorch-pretrained-image-models
    Available download formats: zip (283107267 bytes)
    Dataset updated
    Jan 18, 2019
    Authors
    Benjamin Minixhofer
    Description

    Usage

    import torch
    from torchvision import models, transforms
    
    # densenet121
    model = models.densenet121()
    model.load_state_dict(torch.load('densenet121.pth'))
    
    # densenet201
    model = models.densenet201()
    model.load_state_dict(torch.load('densenet201.pth'))
    
    # resnet50
    model = models.resnet50()
    model.load_state_dict(torch.load('resnet50.pth'))
    
    # resnet34
    model = models.resnet34()
    model.load_state_dict(torch.load('resnet34.pth'))
    

    License

    From the PyTorch github repo:

    From PyTorch:
    
    Copyright (c) 2016-   Facebook, Inc      (Adam Paszke)
    Copyright (c) 2014-   Facebook, Inc      (Soumith Chintala)
    Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
    Copyright (c) 2012-2014 Deepmind Technologies  (Koray Kavukcuoglu)
    Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
    Copyright (c) 2011-2013 NYU           (Clement Farabet)
    Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
    Copyright (c) 2006   Idiap Research Institute (Samy Bengio)
    Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
    
    From Caffe2:
    
    Copyright (c) 2016-present, Facebook Inc. All rights reserved.
    
    All contributions by Facebook:
    Copyright (c) 2016 Facebook Inc.
    
    All contributions by Google:
    Copyright (c) 2015 Google Inc.
    All rights reserved.
    
    All contributions by Yangqing Jia:
    Copyright (c) 2015 Yangqing Jia
    All rights reserved.
    
    All contributions from Caffe:
    Copyright(c) 2013, 2014, 2015, the respective contributors
    All rights reserved.
    
    All other contributions:
    Copyright(c) 2015, 2016 the respective contributors
    All rights reserved.
    
    Caffe2 uses a copyright model similar to Caffe: each contributor holds
    copyright over their contributions to Caffe2. The project versioning records
    all such contribution and copyright details. If a contributor wants to further
    mark their specific copyright on a particular contribution, they should
    indicate their copyright solely in the commit message of the change when it is
    committed.
    
    All rights reserved.
    
    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are met:
    
    1. Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    
    2. Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    
    3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America
      and IDIAP Research Institute nor the names of its contributors may be
      used to endorse or promote products derived from this software without
      specific prior written permission.
    
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
    AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
    LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
    CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
    SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
    INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
    CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
    ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.
    
  12. OGBG-MolSIDER (Processed for PyG)

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Redao da Taupl (2021). OGBG-MolSIDER (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbg-molsider
    Available download formats: zip (493502 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Redao da Taupl
    Description

    OGBG-MolSIDER

    Webpage: https://ogb.stanford.edu/docs/graphprop/#ogbg-mol

    Usage in Python

    import os
    import os.path as osp
    import pandas as pd
    import torch
    from ogb.graphproppred import PygGraphPropPredDataset
    
    class PygOgbgMol(PygGraphPropPredDataset):
      def __init__(self, name, transform = None, pre_transform = None, meta_csv = None):
        root = '../input'
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbg-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, pre_transform = pre_transform, meta_dict = meta_dict)
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
          
        path = osp.join(self.root, 'split', split_type)
    
        # short-cut if split_dict.pt exists
        if os.path.isfile(os.path.join(path, 'split_dict.pt')):
          return torch.load(os.path.join(path, 'split_dict.pt'))
    
        train_idx = pd.read_csv(osp.join(path, 'train.csv'), header = None).values.T[0]
        valid_idx = pd.read_csv(osp.join(path, 'valid.csv'), header = None).values.T[0]
        test_idx = pd.read_csv(osp.join(path, 'test.csv'), header = None).values.T[0]
    
        return {'train': torch.tensor(train_idx, dtype = torch.long), 'valid': torch.tensor(valid_idx, dtype = torch.long), 'test': torch.tensor(test_idx, dtype = torch.long)}
    
    dataset = PygOgbgMol('ogbg-molsider')
    
    from torch_geometric.data import DataLoader
    
    batch_size = 32
    split_idx = dataset.get_idx_split()
    train_loader = DataLoader(dataset[split_idx['train']], batch_size = batch_size, shuffle = True)
    valid_loader = DataLoader(dataset[split_idx['valid']], batch_size = batch_size, shuffle = False)
    test_loader = DataLoader(dataset[split_idx['test']], batch_size = batch_size, shuffle = False)
    

    Description

    Graph: The ogbg-molhiv and ogbg-molpcba datasets are two molecular property prediction datasets of different sizes: ogbg-molhiv (small) and ogbg-molpcba (medium). They are adopted from the MoleculeNet [1], and are among the largest of the MoleculeNet datasets. All the molecules are pre-processed using RDKit [2]. Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds. Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring or not. The full description of the features is provided in code. The script to convert the SMILES string [3] to the above graph object can be found here. Note that the script requires RDKit to be installed. The script can be used to pre-process external molecule datasets so that those datasets share the same input feature space as the OGB molecule datasets. This is particularly useful for pre-training graph models, which has great potential to significantly increase generalization performance on the (downstream) OGB datasets [4].

    Beside the two main datasets, the dataset authors additionally provide 10 smaller datasets from MoleculeNet. They are ogbg-moltox21, ogbg-molbace, ogbg-molbbbp, ogbg-molclintox, ogbg-molmuv, ogbg-molsider, and ogbg-moltoxcast for (multi-task) binary classification, and ogbg-molesol, ogbg-molfreesolv, and ogbg-mollipo for regression. Evaluators are also provided for these datasets. These datasets can be used to stress-test molecule-specific methods or transfer learning [4].

    For encoding these raw input features, the dataset authors prepare simple modules called AtomEncoder and BondEncoder. They can be used as follows to embed raw atom and bond features to obtain atom_emb and bond_emb.

    from ogb.graphproppred.mol_encoder import AtomEncoder, BondEncoder
    atom_encoder = AtomEncoder(emb_dim = 100)
    bond_encoder = BondEncoder(emb_dim = 100)
    
    atom_emb = atom_encoder(x) # x is the input atom feature
    edge_emb = bond_encoder(edge_attr) # edge_attr is the input edge feature
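
    A quick sanity-check loop combining the loaders and encoders defined above (this usage sketch is an illustration, not part of the original description):

    for batch in train_loader:
      atom_emb = atom_encoder(batch.x)           # (num_atoms_in_batch, 100)
      edge_emb = bond_encoder(batch.edge_attr)   # (num_bonds_in_batch, 100)
      print(atom_emb.shape, edge_emb.shape, batch.y.shape)
      break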
    

    Prediction task: The task is to predict the target molecular properties as accurately as possible, where the molecular properties are cast as binary labels, e.g, whether a molecule inhibits HIV virus replication or not. Note that some datasets (e.g., ogbg-molpcba) can have multiple tasks, and can contain nan that indicates the corresponding label is not assigned to the molecule. For evaluation metric, the dataset authors closely follow [2]. Specifically, for ogbg-molhiv, the dataset authors use ROC-AUC...

