18 datasets found
  1. torch_utils

    • kaggle.com
    zip
    Updated Jan 8, 2025
    Cite
    Khalid Mahamud (2025). torch_utils [Dataset]. https://www.kaggle.com/datasets/khalidmahamud/torch-utils/code
    Explore at:
    Available download formats: zip (36589 bytes)
    Dataset updated
    Jan 8, 2025
    Authors
    Khalid Mahamud
    Description

    Dataset

    This dataset was created by Khalid Mahamud

    Contents

  2. Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2022
    Cite
    Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    University of Cagliari, Italy
    University of Genoa, Italy
    Authors
    Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

    We release our dataset as a set of folders indicating the patch target label (e.g., banana), each containing 1000 subfolders as the ImageNet output classes.

    An example showing how to use the dataset is shown below.

    Code for testing the robustness of a model:

    import os.path

    from torchvision import datasets, transforms, models
    import torch.utils.data


    class ImageFolderWithEmptyDirs(datasets.ImageFolder):
        """
        This is required for handling empty folders from the ImageFolder Class.
        """

        def find_classes(self, directory):
            classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
            if not classes:
                raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
            class_to_idx = {cls_name: i for i, cls_name in enumerate(classes) if
                            len(os.listdir(os.path.join(directory, cls_name))) > 0}
            return classes, class_to_idx


    # extract and unzip the dataset, then write top folder here
    dataset_folder = 'data/ImageNet-Patch'

    available_labels = {
        487: 'cellular telephone',
        513: 'cornet',
        546: 'electric guitar',
        585: 'hair spray',
        804: 'soap dispenser',
        806: 'sock',
        878: 'typewriter keyboard',
        923: 'plate',
        954: 'banana',
        968: 'cup'
    }

    # select folder with specific target
    target_label = 954

    dataset_folder = os.path.join(dataset_folder, str(target_label))
    normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])
    transforms = transforms.Compose([
        transforms.ToTensor(),
        normalizer
    ])

    dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=transforms)
    model = models.resnet50(pretrained=True)
    loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
    model.eval()

    batches = 10
    correct, attack_success, total = 0, 0, 0
    for batch_idx, (images, labels) in enumerate(loader):
        if batch_idx == batches:
            break
        pred = model(images).argmax(dim=1)
        correct += (pred == labels).sum()
        attack_success += sum(pred == target_label)
        total += pred.shape[0]

    accuracy = correct / total
    attack_sr = attack_success / total

    print("Robust Accuracy: ", accuracy)
    print("Attack Success: ", attack_sr)

  3. PDEBench_2D_DarcyFlow

    • huggingface.co
    Updated Nov 12, 2025
    Cite
    Brian Staber (2025). PDEBench_2D_DarcyFlow [Dataset]. https://huggingface.co/datasets/Nionio/PDEBench_2D_DarcyFlow
    Explore at:
    Dataset updated
    Nov 12, 2025
    Authors
    Brian Staber
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example of usage:

    import torch
    from plaid.bridges import huggingface_bridge as hfb
    from torch.utils.data import DataLoader

    def reshape_all(batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        """Helper function that reshapes the flattened fields into images of sizes (128, 128)."""
        batch["diffusion_coefficient"] = batch["diffusion_coefficient"].reshape(-1, 128, 128)
        batch["flow"] = batch["flow"].reshape(-1, 128, 128)
        return batch

    Load the dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nionio/PDEBench_2D_DarcyFlow.
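
    A minimal sketch (my own, not the snippet from the dataset card) of pulling the dataset with the standard datasets library and batching it with the helper above; the split name is an assumption:

    from datasets import load_dataset

    ds = load_dataset("Nionio/PDEBench_2D_DarcyFlow", split="train")  # split name assumed
    ds.set_format("torch")
    loader = DataLoader(ds, batch_size=16)
    for batch in loader:
        batch = reshape_all(batch)  # fields assumed to arrive as flattened tensors
        break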

  4. PyTorch

    • kaggle.com
    zip
    Updated Oct 22, 2024
    Cite
    Mirza Milan Farabi (2024). PyTorch [Dataset]. https://www.kaggle.com/datasets/mirzamilanfarabi/pytorch
    Explore at:
    Available download formats: zip (123861801 bytes)
    Dataset updated
    Oct 22, 2024
    Authors
    Mirza Milan Farabi
    Description

    PyTorch logo: https://github.com/pytorch/pytorch/raw/main/docs/source/_static/img/pytorch-logo-dark.png

    PyTorch is a Python package that provides two high-level features:

    • Tensor computation (like NumPy) with strong GPU acceleration
    • Deep neural networks built on a tape-based autograd system

    You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

    Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

    More About PyTorch

    Learn the basics of PyTorch

    At a granular level, PyTorch is a library that consists of the following components:

    • torch: a Tensor library like NumPy, with strong GPU support
    • torch.autograd: a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
    • torch.jit: a compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
    • torch.nn: a neural networks library deeply integrated with autograd, designed for maximum flexibility
    • torch.multiprocessing: Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
    • torch.utils: DataLoader and other utility functions for convenience

    Usually, PyTorch is used either as:

    • A replacement for NumPy to use the power of GPUs.
    • A deep learning research platform that provides maximum flexibility and speed.

    Elaborating Further:

    A GPU-Ready Tensor Library

    If you use NumPy, then you have used Tensors (a.k.a. ndarray).

    Tensor illustration

    PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

    We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, mathematical operations, linear algebra, reductions. And they are fast!
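
    For instance (an illustrative sketch, not taken from the original description), the same computation can be moved to the GPU when one is available:

    import torch

    x = torch.rand(1000, 1000)        # CPU tensor, analogous to a NumPy ndarray
    y = torch.rand(1000, 1000)
    if torch.cuda.is_available():     # run the same code on the GPU when present
        x, y = x.cuda(), y.cuda()
    z = x @ y                         # matrix multiply on whichever device holds x and y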

    Dynamic Neural Networks: Tape-Based Autograd

    PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

    Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

    With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.
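
    As a small illustration (not from the original text), the graph is recorded as operations run and then replayed backwards:

    import torch

    w = torch.randn(3, requires_grad=True)    # parameters tracked by autograd
    x = torch.tensor([1.0, 2.0, 3.0])
    loss = ((w * x).sum() - 1.0) ** 2         # graph is built dynamically as this executes
    loss.backward()                           # reverse-mode autodiff replays the recorded tape
    print(w.grad)                             # gradient of loss with respect to w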

    While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy resear...

  5. SemEval_training_data_emotions

    • huggingface.co
    Updated Feb 7, 2024
    Cite
    web (2024). SemEval_training_data_emotions [Dataset]. https://huggingface.co/datasets/dim/SemEval_training_data_emotions
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 7, 2024
    Authors
    web
    Description

    Dataset Card for "SemEval_traindata_emotions"

    How it was obtained:

    from datasets import load_dataset
    import datasets
    from torchvision.io import read_video
    import json
    import torch
    import os
    from torch.utils.data import Dataset, DataLoader
    import tqdm

    dataset_path = "./SemEval-2024_Task3/training_data/Subtask_2_train.json"

    dataset = json.loads(open(dataset_path).read())
    print(len(dataset))

    all_conversations = []

    for item in dataset:
        all_conversations.extend(item["conversation"])

    … See the full description on the dataset page: https://huggingface.co/datasets/dim/SemEval_training_data_emotions.

  6. SELTO Dataset

    • data.niaid.nih.gov
    Updated May 23, 2023
    Cite
    Dittmer, Sören; Erzmann, David; Harms, Henrik; Falck, Rielson; Gosch, Marco (2023). SELTO Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7034898
    Explore at:
    Dataset updated
    May 23, 2023
    Dataset provided by
    University of Bremen
    ArianeGroup GmbH
    University of Bremen, University of Cambridge
    Authors
    Dittmer, Sören; Erzmann, David; Harms, Henrik; Falck, Rielson; Gosch, Marco
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Benchmark Dataset for Deep Learning for 3D Topology Optimization

    This dataset represents voxelized 3D topology optimization problems and solutions. The solutions have been generated in cooperation with the Ariane Group and Synera using the Altair OptiStruct implementation of SIMP within the Synera software. The SELTO dataset consists of four different 3D datasets for topology optimization, called disc simple, disc complex, sphere simple and sphere complex. Each of these datasets is further split into a training and a validation subset.

    The following paper provides full documentation and examples:

    Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.

    The Python library DL4TO (https://github.com/dl4to/dl4to) can be used to download and access all SELTO dataset subsets. Each TAR.GZ file container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and contains an associated ground truth solution. Each problem-solution pair consists of two files, where one contains voxel-wise information and the other file contains scalar information. For example, the i-th sample is stored in the files i.csv and i_info.csv, where i.csv contains all voxel-wise information and i_info.csv contains all scalar information. We define all spatially varying quantities at the center of the voxels, rather than on the vertices or surfaces. This allows for a shape-consistent tensor representation.

    For the i-th sample, the columns of i_info.csv correspond to the following scalar information:

    E - Young's modulus [Pa]

    ν - Poisson's ratio [-]

    σ_ys - a yield stress [Pa]

    h - discretization size of the voxel grid [m]

    The columns of i.csv correspond to the following voxel-wise information:

    x, y, z - the indices that state the location of the voxel within the voxel mesh

    Ω_design - design space information for each voxel. This is a ternary variable that indicates the type of density constraint on the voxel. 0 and 1 indicate that the density is fixed at 0 or 1, respectively. -1 indicates the absence of constraints, i.e., the density in that voxel can be freely optimized

    Ω_dirichlet_x, Ω_dirichlet_y, Ω_dirichlet_z - homogeneous Dirichlet boundary conditions for each voxel. These are binary variables that define whether the voxel is subject to homogeneous Dirichlet boundary constraints in the respective dimension

    F_x, F_y, F_z - floating point variables that define the three spatial components of external forces applied to each voxel. All forces are body forces given in [N/m^3]

    density - defines the binary voxel-wise density of the ground truth solution to the topology optimization problem

    How to Import the Dataset

    with DL4TO: With the Python library DL4TO (https://github.com/dl4to/dl4to) it is straightforward to download and access the dataset as a customized PyTorch torch.utils.data.Dataset object. As shown in the tutorial this can be done via:

    from dl4to.datasets import SELTODataset

    dataset = SELTODataset(root=root, name=name, train=train)

    Here, root is the path where the dataset should be saved. name is the name of the SELTO subset and can be one of "disc_simple", "disc_complex", "sphere_simple" and "sphere_complex". train is a boolean that indicates whether the corresponding training or validation subset should be loaded. See here for further documentation on the SELTODataset class.

    without DL4TO: After downloading and unzipping, any of the i.csv files can be manually imported into Python as a Pandas dataframe object:

    import pandas as pd

    root = ...
    file_path = f'{root}/{i}.csv'
    columns = ['x', 'y', 'z', 'Ω_design', 'Ω_dirichlet_x', 'Ω_dirichlet_y', 'Ω_dirichlet_z', 'F_x', 'F_y', 'F_z', 'density']
    df = pd.read_csv(file_path, names=columns)

    Similarly, we can import an i_info.csv file via:

    file_path = f'{root}/{i}_info.csv'
    info_column_names = ['E', 'ν', 'σ_ys', 'h']
    df_info = pd.read_csv(file_path, names=info_column_names)

    We can extract PyTorch tensors from the Pandas dataframe df using the following function:

    import torch

    def get_torch_tensors_from_dataframe(df, dtype=torch.float32):
        shape = df[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
        voxels = [df['x'].values, df['y'].values, df['z'].values]

        Ω_design = torch.zeros(1, *shape, dtype=int)
        Ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(df['Ω_design'].values.astype(int))

        Ω_Dirichlet = torch.zeros(3, *shape, dtype=dtype)
        Ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_x'].values, dtype=dtype)
        Ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_y'].values, dtype=dtype)
        Ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_z'].values, dtype=dtype)

        F = torch.zeros(3, *shape, dtype=dtype)
        F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_x'].values, dtype=dtype)
        F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_y'].values, dtype=dtype)
        F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_z'].values, dtype=dtype)

        density = torch.zeros(1, *shape, dtype=dtype)
        density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['density'].values, dtype=dtype)

        return Ω_design, Ω_Dirichlet, F, density
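
    The four tensors can then be obtained from a loaded problem dataframe, e.g.:

    Ω_design, Ω_Dirichlet, F, density = get_torch_tensors_from_dataframe(df)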
    
  7. cifar100_Vit_large

    • huggingface.co
    Updated Nov 26, 2025
    Cite
    Aakash Kumar Agarwal (2025). cifar100_Vit_large [Dataset]. https://huggingface.co/datasets/aaaakash001/cifar100_Vit_large
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 26, 2025
    Authors
    Aakash Kumar Agarwal
    Description

    This dataset is the original cifar10 dataset, but it also contains features extracted with vit_large.

      Import packages
    

    from transformers import AutoImageProcessor, AutoModelForImageClassification
    import torch
    from torch.utils.data import DataLoader
    from datasets import load_dataset

      model
    

    processor = AutoImageProcessor.from_pretrained("google/vit-large-patch32-384", use_fast=True)
    model = AutoModelForImageClassification.from_pretrained("google/vit-large-patch32-384")

    … See the full description on the dataset page: https://huggingface.co/datasets/aaaakash001/cifar100_Vit_large.
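
    A minimal sketch (not from the dataset card) of pulling the dataset itself and inspecting its columns before building a DataLoader; the split name is an assumption:

    dataset = load_dataset("aaaakash001/cifar100_Vit_large", split="train")  # split name assumed
    print(dataset.column_names)  # inspect which columns hold the images, labels and precomputed ViT features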

  8. 3xM 10 80 (RGB-D Instance Seg. for bin-picking)

    • kaggle.com
    zip
    Updated Nov 12, 2024
    + more versions
    Cite
    Tobia Ippolito (2024). 3xM 10 80 (RGB-D Instance Seg. for bin-picking) [Dataset]. https://www.kaggle.com/datasets/tobiaippolito/3xm-10-80
    Explore at:
    Available download formats: zip (66443229506 bytes)
    Dataset updated
    Nov 12, 2024
    Authors
    Tobia Ippolito
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    In short

    This dataset was used to investigate how the number of unique 3D models (shapes) and materials (textures) influences the shape-texture bias, performance, and generalization of deep neural network instance segmentation in my bachelor thesis.

    • one of nine datasets created in Unreal Engine 5 with an NVIDIA RTX A4500
    • It uses 10 unique shapes and 80 unique textures
    • RGB, depth and solution masks are available
    • 20,000 scenes
    • Ready to use Dataloader, training and inference -> see next section

    Usage

    You can load the images like:

    import cv2
    
    image = cv2.imread(img_path)
    if image is None:
      raise FileNotFoundError(f"Error during data loading: there is no '{img_path}'")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        
    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
    if len(depth.shape) > 2:
      _, depth, _, _ = cv2.split(depth)
          
    mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)  # cv2.IMREAD_GRAYSCALE)
    

    For easy use I recommend my own code. You can use it directly to train Mask R-CNN or just use the dataloader. Both are shown below:

    First: Clone my torch GitHub project into your project

    cd ./path/to/your/project
    git clone https://github.com/xXAI-botXx/torch-mask-rcnn-instance-segmentation.git

    Second: Install the anaconda env (optional)

    cd ./path/to/your/project
    cd ./torch-mask-rcnn-instance-segmentation
    conda env create -f conda_env.yml

    Third: You are ready to use it.

    Using only the dataloader for your custom project:

    ```python
    import os
    import numpy as np
    import matplotlib.pyplot as plt
    import cv2
    from torch.utils.data import DataLoader

    import sys
    sys.path.append("./torch-mask-rcnn-instance-segmentation")

    from maskrcnn_toolkit import DATA_LOADING_MODE, Dual_Dir_Dataset, collate_fn, extract_and_visualize_mask

    data_mode = DATA_LOADING_MODE.ALL

    dataset = Dual_Dir_Dataset(img_dir="/path/to/rgb-folder", depth_dir="/path/to/depth-folder",
                               mask_dir="/path/to/mask-folder", transform=None, amount=1,
                               start_idx=0, end_idx=0, image_name="...", data_mode=data_mode,
                               use_mask=True, use_depth=False, log_path="./logs",
                               width=1920, height=1080, should_log=True, should_print=True,
                               should_verify=False)
    data_loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=4, collate_fn=collate_fn)

    # plot
    for data in data_loader:
        for batch_idx in range(len(data[0])):
            if len(data) == 3:
                image = data[0][batch_idx].cpu().unsqueeze(0)
                masks = data[1][batch_idx]["masks"]
                masks = masks.cpu()
                name = data[2][batch_idx]
            else:
                image = data[0][batch_idx].cpu().unsqueeze(0)
                name = data[1][batch_idx]

            image = image.cpu().numpy().squeeze(0)
            image = np.transpose(image, (1, 2, 0))  # Convert to HWC

            # Remove 4th channel if existing
            if image.shape[2] == 4:
                depth = image[:, :, 3]
                image = image[:, :, :3]
            else:
                depth = None

            masks_gt = masks.cpu().numpy()
            masks_gt = np.transpose(masks_gt, (1, 2, 0))
            mask = extract_and_visualize_mask(masks_gt, image=None, ax=None, visualize=False, color_map=None, soft_join=False)

            # plot
            cols = 1
            if depth is not None:
                cols += 1
            if mask is not None:
                cols += 1

            fig, ax = plt.subplots(nrows=1, ncols=cols, figsize=(20, 15*cols))
            fig.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.05, hspace=0.05)

            plot_idx = 0
            ax[plot_idx].imshow(image)
            ax[plot_idx].set_title("RGB Input Image")
            ax[plot_idx].axis("off")

            if depth is not None:
                plot_idx += 1
                ax[plot_idx].imshow(depth, cmap="gray")
                ax[plot_idx].set_title("Depth Input Image")
                ax[plot_idx].axis("off")

            if mask is not None:
                plot_idx += 1
                ax[plot_idx].imshow(mask)
                ax[plot_idx].set_title("Mask Ground Truth")
                ax[plot_idx].axis("off")

            plt.show()
    ```
    
    
    **Using the whole Mask R-CNN training pipeline:**
    ```python
    import sys
    sys.path.append("./torch-mask-rcnn-instance-segmentation")
    
    from maskrcnn_toolkit import DATA_LOADING_MODE, train
    
    
    # set the vars as you need
    
    WEIGHTS_PATH = None   # Path to the model weights file
    USE_DEPTH = False      # Whether to include depth information -> as rgb and depth on green channel
    VERIFY_DATA = False     # True is recommended
    
    GROUND_PATH = "D:/3xM"  
    DATASET_NAME = "3xM_Dataset_10_80"
    IMG_DIR = os.path.join(GRO...
    
  9. multimodal_sarcasm_detection

    • huggingface.co
    Updated Nov 3, 2023
    Cite
    ๆ–นๅญไธœ (2023). multimodal_sarcasm_detection [Dataset]. https://huggingface.co/datasets/quaeast/multimodal_sarcasm_detection
    Explore at:
    Dataset updated
    Nov 3, 2023
    Authors
    ๆ–นๅญไธœ
    Description

    copy of data-of-multimodal-sarcasm-detection

    usage

    from datasets import load_dataset
    from transformers import CLIPImageProcessor, CLIPTokenizer
    from torch.utils.data import DataLoader

    image_processor = CLIPImageProcessor.from_pretrained(clip_path)
    tokenizer = CLIPTokenizer.from_pretrained(clip_path)

    def tokenization(example):
        text_inputs = tokenizer(example["text"], truncation=True, padding=True, return_tensors="pt")
        image_inputs = image_processor(example["image"], return_tensors="pt")

    … See the full description on the dataset page: https://huggingface.co/datasets/quaeast/multimodal_sarcasm_detection.

  10. mmi-bendr-preprocessed

    • huggingface.co
    Updated Feb 19, 2024
    Cite
    Rasmus Aagaard (2024). mmi-bendr-preprocessed [Dataset]. https://huggingface.co/datasets/rasgaard/mmi-bendr-preprocessed
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 19, 2024
    Authors
    Rasmus Aagaard
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    The EEG Motor Movement/Imagery (MMI) Dataset preprocessed with DN3 to be used for downstream fine-tuning with BENDR. The labels correspond to Task 4 (imagine opening and closing both fists or both feet) from experimental runs 4, 10 and 14.

      Creating dataloaders
    

    from datasets import load_dataset
    from torch.utils.data import DataLoader

    dataset = load_dataset("rasgaard/mmi-bendr-preprocessed")
    dataset.set_format("torch")

    train_loader = DataLoader(dataset["train"], batch_size=8)

    … See the full description on the dataset page: https://huggingface.co/datasets/rasgaard/mmi-bendr-preprocessed.

  11. MMSD2.0

    • huggingface.co
    Updated May 1, 2024
    Cite
    Junjie Chen (2024). MMSD2.0 [Dataset]. http://doi.org/10.57967/hf/4648
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 1, 2024
    Authors
    Junjie Chen
    License

    https://choosealicense.com/licenses/unknown/

    Description

    MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

    This is a copy of the dataset uploaded on Hugging Face for easy access. The original data comes from this work, which is an improvement upon a previous study.

      Usage
    

    from typing import TypedDict, cast

    import pytorch_lightning as pl
    from datasets import Dataset, load_dataset
    from torch import Tensor
    from torch.utils.data import DataLoader
    from transformers import CLIPProcessor

    class… See the full description on the dataset page: https://huggingface.co/datasets/coderchen01/MMSD2.0.

  12. heptapod_dataset

    • kaggle.com
    zip
    Updated Jun 28, 2025
    Cite
    Matheus Latorre Cavini (2025). heptapod_dataset [Dataset]. https://www.kaggle.com/datasets/matheuslatorrecavini/heptapod-dataset/discussion
    Explore at:
    Available download formats: zip (10262538 bytes)
    Dataset updated
    Jun 28, 2025
    Authors
    Matheus Latorre Cavini
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset consists of 4900 images of logograms from Heptapod B language, in resolution 224x224, and the captions for their meaning in English. There are 49 unique logograms and 100 variations (rotation, scaling, translation) for each of them.

    Original source of the data: Wolfram Research GitHub Repository. Distributed under Creative Commons Attribution-NonCommercial 4.0 International License.

    The dataset was augmented by merging morphems of the logograms and by applying geometric transformations to create variations of each image.

    The captions.txt file provides a caption for each unique logogram and can be interpreted as follows:

    • "000.png | Abbot is dead" is the caption entry for images 0000.png to 0099.png
    • "001.png | Abbot" is the caption entry for images 0100.png to 0199.png
    • "002.png | Abbot chooses save humanity" is the caption entry for images 0200.png to 0299.png
    • And so on

    Suggested loading for PyTorch:

    from PIL import Image
    import torch
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms
    import os
    
    class TextToImageDataset(Dataset):
      def __init__(self, image_dir, captions_file, transform=None):
        self.image_dir = image_dir # Path for the images on the dataset
        self.transform = transform
        self.pairs = [] # Array to store (image, sentence) pairs
    
        with open(captions_file, "r") as f:
          for line in f:
            idx, caption = line.strip().split("|")
            idx = idx.strip().split(".")[0]
            caption = caption.strip()
            for i in range(100):
              img_file = f"{(int(idx)*100 + i):04d}.png" # Get the image number by doing idx*100 + i 
              self.pairs.append((caption, img_file))   # Apply the same caption for every variation of the same logogram
    
      def __len__(self):
        return len(self.pairs)
    
      def __getitem__(self, idx):
        text, img_file = self.pairs[idx]
        image = Image.open(os.path.join(self.image_dir, img_file)).convert("RGB")
        if self.transform:
          image = self.transform(image)
        return text, image #item = (text, image)
    
    transform = transforms.Compose([
      transforms.Resize((224, 224)),
      transforms.ToTensor()
    ])
    
    base_dir = "/kaggle/input/heptapod-dataset/dataset/"
    
    dataset = TextToImageDataset(image_dir=base_dir+"images",captions_file=base_dir+"captions.txt", transform=transform)
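
    A DataLoader can then be wrapped around it in the usual way (a small illustrative addition; the batch size is arbitrary):

    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    texts, images = next(iter(loader))  # default collate yields a list of captions and a stacked image tensor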
    
  13. klue-nli-simcse

    • huggingface.co
    Cite
    PhnyX Lab, klue-nli-simcse [Dataset]. https://huggingface.co/datasets/phnyxlab/klue-nli-simcse
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset provided by
    PhnyX Lab LLC
    Authors
    PhnyX Lab
    Description

    KLUENLI for SimCSE Dataset

    For a better dataset description, please visit: LINK

    This dataset was prepared by converting the KLUE NLI dataset for contrastive training (SimCSE). The code used to prepare the data is given below:

    import pandas as pd
    from datasets import load_dataset, concatenate_datasets, Dataset
    from torch.utils.data import random_split

    class PrepTriplets:
        @staticmethod
        def make_dataset():
            train_dataset = load_dataset("klue", "nli"… See the full description on the dataset page: https://huggingface.co/datasets/phnyxlab/klue-nli-simcse.

  14. Star Wars Chat Bot

    • kaggle.com
    zip
    Updated Dec 8, 2021
    Cite
    Aslan Ahmedov (2021). Star Wars Chat Bot [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/star-wars-chat-bot/discussion
    Explore at:
    Available download formats: zip (3138 bytes)
    Dataset updated
    Dec 8, 2021
    Authors
    Aslan Ahmedov
    Description

    Star-Wars-Chatbot

    Simple chatbot implementation with PyTorch. A chatbot made in Python that features various data about the Star Wars universe. This is a generic chatbot: it can be trained on pretty much any conversation as long as it is provided as a correctly formatted JSON file. I used it for a final project in Artificial Intelligence. To use it, run the training script first, then run your chatbot. For more, please have a look on GitHub.

    Introduction

    Chatbots are extremely helpful for business organizations and also for customers. The majority of people prefer to talk directly through a chatbox instead of calling service centers. Today I am going to build an exciting chatbot project. I will implement a chatbot from scratch that will be able to understand what the user is talking about and give an appropriate response. Chatbots are nothing but intelligent pieces of software that can interact and communicate with people just like humans. In this project we created an AI chatbot focused on the Star Wars cinematic universe and trained it so that it can answer some basic queries about Star Wars.

    Explanation Of Chatbot

    Chatbots are basically AI bots which can interact with users or customers, depending on the use case. They are an application of Artificial Intelligence and Machine Learning. Nowadays technology is advancing rapidly, and in this technological world every industry is trying to automate things to provide better services. One great application of automation is the chatbot.

    There are basically two types of Chatbots :

    • Command based: Chatbots that function on predefined rules and can answer to only limited queries or questions. Users need to select an option to determine their next step.
    • Intelligent/AI Chatbots: Chatbots that leverage Machine Learning and Natural Language Understanding to understand the user's language and are intelligent enough to learn from conversations with their users. You can converse via text, speech or even interact with a chatbot using graphical interfaces.

    All chatbots come under the NLP (Natural Language Processing) umbrella. NLP is composed of two things: NLU (Natural Language Understanding), the ability of machines to understand human language like English, and NLG (Natural Language Generation), the ability of a machine to generate text similar to human-written sentences. Imagine a user asking a chatbot: "Hey, what's on the news today?" The chatbot will break the user's sentence down into two things: an intent and an entity. The intent for this sentence could be get_news, as it refers to an action the user wants to perform. The entity gives specific details about the intent, so "today" will be the entity. In this way, a machine learning model is used to recognize the intents and entities of the chat.

    Strategy

    • Import Libraries and Load the Data
    • Preprocessing the Data
    • Create Training and Testing Data
    • Training the Model
    • Graphical user interface

    Import Libraries and Load the Data

    I created a new Python file named chatbot.py and imported all the required modules. After that, I loaded the starwarsintents.json data file into the program.

    import numpy as np
    import nltk
    from nltk.stem.porter import PorterStemmer
    
    stemmer = PorterStemmer()
    import torch
    import torch.nn as nn
    import random
    import json
    from torch.utils.data import Dataset, DataLoader
    from tkinter import *
    
    with open("starwarsintents.json", "r") as f:
      intents = json.load(f)
    Preprocessing the Data

    • Creating Custom Functions:

    We will create custom functions so that they are easy to reuse later. The Natural Language Toolkit (nltk) is a really useful library that contains important classes for any NLP task. To learn a bit more about nltk, see https://machinelearningmastery.com/natural-language-processing/.

    • Stemming:

    If we have three words like "walk", "walked", "walking", they might seem like different words, but they generally have the same meaning and share the same base form: "walk". So, for our model to understand all the different forms of the same word, we need to train it on that base form. This is called stemming. There are different methods we can use for stemming; here we will use the Porter stemmer from the NLTK library. For more information see http://snowball.tartarus.org/algorithms/porter/stemmer.html.

    • Bag of Words:

    We will split each word in the sentences and add it to an array, then use a bag-of-words representation, which initially is a list of zeros with a size equal to the length of the all-words array. If we have an array of sentences = ["hello", "how", "are", "you"] and an array of total words = ["hi", "hel...
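
    Since the description is cut off here, the following is a minimal sketch (my own, not the author's script) of how the stemming and bag-of-words helpers described above are typically written:

    import numpy as np
    import nltk
    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()

    def tokenize(sentence):
        return nltk.word_tokenize(sentence)

    def stem(word):
        return stemmer.stem(word.lower())

    def bag_of_words(tokenized_sentence, all_words):
        # 1.0 at every position whose word occurs in the (stemmed) sentence, else 0.0
        sentence_words = [stem(w) for w in tokenized_sentence]
        bag = np.zeros(len(all_words), dtype=np.float32)
        for idx, w in enumerate(all_words):
            if w in sentence_words:
                bag[idx] = 1.0
        return bag

    # e.g. bag_of_words(tokenize("how are you"), ["hi", "how", "are", "you", "bye"])
    # -> array([0., 1., 1., 1., 0.], dtype=float32)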
    
  15. Data from: Duck Hunt

    • kaggle.com
    zip
    Updated Jul 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt
    Explore at:
    Available download formats: zip (7379197 bytes)
    Dataset updated
    Jul 26, 2025
    Authors
    Hugo Zanini
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Duck Hunt Object Detection Dataset

    This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

    Perfect for: - Object detection model training - Computer vision research - Retro gaming AI projects - YOLO algorithm benchmarking - Educational purposes

    Dataset Statistics

    • Total Images: 1,004
    • Dataset Size: 12 MB
    • Image Format: PNG
    • Annotation Format: YOLO (.txt)
    • Classes: 4
    • Train/Val Split: 711/260 (73%/27%)

    Class Distribution

    • 0 dog (252 instances): The hunting dog in various poses (jumping, laughing, sniffing, etc.)
    • 1 duck_dead (256 instances): Dead ducks (both black and red variants)
    • 2 duck_shot (248 instances): Ducks in the moment of being shot
    • 3 duck_flying (248 instances): Flying ducks in all directions (left, right, diagonal)

    Dataset Structure

    yolo_dataset_augmented/
    ├── images/
    │   ├── train/      # 711 training images
    │   └── val/        # 260 validation images
    ├── labels/
    │   ├── train/      # 711 YOLO annotation files
    │   └── val/        # 260 YOLO annotation files
    ├── classes.txt     # Class names mapping
    ├── dataset.yaml    # YOLO configuration file
    └── augmented_dataset_stats.json # Detailed statistics
    

    Data Augmentation Details

    The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

    Augmentation Techniques Applied:

    • Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
    • Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
    • Quality Variations: Gaussian noise, slight blur for robustness
    • Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

    Augmentation Parameters:

    {
      'rotation_range': (-15, 15),    # Small rotations for game sprites
      'brightness_range': (0.7, 1.3),  # Brightness variations
      'contrast_range': (0.8, 1.2),   # Contrast adjustments
      'saturation_range': (0.8, 1.2),  # Color saturation
      'noise_intensity': 0.02,      # Gaussian noise
      'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
      'scaling_range': (0.8, 1.2),    # Scale variations
    }
    

    Usage Examples

    Loading with YOLOv8 (Ultralytics)

    from ultralytics import YOLO
    
    # Load and train
    model = YOLO('yolov8n.pt') # Load pretrained model
    results = model.train(data='dataset.yaml', epochs=100, imgsz=640)
    
    # Validate
    metrics = model.val()
    
    # Predict
    results = model('path/to/test/image.png')
    

    Loading with PyTorch

    import torch
    from torch.utils.data import Dataset, DataLoader
    from PIL import Image
    import os
    
    class DuckHuntDataset(Dataset):
      def __init__(self, images_dir, labels_dir, transform=None):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.transform = transform
        self.images = os.listdir(images_dir)
      
      def __len__(self):
        return len(self.images)
      
      def __getitem__(self, idx):
        img_path = os.path.join(self.images_dir, self.images[idx])
        label_path = os.path.join(self.labels_dir, 
                     self.images[idx].replace('.png', '.txt'))
        
        image = Image.open(img_path)
        # Load YOLO annotations
        with open(label_path, 'r') as f:
          labels = f.readlines()
        
        if self.transform:
          image = self.transform(image)
          
        return image, labels
    
    # Usage
    dataset = DuckHuntDataset('images/train', 'labels/train')
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    

    YOLO Annotation Format

    Each .txt file contains one line per object: class_id center_x center_y width height

    Example annotation: 0 0.492 0.403 0.212 0.315, where the values are normalized (0-1) relative to the image dimensions.
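
    A short sketch (not part of the original card) of converting one such normalized line into pixel-space box corners, given the image size:

    def yolo_to_xyxy(line, img_w, img_h):
        """Convert 'class_id cx cy w h' (normalized) into (class_id, x1, y1, x2, y2) in pixels."""
        class_id, cx, cy, w, h = line.split()
        cx, cy = float(cx) * img_w, float(cy) * img_h
        w, h = float(w) * img_w, float(h) * img_h
        return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    # e.g. yolo_to_xyxy("0 0.492 0.403 0.212 0.315", 256, 240)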

    Technical Specifications

    • Image Dimensions: Variable (original sprite sizes preserved)
    • Color Channels: RGB (3 channels)
    • Annotation Precision: Float32 (normalized coordinates)
    • File Naming: Descriptive names indicating class and augmentation type
    • Quality: High-resolution pixel art sprites

    Dataset Context

    This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

    • The Dog: Your hunting companion who retrieves ducks and ...
  16. feral-cat-segmentation_dataset

    • kaggle.com
    • universe.roboflow.com
    zip
    Updated Mar 18, 2025
    Cite
    lu hou yang (2025). feral-cat-segmentation_dataset [Dataset]. https://www.kaggle.com/datasets/luhouyang/feral-cat-segmentation-dataset
    Explore at:
    Available download formats: zip (971125684 bytes)
    Dataset updated
    Mar 18, 2025
    Authors
    lu hou yang
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Feral Cat Segmentation Dataset

    Overview

    This dataset provides image segmentation data for feral cats, designed for computer vision and machine learning tasks. It builds upon the original public domain dataset by Paul Cashman from Roboflow, with additional preprocessing and multiple data formats for easier consumption.

    Dataset Source

    Dataset Contents

    The dataset is organized into three standard splits:
    • Train set
    • Validation set
    • Test set

    Each split contains data in multiple formats:
    1. Original JPG images
    2. Segmentation mask JPG images
    3. Parquet files containing flattened image and mask data
    4. Pickle files containing serialized image and mask data

    Data Formats

    1. Image Files

    • Format: JPG
    • Resolution: 224×224 pixels
    • Directory Structure:
      • train/: Original training images
      • valid/: Original validation images
      • test/: Original test images
      • train_mask/: Corresponding segmentation masks for training
      • valid_mask/: Corresponding segmentation masks for validation
      • test_mask/: Corresponding segmentation masks for testing

    2. Parquet Files

    • Files: train_dataset.parquet, valid_dataset.parquet, test_dataset.parquet
    • Content: Flattened image data and corresponding masks combined in a single table
    • Structure: Each row contains the flattened pixel values of an image followed by the flattened pixel values of its mask
    • Data Division: Image and mask data are split at index split_at = image_size[0] * image_size[1] * image_channels
      • Data before this index: image pixel values (reshaped to [-1, 224, 224, 3])
      • Data after this index: mask pixel values (reshaped to [-1, 224, 224, 1])
    • Benefits: Efficient storage and faster loading compared to individual image files
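
    As a small illustration (not from the original card), a parquet split can be read directly with pandas and divided at the index described above; treat the file name and exact layout as assumptions:

    import pandas as pd

    image_size, image_channels = (224, 224), 3
    split_at = image_size[0] * image_size[1] * image_channels

    data = pd.read_parquet("train_dataset.parquet").to_numpy()
    images = data[:, :split_at].reshape(-1, 224, 224, 3)
    masks = data[:, split_at:].reshape(-1, 224, 224, 1)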

    3. Pickle Files

    • Files: train_dataset.pkl, valid_dataset.pkl, test_dataset.pkl
    • Content: Serialized Python objects containing images and their corresponding masks
    • Structure: List of [image, mask] pairs, where each image and mask is serialized using Python's pickle
    • Data Access: Similar to parquet files, when loaded through the provided dataset class, data is split at the same index: split_at = image_size[0] * image_size[1] * image_channels
    • Benefits: Preserves original data structure and enables quick loading in Python

    4. CSV Files

    • Files: train_dataset.csv, valid_dataset.csv, test_dataset.csv
    • Content: Same data as parquet files but in CSV format
    • Structure: No headers, raw flattened pixel values
    • Data Division: Same split point as parquet files

    Image Preprocessing

    All images were preprocessed with the following operations:
    • Resized to 224×224 pixels using bilinear interpolation
    • Segmentation masks were also resized to match the images using nearest neighbor interpolation
    • Original RLE (Run-Length Encoding) segmentation data converted to binary masks

    Data Normalization

    When used with the provided PyTorch dataset class, images are normalized with:
    • Mean: [0.48235, 0.45882, 0.40784]
    • Standard Deviation: [0.00392156862745098, 0.00392156862745098, 0.00392156862745098]

    PyTorch Integration

    A custom CatDataset class is included for easy integration with PyTorch:

    from cat_dataset import CatDataset
    
    # Load from parquet format
    dataset = CatDataset(
      root="path/to/dataset",
      split="train", # Options: "train", "valid", "test"
      format="parquet", # Options: "parquet", "pkl"
      image_size=[224, 224],
      image_channels=3,
      mask_channels=1
    )
    
    # Use with PyTorch DataLoader
    from torch.utils.data import DataLoader
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    

    Performance Comparison

    Loading time benchmarks from the original implementation:
    • Parquet format: ~1.29 seconds per iteration
    • Pickle format: ~0.71 seconds per iteration

    The pickle format provides the fastest loading times and is recommended for most use cases.

    Citation

    If you use this dataset in your research or projects, please cite:

    @misc{feral-cat-segmentation_dataset,
     title = {feral-cat-segmentation Dataset},
     type = {Open Source Dataset},
     author = {Paul Cashman},
     howpublished = {\url{https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}},
     url = {https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation},
     journal = {Roboflow Universe},
     publisher = {Roboflow},
     year = {2025},
     month = {mar},
     note = {visited on 2025-03-19},
    }
    

    Sample Usage Code

    Basic Dataset Loading

    from ca...
    
  17. MELD Preprocessed

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
    Explore at:
    Available download formats: zip (3527202381 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Argish Abhangi
    Description

    The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

    Data Sources

    • Audio: Waveforms extracted from the original video files.
    • Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.
    • Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.
    • Emotion Labels: Each sample is associated with an emotion label.

    Preprocessing Pipeline

    The preprocessing script performs several key steps:

    1. Text Cleaning:

      • fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.
      • replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing a garbled "Â’" sequence with the proper apostrophe).
    2. Audio Processing:

      • Extracts raw audio waveform from each sample.
      • Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).
      • Converts the spectrogram to a logarithmic scale for numerical stability (a sketch of this step appears after this list).
    3. Video Processing:

      • Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.
      • For each video, samples frames evenly based on the original video's FPS.
      • Applies Haar Cascade face detection on the frames to extract the first detected face.
      • Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.
    4. Saving Processed Samples:

      • Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).
      • The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).
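
    As referenced in step 2, here is a minimal sketch of the audio feature computation described above; everything except the 64 mel bins and the log scaling is an assumption:

    import torch
    import torchaudio

    def log_mel(waveform, sample_rate):
        # 64 mel bins (VGGish-style), then log scale for numerical stability
        mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=64)(waveform)
        return torch.log(mel + 1e-6)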

    Data Format

    Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

    • utterance (str): The cleaned textual utterance.
    • emotion (str/int): The corresponding emotion label.
    • video_path (str): Original path to the video file from which the sample was extracted.
    • audio (Tensor): Raw audio waveform tensor of shape [channels, time].
    • audio_sample_rate (int): The sampling rate of the audio waveform.
    • audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].
    • face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

    Directory Structure

    The preprocessed files are organized into splits:

    preprocessed_data/
    ├── train/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    ├── dev/
    │   ├── dia0_utt0.pt
    │   ├── dia1_utt1.pt
    │   └── ...
    └── test/
        ├── dia0_utt0.pt
        ├── dia1_utt1.pt
        └── ...

    Loading and Using the Dataset

    A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

    Dataset Class

    from torch.utils.data import Dataset
    import os
    import torch
    
    class PreprocessedMELDDataset(Dataset):
      def __init__(self, data_dir):
        """
        Args:
          data_dir (str): Directory where preprocessed .pt files are stored.
        """
        self.data_dir = data_dir
        self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')]
        
      def __len__(self):
        return len(self.files)
      
      def __getitem__(self, idx):
        sample_path = self.files[idx]
        sample = torch.load(sample_path)
        return sample
    

    Custom Collate Function

    def preprocessed_collate_fn(batch):
      """
      Collates a list of sample dictionaries into a single dictionary with keys mapping to lists.
      Modify this function to pad or stack tensor data if needed.
      """
      collated = {}
      collated['utterance'] = [sample['utterance'] for sample in batch]
      collated['emotion'] = [sample['emotion'] for sample in batch]
      collated['video_path'] = [sample['video_path'] for sample in batch]
      collated['audio'] = [sample['audio'] for sample in batch]
      collated['audio_sample_rate'] = batch[0]['audio_sample_rate']
      collated['audio_mel'] = [sample['audio_mel'] for sample in batch]
      collated['face'] = [sample['face'] for sample in batch]
      return collated
    

    Creating DataLoaders

    from torch.utils.data import DataLoader
    
    # Define paths for each split
    train_data_dir = "preprocessed_data/train"
    dev_data_dir = "preproces...
    
  18. CUB 200 2011

    • kaggle.com
    zip
    Updated Apr 18, 2020
    Cite
    Hari (2020). CUB 200 2011 [Dataset]. https://www.kaggle.com/coolerextreme/cub-200-2011
    Explore at:
    Available download formats: zip (2992624251 bytes)
    Dataset updated
    Apr 18, 2020
    Authors
    Hari
    Description

    Context

    To make it easy to use the dataset with PyTorch, here is a utility script, utils.py.

    This is the Caltech UCSD Birds (CUB) 200 2011 extended dataset (http://www.vision.caltech.edu/visipedia/CUB-200-2011.html). The original CUB dataset doesn't have captions. A separate paper (https://arxiv.org/pdf/1605.05395.pdf) collected 10 captions for each image in CUB. The class id for each image is given in metadata.pth file. The captions were encoded using spacy tokenizer and saved in metadata.pth file. The images were resized (maintaining aspect ratio) and center cropped to 64x64, 128x128 and 256x256 resolutions. There are 2 official dataset splits: train_val and test.

    The word_id_to_word and word_to_word_id mappings in metadata.pth file can be used to encode and decode the captions. The class_id_to_class_name and class_name_to_class_id mappings in metadata.pth file can be used to encode and decode the class ids.

    All .pth files were created using PyTorch's torch.save() and can be loaded using torch.load().
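
    For example, the metadata can be inspected like this (a minimal sketch; whether the mappings are stored as top-level dictionary keys is an assumption):

    import torch

    metadata = torch.load("metadata.pth")
    word_id_to_word = metadata["word_id_to_word"]                # mapping named in the description above
    class_id_to_class_name = metadata["class_id_to_class_name"]  # likewise assumed to be a key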

    When processing captions and creating the vocabulary, words that appeared less than 5 times were replaced with `
