17 datasets found

3xM 10 10 (RGB-D Instance Seg. for bin-picking)
kaggle.com
zip
Updated Nov 12, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tobia Ippolito (2024). 3xM 10 10 (RGB-D Instance Seg. for bin-picking) [Dataset]. https://www.kaggle.com/datasets/tobiaippolito/3xm-10-10
Explore at:
zip(67215581908 bytes)Available download formats
Dataset updated
Nov 12, 2024
Authors
Tobia Ippolito
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
In short

This dataset used to investigate the influence of the unique amount of 3D-Models (Shapes) and Materials (Textures) towards the shape-textures bias, performance and generalization of deep neural network instance segmentation in my bachelor exam.

one of nine datasets created in Unreal Engine 5 with an NVIDIA RTX A4500

It uses 160 unique shapes and 80 unique textures

RGB, depth and solution masks are available

20.000 Scenes

Ready to use Dataloader, training and inference -> see next section

Usage

You can load the images like:

import cv2 image = cv2.imread(img_path) if image is None: raise FileNotFoundError(f"Error during data loading: there is no '{img_path}'") image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED) if len(depth.shape) > 2: _, depth, _, _ = cv2.split(depth) mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED) # cv2.IMREAD_GRAYSCALE)

For easy use I recommend to use my own code. You can directly use it to train Mask R-CNN or just use the dataloader. Both are shown now:

First: Clone my torch github project into your project terminal cd ./path/to/your/project git clone https://github.com/xXAI-botXx/torch-mask-rcnn-instance-segmentation.git Second: Install the anaconda env (optional) terminal cd ./path/to/your/project cd ./torch-mask-rcnn-instance-segmentation conda env create -f conda_env.yml Third: You are ready to use

Using only the dataloader for your custom project: ```python import os import numpy as np import matplotlib.pyplot as plt import cv2 from torch.utils.data import DataLoader

import sys sys.path.append("./torch-mask-rcnn-instance-segmentation")

from maskrcnn_toolkit import DATA_LOADING_MODE, Dual_Dir_Dataset, collate_fn, extract_and_visualize_mask

data_mode = DATA_LOADING_MODE.ALL

dataset = Dual_Dir_Dataset(img_dir="/path/to/rgb-folder", depth_dir="/path/to/depth-folder", mask_dir="/path/to/mask-folder", transform=None, amount=1, start_idx=0, end_idx=0, image_name="...", data_mode=data_mode, use_mask=True, use_depth=False, log_path="./logs", width=1920, height=1080, should_log=True, should_print=True, should_verify=False) data_loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=4, collate_fn=collate_fn)

plot

for data in data_loader: for batch_idx in range(len(data[0])): if len(data) == 3: image = data[0][batch_idx].cpu().unsqueeze(0) masks = data[1][batch_idx]["masks"] masks = masks.cpu() name = data[2][batch_idx] else: image = data[0][batch_idx].cpu().unsqueeze(0) name = data[1][batch_idx]

image = image.cpu().numpy().squeeze(0) image = np.transpose(image, (1, 2, 0)) # Convert to HWC # Remove 4.th channel if existing if image.shape[2] == 4: depth = image[:, :, 3] image = image[:, :, :3] else: depth = None masks_gt = masks.cpu().numpy() masks_gt = np.transpose(masks_gt, (1, 2, 0)) mask = extract_and_visualize_mask(masks_gt, image=None, ax=None, visualize=False, color_map=None, soft_join=False) # plot cols = 1 if depth is not None: cols += 1 if mask is not None: cols += 1 fig, ax = plt.subplots(nrows=1, ncols=cols, figsize=(20, 15*cols)) fig.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.05, hspace=0.05) plot_idx = 0 ax[plot_idx].imshow(image) ax[plot_idx].set_title("RGB Input Image") ax[plot_idx].axis("off") if depth is not None: plot_idx += 1 ax[plot_idx].imshow(depth, cmap="gray") ax[plot_idx].set_title("Depth Input Image") ax[plot_idx].axis("off") if mask is not None: plot_idx += 1 ax[plot_idx].imshow(mask) ax[plot_idx].set_title("Mask Ground Truth") ax[plot_idx].axis("off") plt.show()

**Using the whole Mask R-CNN training pipeline:** ```python import sys sys.path.append("./torch-mask-rcnn-instance-segmentation") from maskrcnn_toolkit import DATA_LOADING_MODE, train # set the vars as you need WEIGHTS_PATH = None # Path to the model weights file USE_DEPTH = False # Whether to include depth information -> as rgb and depth on green channel VERIFY_DATA = False # True is recommended GROUND_PATH = "D:/3xM" DATASET_NAME = "3xM_Dataset_10_10" IMG_DIR = os.path.join(GR...
Z
Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...
data.niaid.nih.gov
zenodo.org
Updated Jun 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
Explore at:
Dataset updated
Jun 30, 2022
Dataset provided by
University of Genoa, Italy
University of Cagliari, Italy
Authors
Maura Pintor; Daniele Angioni; Angelo Sotgiu; Luca Demetrio; Ambra Demontis; Battista Biggio; Fabio Roli
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

We release our dataset as a set of folders indicating the patch target label (e.g., banana), each containing 1000 subfolders as the ImageNet output classes.

An example showing how to use the dataset is shown below.

code for testing robustness of a model

import os.path

from torchvision import datasets, transforms, models import torch.utils.data

class ImageFolderWithEmptyDirs(datasets.ImageFolder): """ This is required for handling empty folders from the ImageFolder Class. """

def find_classes(self, directory): classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir()) if not classes: raise FileNotFoundError(f"Couldn't find any class folder in {directory}.") class_to_idx = {cls_name: i for i, cls_name in enumerate(classes) if len(os.listdir(os.path.join(directory, cls_name))) > 0} return classes, class_to_idx

extract and unzip the dataset, then write top folder here

dataset_folder = 'data/ImageNet-Patch'

available_labels = { 487: 'cellular telephone', 513: 'cornet', 546: 'electric guitar', 585: 'hair spray', 804: 'soap dispenser', 806: 'sock', 878: 'typewriter keyboard', 923: 'plate', 954: 'banana', 968: 'cup' }

select folder with specific target

target_label = 954

dataset_folder = os.path.join(dataset_folder, str(target_label)) normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) transforms = transforms.Compose([ transforms.ToTensor(), normalizer ])

dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=transforms) model = models.resnet50(pretrained=True) loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5) model.eval()

batches = 10 correct, attack_success, total = 0, 0, 0 for batch_idx, (images, labels) in enumerate(loader): if batch_idx == batches: break pred = model(images).argmax(dim=1) correct += (pred == labels).sum() attack_success += sum(pred == target_label) total += pred.shape[0]

accuracy = correct / total attack_sr = attack_success / total

print("Robust Accuracy: ", accuracy) print("Attack Success: ", attack_sr)
h
mmi-bendr-preprocessed
huggingface.co
Updated Feb 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rasmus Aagaard (2024). mmi-bendr-preprocessed [Dataset]. https://huggingface.co/datasets/rasgaard/mmi-bendr-preprocessed
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 19, 2024
Authors
Rasmus Aagaard
License
https://choosealicense.com/licenses/odc-by/https://choosealicense.com/licenses/odc-by/
Description
The EEG Motor Movement/Imagery (MMI) Dataset preprocessed with DN3 to be used for downstream fine-tuning with BENDR. The labels correspond to Task 4 (imagine opening and closing both fists or both feet) from experimental runs 4, 10 and 14.

Creating dataloaders

from datasets import load_dataset from torch.utils.data import DataLoader

dataset = load_dataset("rasgaard/mmi-bendr-preprocessed") dataset.set_format("torch")

train_loader = DataLoader(dataset["train"], batch_size=8)… See the full description on the dataset page: https://huggingface.co/datasets/rasgaard/mmi-bendr-preprocessed.
h
PDEBench_2D_DarcyFlow
huggingface.co
Updated Nov 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brian Staber (2025). PDEBench_2D_DarcyFlow [Dataset]. https://huggingface.co/datasets/Nionio/PDEBench_2D_DarcyFlow
Explore at:
Dataset updated
Nov 12, 2025
Authors
Brian Staber
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Example of usage: import torch from plaid.bridges import huggingface_bridge as hfb from torch.utils.data import DataLoader

def reshape_all(batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]: """Helper function that reshapes the flattened fields into images of sizes (128, 128).""" batch["diffusion_coefficient"] = batch["diffusion_coefficient"].reshape( -1, 128, 128 )

batch["flow"] = batch["flow"].reshape(-1, 128, 128) return batch

Load the dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nionio/PDEBench_2D_DarcyFlow.
h
SemEval_training_data_emotions
huggingface.co
Updated Feb 7, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
web (2024). SemEval_training_data_emotions [Dataset]. https://huggingface.co/datasets/dim/SemEval_training_data_emotions
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 7, 2024
Authors
web
Description
Dataset Card for "SemEval_traindata_emotions"

Как был получен from datasets import load_dataset import datasets from torchvision.io import read_video import json import torch import os from torch.utils.data import Dataset, DataLoader import tqdm

dataset_path = "./SemEval-2024_Task3/training_data/Subtask_2_train.json"

dataset = json.loads(open(dataset_path).read()) print(len(dataset))

all_conversations = []

for item in dataset: all_conversations.extend(item["conversation"])… See the full description on the dataset page: https://huggingface.co/datasets/dim/SemEval_training_data_emotions.
h
cifar100_Vit_large
huggingface.co
Updated Nov 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aakash Kumar Agarwal (2025). cifar100_Vit_large [Dataset]. https://huggingface.co/datasets/aaaakash001/cifar100_Vit_large
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 26, 2025
Authors
Aakash Kumar Agarwal
Description
This dataset is original cifar10 dataset but also contains features from vit_large.

Import packages

from transformers import AutoImageProcessor, AutoModelForImageClassification import torch from torch.utils.data import DataLoader from datasets import load_dataset

model

processor = AutoImageProcessor.from_pretrained("google/vit-large-patch32-384",use_fast=True) model = AutoModelForImageClassification.from_pretrained("google/vit-large-patch32-384")… See the full description on the dataset page: https://huggingface.co/datasets/aaaakash001/cifar100_Vit_large.
h
multimodal_sarcasm_detection
huggingface.co
Updated Nov 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
方子东 (2023). multimodal_sarcasm_detection [Dataset]. https://huggingface.co/datasets/quaeast/multimodal_sarcasm_detection
Explore at:
Dataset updated
Nov 3, 2023
Authors
方子东
Description
copy of data-of-multimodal-sarcasm-detection

usage

from datasets import load_dataset from transformers import CLIPImageProcessor, CLIPTokenizer from torch.utils.data import DataLoader

image_processor = CLIPImageProcessor.from_pretrained(clip_path) tokenizer = CLIPTokenizer.from_pretrained(clip_path)

def tokenization(example): text_inputs = tokenizer(example["text"], truncation=True, padding=True, return_tensors="pt") image_inputs = image_processor(example["image"], return_tensors="pt")… See the full description on the dataset page: https://huggingface.co/datasets/quaeast/multimodal_sarcasm_detection.
h
MMSD2.0
huggingface.co
Updated May 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Junjie Chen (2024). MMSD2.0 [Dataset]. http://doi.org/10.57967/hf/4648
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57967/hf/4648
Dataset updated
May 1, 2024
Authors
Junjie Chen
License
https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/
Description
MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

This is a copy of the dataset uploaded on Hugging Face for easy access. The original data comes from this work, which is an improvement upon a previous study.

Usage

from typing import TypedDict, cast

import pytorch_lightning as pl from datasets import Dataset, load_dataset from torch import Tensor from torch.utils.data import DataLoader from transformers import CLIPProcessor

class… See the full description on the dataset page: https://huggingface.co/datasets/coderchen01/MMSD2.0.
h
malaysian-youtube
huggingface.co
Updated Jan 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Malaysia AI (2024). malaysian-youtube [Dataset]. https://huggingface.co/datasets/malaysia-ai/malaysian-youtube
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 5, 2024
Dataset authored and provided by
Malaysia AI
Area covered
Malaysia, YouTube
Description
Malaysian Youtube

Malaysian and Singaporean youtube channels, total up to 60k audio files with total 18.7k hours. URLs data at https://github.com/mesolitica/malaya-speech/tree/master/data/youtube/data Notebooks at https://github.com/mesolitica/malaya-speech/tree/master/data/youtube

How to load the data efficiently?

import pandas as pd import json from datasets import Audio from torch.utils.data import DataLoader, Dataset

chunks = 30 sr = 16000

class Train(Dataset):… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/malaysian-youtube.
OGBG-MolSIDER (Processed for PyG)
kaggle.com
zip
Updated Feb 27, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Redao da Taupl (2021). OGBG-MolSIDER (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbg-molsider
Explore at:
zip(493502 bytes)Available download formats
Dataset updated
Feb 27, 2021
Authors
Redao da Taupl
Description
OGBN-MolSIDER

Webpage: https://ogb.stanford.edu/docs/graphprop/#ogbg-mol

Usage in Python

import os import os.path as osp import pandas as pd import torch from ogb.graphproppred import PygGraphPropPredDataset class PygOgbgMol(PygGraphPropPredDataset): def _init_(self, name, transform = None, pre_transform = None, meta_csv = None): root = '../input' if meta_csv is None: meta_csv = osp.join(root, name, 'ogbg-master.csv') master = pd.read_csv(meta_csv, index_col = 0) meta_dict = master[name] meta_dict['dir_path'] = osp.join(root, name) super()._init_(name = name, root = root, transform = transform, pre_transform = pre_transform, meta_dict = meta_dict) def get_idx_split(self, split_type = None): if split_type is None: split_type = self.meta_info['split'] path = osp.join(self.root, 'split', split_type) # short-cut if split_dict.pt exists if os.path.isfile(os.path.join(path, 'split_dict.pt')): return torch.load(os.path.join(path, 'split_dict.pt')) train_idx = pd.read_csv(osp.join(path, 'train.csv'), header = None).values.T[0] valid_idx = pd.read_csv(osp.join(path, 'valid.csv'), header = None).values.T[0] test_idx = pd.read_csv(osp.join(path, 'test.csv'), header = None).values.T[0] return {'train': torch.tensor(train_idx, dtype = torch.long), 'valid': torch.tensor(valid_idx, dtype = torch.long), 'test': torch.tensor(test_idx, dtype = torch.long)} dataset = PygOgbgMol('ogbg-molsider') from torch_geometric.data import DataLoader batch_size = 32 split_idx = dataset.get_idx_split() train_loader = DataLoader(dataset[split_idx['train']], batch_size = batch_size, shuffle = True) valid_loader = DataLoader(dataset[split_idx['valid']], batch_size = batch_size, shuffle = False) test_loader = DataLoader(dataset[split_idx['test']], batch_size = batch_size, shuffle = False)

Description

Graph: The ogbg-molhiv and ogbg-molpcba datasets are two molecular property prediction datasets of different sizes: ogbg-molhiv (small) and ogbg-molpcba (medium). They are adopted from the MoleculeNet [1], and are among the largest of the MoleculeNet datasets. All the molecules are pre-processed using RDKit [2]. Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds. Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring or not. The full description of the features is provided in code. The script to convert the SMILES string [3] to the above graph object can be found here. Note that the script requires RDKit to be installed. The script can be used to pre-process external molecule datasets so that those datasets share the same input feature space as the OGB molecule datasets. This is particularly useful for pre-training graph models, which has great potential to significantly increase generalization performance on the (downstream) OGB datasets [4].

Beside the two main datasets, the dataset authors additionally provide 10 smaller datasets from MoleculeNet. They are ogbg-moltox21, ogbg-molbace, ogbg-molbbbp, ogbg-molclintox, ogbg-molmuv, ogbg-molsider, and ogbg-moltoxcast for (multi-task) binary classification, and ogbg-molesol, ogbg-molfreesolv, and ogbg-mollipo for regression. Evaluators are also provided for these datasets. These datasets can be used to stress-test molecule-specific methods or transfer learning [4].

For encoding these raw input features, the dataset authors prepare simple modules called AtomEncoder and BondEncoder. They can be used as follows to embed raw atom and bond features to obtain atom_emb and bond_emb.

from ogb.graphproppred.mol_encoder import AtomEncoder, BondEncoder atom_encoder = AtomEncoder(emb_dim = 100) bond_encoder = BondEncoder(emb_dim = 100) atom_emb = atom_encoder(x) # x is the input atom feature edge_emb = bond_encoder(edge_attr) # edge_attr is the input edge feature

Prediction task: The task is to predict the target molecular properties as accurately as possible, where the molecular properties are cast as binary labels, e.g, whether a molecule inhibits HIV virus replication or not. Note that some datasets (e.g., ogbg-molpcba) can have multiple tasks, and can contain nan that indicates the corresponding label is not assigned to the molecule. For evaluation metric, the dataset authors closely follow [2]. Specifically, for ogbg-molhiv, the dataset authors use ROC-AUC...
OGBG-MolBBBP (Processed for PyG)
kaggle.com
zip
Updated Feb 27, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Redao da Taupl (2021). OGBG-MolBBBP (Processed for PyG) [Dataset]. https://www.kaggle.com/dataup1/ogbg-molbbbp
Explore at:
zip(471366 bytes)Available download formats
Dataset updated
Feb 27, 2021
Authors
Redao da Taupl
Description
OGBN-MolBBBP

Webpage: https://ogb.stanford.edu/docs/graphprop/#ogbg-mol

Usage in Python

import os import os.path as osp import pandas as pd import torch from ogb.graphproppred import PygGraphPropPredDataset class PygOgbgMol(PygGraphPropPredDataset): def _init_(self, name, transform = None, pre_transform = None, meta_csv = None): root = '../input' if meta_csv is None: meta_csv = osp.join(root, name, 'ogbg-master.csv') master = pd.read_csv(meta_csv, index_col = 0) meta_dict = master[name] meta_dict['dir_path'] = osp.join(root, name) super()._init_(name = name, root = root, transform = transform, pre_transform = pre_transform, meta_dict = meta_dict) def get_idx_split(self, split_type = None): if split_type is None: split_type = self.meta_info['split'] path = osp.join(self.root, 'split', split_type) # short-cut if split_dict.pt exists if os.path.isfile(os.path.join(path, 'split_dict.pt')): return torch.load(os.path.join(path, 'split_dict.pt')) train_idx = pd.read_csv(osp.join(path, 'train.csv'), header = None).values.T[0] valid_idx = pd.read_csv(osp.join(path, 'valid.csv'), header = None).values.T[0] test_idx = pd.read_csv(osp.join(path, 'test.csv'), header = None).values.T[0] return {'train': torch.tensor(train_idx, dtype = torch.long), 'valid': torch.tensor(valid_idx, dtype = torch.long), 'test': torch.tensor(test_idx, dtype = torch.long)} dataset = PygOgbgMol('ogbg-molbbbp') from torch_geometric.data import DataLoader batch_size = 32 split_idx = dataset.get_idx_split() train_loader = DataLoader(dataset[split_idx['train']], batch_size = batch_size, shuffle = True) valid_loader = DataLoader(dataset[split_idx['valid']], batch_size = batch_size, shuffle = False) test_loader = DataLoader(dataset[split_idx['test']], batch_size = batch_size, shuffle = False)

Description

Graph: The ogbg-molhiv and ogbg-molpcba datasets are two molecular property prediction datasets of different sizes: ogbg-molhiv (small) and ogbg-molpcba (medium). They are adopted from the MoleculeNet [1], and are among the largest of the MoleculeNet datasets. All the molecules are pre-processed using RDKit [2]. Each graph represents a molecule, where nodes are atoms, and edges are chemical bonds. Input node features are 9-dimensional, containing atomic number and chirality, as well as other additional atom features such as formal charge and whether the atom is in the ring or not. The full description of the features is provided in code. The script to convert the SMILES string [3] to the above graph object can be found here. Note that the script requires RDKit to be installed. The script can be used to pre-process external molecule datasets so that those datasets share the same input feature space as the OGB molecule datasets. This is particularly useful for pre-training graph models, which has great potential to significantly increase generalization performance on the (downstream) OGB datasets [4].

Beside the two main datasets, the dataset authors additionally provide 10 smaller datasets from MoleculeNet. They are ogbg-moltox21, ogbg-molbace, ogbg-molbbbp, ogbg-molclintox, ogbg-molmuv, ogbg-molsider, and ogbg-moltoxcast for (multi-task) binary classification, and ogbg-molesol, ogbg-molfreesolv, and ogbg-mollipo for regression. Evaluators are also provided for these datasets. These datasets can be used to stress-test molecule-specific methods or transfer learning [4].

For encoding these raw input features, the dataset authors prepare simple modules called AtomEncoder and BondEncoder. They can be used as follows to embed raw atom and bond features to obtain atom_emb and bond_emb.

from ogb.graphproppred.mol_encoder import AtomEncoder, BondEncoder atom_encoder = AtomEncoder(emb_dim = 100) bond_encoder = BondEncoder(emb_dim = 100) atom_emb = atom_encoder(x) # x is the input atom feature edge_emb = bond_encoder(edge_attr) # edge_attr is the input edge feature

Prediction task: The task is to predict the target molecular properties as accurately as possible, where the molecular properties are cast as binary labels, e.g, whether a molecule inhibits HIV virus replication or not. Note that some datasets (e.g., ogbg-molpcba) can have multiple tasks, and can contain nan that indicates the corresponding label is not assigned to the molecule. For evaluation metric, the dataset authors closely follow [2]. Specifically, for ogbg-molhiv, the dataset authors use ROC-AUC f...

PyTorch

kaggle.com

zip

Updated Oct 22, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Mirza Milan Farabi (2024). PyTorch [Dataset]. https://www.kaggle.com/datasets/mirzamilanfarabi/pytorch

Explore at:

zip(123861801 bytes)Available download formats

Dataset updated

Oct 22, 2024

Authors

Mirza Milan Farabi

Description

https://github.com/pytorch/pytorch/raw/main/docs/source/_static/img/pytorch-logo-dark.png" alt="PyTorch Logo">

PyTorch is a Python package that provides two high-level features: - Tensor computation (like NumPy) with strong GPU acceleration - Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch

Learn the basics of PyTorch

At a granular level, PyTorch is a library that consists of the following components:

Component	Description
torch	A Tensor library like NumPy, with strong GPU support
torch.autograd	A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
torch.jit	A compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
torch.nn	A neural networks library deeply integrated with autograd designed for maximum flexibility
torch.multiprocessing	Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
torch.utils	DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

A replacement for NumPy to use the power of GPUs.
A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).

Tensor illustration

PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, mathematical operations, linear algebra, reductions. And they are fast!

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy resear...

Data from: Duck Hunt

kaggle.com

zip

Updated Jul 26, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Hugo Zanini (2025). Duck Hunt [Dataset]. https://www.kaggle.com/datasets/hugozanini1/duck-hunt

Explore at:

zip(7379197 bytes)Available download formats

Dataset updated

Jul 26, 2025

Authors

Hugo Zanini

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Duck Hunt Object Detection Dataset

This dataset contains 1,004 labeled images from the classic NES game "Duck Hunt" (1984), specifically prepared for YOLO (You Only Look Once) object detection training. The dataset includes sprites of the iconic hunting dog and ducks in various states, augmented to provide a balanced and comprehensive training set for computer vision models.

Perfect for: - Object detection model training - Computer vision research - Retro gaming AI projects - YOLO algorithm benchmarking - Educational purposes

🎯 Dataset Statistics

Metric	Value
Total Images	1,004
Dataset Size	12 MB
Image Format	PNG
Annotation Format	YOLO (.txt)
Classes	4
Train/Val Split	711/260 (73%/27%)

Class Distribution

Class ID	Class Name	Count	Description
0	`dog`	252	The hunting dog in various poses (jumping, laughing, sniffing, etc.)
1	`duck_dead`	256	Dead ducks (both black and red variants)
2	`duck_shot`	248	Ducks in the moment of being shot
3	`duck_flying`	248	Flying ducks in all directions (left, right, diagonal)

📁 Dataset Structure

yolo_dataset_augmented/
├── images/
│  ├── train/      # 711 training images
│  └── val/       # 260 validation images
├── labels/
│  ├── train/      # 711 YOLO annotation files
│  └── val/       # 260 YOLO annotation files
├── classes.txt     # Class names mapping
├── dataset.yaml     # YOLO configuration file
└── augmented_dataset_stats.json # Detailed statistics

🔧 Data Augmentation Details

The original 47 images were enhanced using advanced data augmentation techniques to create a balanced dataset:

Augmentation Techniques Applied:

Geometric Transformations: Rotation (±15°), horizontal/vertical flipping, scaling (0.8-1.2x), translation
Color Adjustments: Brightness (0.7-1.3x), contrast (0.8-1.2x), saturation (0.8-1.2x)
Quality Variations: Gaussian noise, slight blur for robustness
Advanced Techniques: Mosaic augmentation (YOLO-style 4-image combination)

Augmentation Parameters:

{
  'rotation_range': (-15, 15),    # Small rotations for game sprites
  'brightness_range': (0.7, 1.3),  # Brightness variations
  'contrast_range': (0.8, 1.2),   # Contrast adjustments
  'saturation_range': (0.8, 1.2),  # Color saturation
  'noise_intensity': 0.02,      # Gaussian noise
  'horizontal_flip_prob': 0.5,    # 50% chance horizontal flip
  'scaling_range': (0.8, 1.2),    # Scale variations
}

🚀 Usage Examples

Loading with YOLOv8 (Ultralytics)

from ultralytics import YOLO

# Load and train
model = YOLO('yolov8n.pt') # Load pretrained model
results = model.train(data='dataset.yaml', epochs=100, imgsz=640)

# Validate
metrics = model.val()

# Predict
results = model('path/to/test/image.png')

Loading with PyTorch

import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import os

class DuckHuntDataset(Dataset):
  def _init_(self, images_dir, labels_dir, transform=None):
    self.images_dir = images_dir
    self.labels_dir = labels_dir
    self.transform = transform
    self.images = os.listdir(images_dir)
  
  def _len_(self):
    return len(self.images)
  
  def _getitem_(self, idx):
    img_path = os.path.join(self.images_dir, self.images[idx])
    label_path = os.path.join(self.labels_dir, 
                 self.images[idx].replace('.png', '.txt'))
    
    image = Image.open(img_path)
    # Load YOLO annotations
    with open(label_path, 'r') as f:
      labels = f.readlines()
    
    if self.transform:
      image = self.transform(image)
      
    return image, labels

# Usage
dataset = DuckHuntDataset('images/train', 'labels/train')
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

YOLO Annotation Format

Each .txt file contains one line per object: class_id center_x center_y width height

Example annotation: 0 0.492 0.403 0.212 0.315 Where values are normalized (0-1) relative to image dimensions.

📊 Technical Specifications

Image Dimensions: Variable (original sprite sizes preserved)
Color Channels: RGB (3 channels)
Annotation Precision: Float32 (normalized coordinates)
File Naming: Descriptive names indicating class and augmentation type
Quality: High-resolution pixel art sprites

🎮 Dataset Context

This dataset is based on sprites from the iconic 1984 NES game "Duck Hunt," one of the most recognizable video games in history. The game featured:

The Dog: Your hunting companion who retrieves ducks and ...

MELD Preprocessed
kaggle.com
zip
Updated Mar 1, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Argish Abhangi (2025). MELD Preprocessed [Dataset]. https://www.kaggle.com/datasets/argish/meld-preprocessed
Explore at:
zip(3527202381 bytes)Available download formats
Dataset updated
Mar 1, 2025
Authors
Argish Abhangi
Description
The MELD Preprocessed Dataset is a multi-modal dataset designed for research on emotion recognition from audio, video, and textual data. The dataset builds upon the original MELD dataset and applies extensive preprocessing steps to extract features from different modalities. Each sample is saved as a .pt file containing a dictionary of preprocessed features, making it easy for developers to load and integrate into PyTorch-based workflows.

Data Sources

Audio: Waveforms extracted from the original video files.

Video: Video files are processed to sample frames at a target frame rate (default: 2 fps) and to detect faces using a Haar Cascade classifier.

Text: Utterances from the dialogue, which are cleaned using custom encoding functions to fix potential byte encoding issues.

Emotion Labels: Each sample is associated with an emotion label.

Preprocessing Pipeline

The preprocessing script performs several key steps:

Text Cleaning:

fix_encoding_with_bytes(text): Decodes text from bytes using UTF-8, Latin-1, or cp1252, ensuring correct encoding.

replace_double_encoding(text): Fixes issues related to double-encoded characters (e.g., replacing "Â’" with the proper apostrophe).

Audio Processing:

Extracts raw audio waveform from each sample.

Computes a Mel-spectrogram using torchaudio.transforms.MelSpectrogram with 64 mel bins (VGGish format).

Converts the spectrogram to a logarithmic scale for numerical stability.

Video Processing:

Reads video frames at a specified target FPS (default: 2 fps) using OpenCV.

For each video, samples frames evenly based on the original video's FPS.

Applies Haar Cascade face detection on the frames to extract the first detected face.

Resizes the detected face to 224x224 and converts it to RGB. If no face is detected, a default black image (224x224x3) is returned.

Saving Processed Samples:

Each sample is saved as a .pt file in a directory structure split by data type (train, dev, and test).

The filename is derived from the original video filename (e.g., dia0_utt1.mp4 becomes dia0_utt1.pt).

Data Format

Each preprocessed sample is stored in a .pt file and contains a dictionary with the following keys:

utterance (str): The cleaned textual utterance.

emotion (str/int): The corresponding emotion label.

video_path (str): Original path to the video file from which the sample was extracted.

audio (Tensor): Raw audio waveform tensor of shape [channels, time].

audio_sample_rate (int): The sampling rate of the audio waveform.

audio_mel (Tensor): The computed log-scaled Mel-spectrogram with shape [channels, n_mels, time].

face (NumPy array): The extracted face image (RGB format) of shape (224, 224, 3). If no face was detected, a default black image is provided.

Directory Structure

The preprocessed files are organized into splits: preprocessed_data/ ├── train/ │ ├── dia0_utt0.pt │ ├── dia1_utt1.pt │ └── ... ├── dev/ │ ├── dia0_utt0.pt │ ├── dia1_utt1.pt │ └── ... └── test/ │ ├── dia0_utt0.pt │ ├── dia1_utt1.pt └── ...

Loading and Using the Dataset

A custom PyTorch dataset and DataLoader are provided to facilitate easy integration:

Dataset Class

from torch.utils.data import Dataset import os import torch class PreprocessedMELDDataset(Dataset): def _init_(self, data_dir): """ Args: data_dir (str): Directory where preprocessed .pt files are stored. """ self.data_dir = data_dir self.files = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.pt')] def _len_(self): return len(self.files) def _getitem_(self, idx): sample_path = self.files[idx] sample = torch.load(sample_path) return sample

Custom Collate Function

def preprocessed_collate_fn(batch): """ Collates a list of sample dictionaries into a single dictionary with keys mapping to lists. Modify this function to pad or stack tensor data if needed. """ collated = {} collated['utterance'] = [sample['utterance'] for sample in batch] collated['emotion'] = [sample['emotion'] for sample in batch] collated['video_path'] = [sample['video_path'] for sample in batch] collated['audio'] = [sample['audio'] for sample in batch] collated['audio_sample_rate'] = batch[0]['audio_sample_rate'] collated['audio_mel'] = [sample['audio_mel'] for sample in batch] collated['face'] = [sample['face'] for sample in batch] return collated

Creating DataLoaders

from torch.utils.data import DataLoader # Define paths for each split train_data_dir = "preprocessed_data/train" dev_data_dir = "preproces...
feral-cat-segmentation_dataset
kaggle.com
universe.roboflow.com
zip
Updated Mar 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
lu hou yang (2025). feral-cat-segmentation_dataset [Dataset]. https://www.kaggle.com/datasets/luhouyang/feral-cat-segmentation-dataset
Explore at:
zip(971125684 bytes)Available download formats
Dataset updated
Mar 18, 2025
Authors
lu hou yang
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Feral Cat Segmentation Dataset

Overview

This dataset provides image segmentation data for feral cats, designed for computer vision and machine learning tasks. It builds upon the original public domain dataset by Paul Cashman from Roboflow, with additional preprocessing and multiple data formats for easier consumption.

Dataset Source

Original Author: Paul Cashman

Original Source: Roboflow Universe

Extended by: Lu Hou Yang

GitHub: https://github.com/luhouyang/open_circles

License: Public Domain

Dataset Contents

The dataset is organized into three standard splits: - Train set - Validation set - Test set

Each split contains data in multiple formats: 1. Original JPG images 2. Segmentation mask JPG images 3. Parquet files containing flattened image and mask data 4. Pickle files containing serialized image and mask data

Data Formats

1. Image Files

Format: JPG

Resolution: 224×224 pixels

Directory Structure:

train/: Original training images

valid/: Original validation images

test/: Original test images

train_mask/: Corresponding segmentation masks for training

valid_mask/: Corresponding segmentation masks for validation

test_mask/: Corresponding segmentation masks for testing

2. Parquet Files

Files: train_dataset.parquet, valid_dataset.parquet, test_dataset.parquet

Content: Flattened image data and corresponding masks combined in a single table

Structure: Each row contains the flattened pixel values of an image followed by the flattened pixel values of its mask

Data Division: Image and mask data are split at index split_at = image_size[0] * image_size[1] * image_channels

Data before this index: image pixel values (reshaped to [-1, 224, 224, 3])

Data after this index: mask pixel values (reshaped to [-1, 224, 224, 1])

Benefits: Efficient storage and faster loading compared to individual image files

3. Pickle Files

Files: train_dataset.pkl, valid_dataset.pkl, test_dataset.pkl

Content: Serialized Python objects containing images and their corresponding masks

Structure: List of [image, mask] pairs, where each image and mask is serialized using Python's pickle

Data Access: Similar to parquet files, when loaded through the provided dataset class, data is split at the same index: split_at = image_size[0] * image_size[1] * image_channels

Benefits: Preserves original data structure and enables quick loading in Python

4. CSV Files

Files: train_dataset.csv, valid_dataset.csv, test_dataset.csv

Content: Same data as parquet files but in CSV format

Structure: No headers, raw flattened pixel values

Data Division: Same split point as parquet files

Image Preprocessing

All images were preprocessed with the following operations: - Resized to 224×224 pixels using bilinear interpolation - Segmentation masks were also resized to match the images using nearest neighbor interpolation - Original RLE (Run-Length Encoding) segmentation data converted to binary masks

Data Normalization

When used with the provided PyTorch dataset class, images are normalized with: - Mean: [0.48235, 0.45882, 0.40784] - Standard Deviation: [0.00392156862745098, 0.00392156862745098, 0.00392156862745098]

PyTorch Integration

A custom CatDataset class is included for easy integration with PyTorch:

from cat_dataset import CatDataset # Load from parquet format dataset = CatDataset( root="path/to/dataset", split="train", # Options: "train", "valid", "test" format="parquet", # Options: "parquet", "pkl" image_size=[224, 224], image_channels=3, mask_channels=1 ) # Use with PyTorch DataLoader from torch.utils.data import DataLoader dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

Performance Comparison

Loading time benchmarks from the original implementation: - Parquet format: ~1.29 seconds per iteration - Pickle format: ~0.71 seconds per iteration

The pickle format provides the fastest loading times and is recommended for most use cases.

Citation

If you use this dataset in your research or projects, please cite:

@misc{feral-cat-segmentation_dataset, title = {feral-cat-segmentation Dataset}, type = {Open Source Dataset}, author = {Paul Cashman}, howpublished = {\url{https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}}, url = {https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}, journal = {Roboflow Universe}, publisher = {Roboflow}, year = {2025}, month = {mar}, note = {visited on 2025-03-19}, }

Sample Usage Code

Basic Dataset Loading

from ca...
heptapod_dataset
kaggle.com
zip
Updated Jun 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matheus Latorre Cavini (2025). heptapod_dataset [Dataset]. https://www.kaggle.com/datasets/matheuslatorrecavini/heptapod-dataset/discussion
Explore at:
zip(10262538 bytes)Available download formats
Dataset updated
Jun 28, 2025
Authors
Matheus Latorre Cavini
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset consists of 4900 images of logograms from Heptapod B language, in resolution 224x224, and the captions for their meaning in English. There are 49 unique logograms and 100 variations (rotation, scaling, translation) for each of them.

Original source of the data: Wolfram Research GitHub Repository. Distributed under Creative Commons Attribution-NonCommercial 4.0 International License.

The dataset was augmented by merging morphems of the logograms and by applying geometric transformations to create variations of each image.

The captions.txt file provide captions for each unique logogram, and can interpreted as:

000.png | Abbot is dead is the caption for images 0000.png to 0099.png

001.png | Abbot is the caption for images 0100.png to 0199.png

002.png | Abbot chooses save humanityis the caption for images 0200.png to 0299.png

And so on

Suggested loading for PyTorch:

from PIL import Image import torch from torch.utils.data import Dataset, DataLoader from torchvision import transforms import os class TextToImageDataset(Dataset): def _init_(self, image_dir, captions_file, transform=None): self.image_dir = image_dir # Path for the images on the dataset self.transform = transform self.pairs = [] # Array to store (image, sentence) pairs with open(captions_file, "r") as f: for line in f: idx, caption = line.strip().split("|") idx = idx.strip().split(".")[0] caption = caption.strip() for i in range(100): img_file = f"{(int(idx)*100 + i):04d}.png" # Get the image number by doing idx*100 + i self.pairs.append((caption, img_file)) # Apply the same caption for every variation of the same logogram def _len_(self): return len(self.pairs) def _getitem_(self, idx): text, img_file = self.pairs[idx] image = Image.open(os.path.join(self.image_dir, img_file)).convert("RGB") if self.transform: image = self.transform(image) return text, image #item = (text, image)

transform = transforms.Compose([ transforms.Resize((224, 224)), transforms.ToTensor() ]) base_dir = "/kaggle/input/heptapod-dataset/dataset/" dataset = TextToImageDataset(image_dir=base_dir+"images",captions_file=base_dir+"captions.txt", transform=transform)
Star Wars Chat Bot
kaggle.com
zip
Updated Dec 8, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Star Wars Chat Bot [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/star-wars-chat-bot/discussion
Explore at:
zip(3138 bytes)Available download formats
Dataset updated
Dec 8, 2021
Authors
Aslan Ahmedov
Description
Star-Wars-Chatbot

Simple chatbot implementation with PyTorch. A chatbot made in Python that features various data about the Star Wars universe. This is a generic chatbot. Can be trained on pretty much any conversation as long as formatted correctly JSON file. I used it for a final project in Artificial Intelligence. To use just run the script training first, then run your chatbot. For more please have a look on GitHub

Introduction

Chatbots are extremely helpful for business organizations and also the customers. The majority of people prefer to talk directly from a chatbox instead of calling service centers. Today I am going to build an exciting project on Chatbot. I will implement a chatbot from scratch that will be able to understand what the user is talking about and give an appropriate response. Chatbots are nothing but an intelligent piece of software that can interact and communicate with people just like humans. Here in this project we created an AI Chatbot which is focused for The Star Wars Cinematic Universe and trying training it in such a way that it can answer some of the basics queries about Star Wars.

Explanation Of Chatbot

Chatbots are basically AI intelligence bots which can interact with the user or customers depends upon the usage. It is an application of Artificial Intelligence and Machine Learning¬. Now-a-days technology is increasing rapidly. In this technological world every industry is trying to automate things to provide better services. One of the great application of automation would be chatbot.

There are basically two types of Chatbots :

Command based: Chatbots that function on predefined rules and can answer to only limited queries or questions. Users need to select an option to determine their next step.

Intelligent/AI Chatbots: Chatbots that leverage Machine Learning and Natural Language Understanding to understand the user’s language and are intelligent enough to learn from conversations with their users. You can converse via text, speech or even interact with a chatbot using graphical interfaces.

All chatbots come under the NLP (Natural Language Processing) concepts. NLP is composed of two things: - NLU (Natural Language Understanding): The ability of machines to understand human language like English. - NLG (Natural Language Generation): The ability of a machine to generate text similar to human written sentences Imagine a user asking a question to a chatbot: “Hey, what’s on the news today?” The chatbot will break down the user sentence into two things: intent and an entity. The intent for this sentence could be get_news as it refers to an action the user wants to perform. The entity tells specific details about the intent, so "today" will be the entity. So this way, a machine learning model is used to recognize the intents and entities of the chat.

Strategy

Import Libraries and Load the Data

Preprocessing the Data

Create Training and Testing Data

Training the Model

Graphical user interface

Import Libraries and Load the Data

I created a new python file and name it as chatbot.py and then import all the required modules. After that I loaded starwarsintents.json data file in our Python program.

import numpy as np import nltk from nltk.stem.porter import PorterStemmer stemmer = PorterStemmer() import torch import torch.nn as nn import random import json from torch.utils.data import Dataset, DataLoader from tkinter import * with open("starwarsintents.json", "r") as f: intents = json.load(f) ``` ## Preprocessing the Data - Creating Custom Functions: We will create custom Functions so that it is easy for us to implement afterwards. Natural language (nltk) took kit is a really useful library that contains important classes that will be useful in any of your NLP task. To know a bit more about Natural language (nltk). Please click [here](https://machinelearningmastery.com/natural-language-processing/) for more information. - Stemming: If we have 3 words like “walk”, “walked”, “walking”, these might seem different words but they generally have the same meaning and also have the same base form; “walk”. So, in order for our model to understand all different form of the same words we need to train our model with that form. This is called Stemming. There are different methods that we can use for stemming. Here we will use Porter Stemmer model form our NLTK Library. For more information click [here](http://snowball.tartarus.org/algorithms/porter/stemmer.html). - Bag of Words: We will be splitting each word in the sentences and adding it to an array. We will be using bag of words. Which will initially be a list of zeros with the size equal to the length of the all words array.If we have a array of sentences = ["hello", "how", "are", "you"] and an array of total words = ["hi", "hel...
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Tobia Ippolito (2024). 3xM 10 10 (RGB-D Instance Seg. for bin-picking) [Dataset]. https://www.kaggle.com/datasets/tobiaippolito/3xm-10-10

3xM 10 10 (RGB-D Instance Seg. for bin-picking)

Syntethic Model-Material-Mixture Dataset for Bin-Picking Instance-Segmentation

Explore at:

zip(67215581908 bytes)Available download formats

Dataset updated

Nov 12, 2024

Authors

Tobia Ippolito

License

https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html

Description

In short

This dataset used to investigate the influence of the unique amount of 3D-Models (Shapes) and Materials (Textures) towards the shape-textures bias, performance and generalization of deep neural network instance segmentation in my bachelor exam.

one of nine datasets created in Unreal Engine 5 with an NVIDIA RTX A4500
It uses 160 unique shapes and 80 unique textures
RGB, depth and solution masks are available
20.000 Scenes
Ready to use Dataloader, training and inference -> see next section

Usage

You can load the images like:

import cv2

image = cv2.imread(img_path)
if image is None:
  raise FileNotFoundError(f"Error during data loading: there is no '{img_path}'")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
if len(depth.shape) > 2:
  _, depth, _, _ = cv2.split(depth)
      
mask = cv2.imread(mask_path, cv2.IMREAD_UNCHANGED)  # cv2.IMREAD_GRAYSCALE)

For easy use I recommend to use my own code. You can directly use it to train Mask R-CNN or just use the dataloader. Both are shown now:

First: Clone my torch github project into your project terminal cd ./path/to/your/project git clone https://github.com/xXAI-botXx/torch-mask-rcnn-instance-segmentation.git Second: Install the anaconda env (optional) terminal cd ./path/to/your/project cd ./torch-mask-rcnn-instance-segmentation conda env create -f conda_env.yml Third: You are ready to use

Using only the dataloader for your custom project: ```python import os import numpy as np import matplotlib.pyplot as plt import cv2 from torch.utils.data import DataLoader

import sys sys.path.append("./torch-mask-rcnn-instance-segmentation")

from maskrcnn_toolkit import DATA_LOADING_MODE, Dual_Dir_Dataset, collate_fn, extract_and_visualize_mask

data_mode = DATA_LOADING_MODE.ALL

dataset = Dual_Dir_Dataset(img_dir="/path/to/rgb-folder", depth_dir="/path/to/depth-folder", mask_dir="/path/to/mask-folder", transform=None, amount=1, start_idx=0, end_idx=0, image_name="...", data_mode=data_mode, use_mask=True, use_depth=False, log_path="./logs", width=1920, height=1080, should_log=True, should_print=True, should_verify=False) data_loader = DataLoader(dataset, batch_size=5, shuffle=True, num_workers=4, collate_fn=collate_fn)

plot

for data in data_loader: for batch_idx in range(len(data[0])): if len(data) == 3: image = data[0][batch_idx].cpu().unsqueeze(0) masks = data[1][batch_idx]["masks"] masks = masks.cpu() name = data[2][batch_idx] else: image = data[0][batch_idx].cpu().unsqueeze(0) name = data[1][batch_idx]

  image = image.cpu().numpy().squeeze(0)
  image = np.transpose(image, (1, 2, 0)) # Convert to HWC

  # Remove 4.th channel if existing
  if image.shape[2] == 4:
    depth = image[:, :, 3]
    image = image[:, :, :3]
  else:
    depth = None

  masks_gt = masks.cpu().numpy()
  masks_gt = np.transpose(masks_gt, (1, 2, 0))
  mask = extract_and_visualize_mask(masks_gt, image=None, ax=None, visualize=False, color_map=None, soft_join=False)

  # plot
  cols = 1
  if depth is not None:
    cols += 1
  if mask is not None:
    cols += 1

  fig, ax = plt.subplots(nrows=1, ncols=cols, figsize=(20, 15*cols))
  fig.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.05, hspace=0.05)

  plot_idx = 0
  ax[plot_idx].imshow(image)
  ax[plot_idx].set_title("RGB Input Image")
  ax[plot_idx].axis("off")

  if depth is not None:
    plot_idx += 1
    ax[plot_idx].imshow(depth, cmap="gray")
    ax[plot_idx].set_title("Depth Input Image")
    ax[plot_idx].axis("off")

  if mask is not None:
    plot_idx += 1
    ax[plot_idx].imshow(mask)
    ax[plot_idx].set_title("Mask Ground Truth")
    ax[plot_idx].axis("off")

  plt.show()


**Using the whole Mask R-CNN training pipeline:**
```python
import sys
sys.path.append("./torch-mask-rcnn-instance-segmentation")

from maskrcnn_toolkit import DATA_LOADING_MODE, train


# set the vars as you need

WEIGHTS_PATH = None   # Path to the model weights file
USE_DEPTH = False      # Whether to include depth information -> as rgb and depth on green channel
VERIFY_DATA = False     # True is recommended

GROUND_PATH = "D:/3xM"  
DATASET_NAME = "3xM_Dataset_10_10"
IMG_DIR = os.path.join(GR...

Clear search

Close search

Google apps

Main menu

3xM 10 10 (RGB-D Instance Seg. for bin-picking)

In short

Usage

plot

Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning...

code for testing robustness of a model

extract and unzip the dataset, then write top folder here

select folder with specific target

mmi-bendr-preprocessed

PDEBench_2D_DarcyFlow

Load the dataset… See the full description on the dataset page: https://huggingface.co/datasets/Nionio/PDEBench_2D_DarcyFlow.

SemEval_training_data_emotions

cifar100_Vit_large

multimodal_sarcasm_detection

usage

MMSD2.0

malaysian-youtube

OGBG-MolSIDER (Processed for PyG)

OGBN-MolSIDER

Usage in Python

Description

OGBG-MolBBBP (Processed for PyG)

OGBN-MolBBBP

Usage in Python

Description

PyTorch

More About PyTorch

A GPU-Ready Tensor Library

Dynamic Neural Networks: Tape-Based Autograd

Data from: Duck Hunt

Duck Hunt Object Detection Dataset

🎯 Dataset Statistics

Class Distribution

📁 Dataset Structure

🔧 Data Augmentation Details

Augmentation Techniques Applied:

Augmentation Parameters:

🚀 Usage Examples

Loading with YOLOv8 (Ultralytics)

Loading with PyTorch

YOLO Annotation Format

📊 Technical Specifications

🎮 Dataset Context

MELD Preprocessed

Data Sources

Preprocessing Pipeline

Data Format

Directory Structure

Loading and Using the Dataset

Dataset Class

Custom Collate Function

Creating DataLoaders

feral-cat-segmentation_dataset

Feral Cat Segmentation Dataset

Overview

Dataset Source

Dataset Contents

Data Formats

1. Image Files

2. Parquet Files

3. Pickle Files

4. CSV Files

Image Preprocessing

Data Normalization

PyTorch Integration

Performance Comparison

Citation

Sample Usage Code

Basic Dataset Loading

heptapod_dataset

Star Wars Chat Bot

Star-Wars-Chatbot

Introduction

Explanation Of Chatbot

There are basically two types of Chatbots :

Strategy

Import Libraries and Load the Data

3xM 10 10 (RGB-D Instance Seg. for bin-picking)See More Versions

Syntethic Model-Material-Mixture Dataset for Bin-Picking Instance-Segmentation

In short

3xM 10 10 (RGB-D Instance Seg. for bin-picking)