Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.
We release our dataset as a set of folders, one per patch target label (e.g., banana), each containing 1000 subfolders, one per ImageNet output class.
An example of how to use the dataset is shown below.
import os.path

import torch.utils.data
from torchvision import datasets, transforms, models


class ImageFolderWithEmptyDirs(datasets.ImageFolder):
    """This is required for handling empty folders from the ImageFolder class."""

    def find_classes(self, directory):
        classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
        if not classes:
            raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
        class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)
                        if len(os.listdir(os.path.join(directory, cls_name))) > 0}
        return classes, class_to_idx
dataset_folder = 'data/ImageNet-Patch'

# Patch target labels available in the dataset (folder names are the label indices).
available_labels = {
    487: 'cellular telephone', 513: 'cornet', 546: 'electric guitar',
    585: 'hair spray', 804: 'soap dispenser', 806: 'sock',
    878: 'typewriter keyboard', 923: 'plate', 954: 'banana', 968: 'cup'
}

# Select the folder with images patched towards the "banana" target class.
target_label = 954
dataset_folder = os.path.join(dataset_folder, str(target_label))

normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                  std=[0.229, 0.224, 0.225])
preprocess = transforms.Compose([transforms.ToTensor(), normalizer])

dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=preprocess)
model = models.resnet50(pretrained=True)
loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
model.eval()

# Robust accuracy: fraction of patched images still classified correctly.
# Attack success rate: fraction of patched images classified as the patch target.
batches = 10
correct, attack_success, total = 0, 0, 0
for batch_idx, (images, labels) in enumerate(loader):
    if batch_idx == batches:
        break
    pred = model(images).argmax(dim=1)
    correct += (pred == labels).sum().item()
    attack_success += (pred == target_label).sum().item()
    total += pred.shape[0]

accuracy = correct / total
attack_sr = attack_success / total
print("Robust Accuracy: ", accuracy)
print("Attack Success: ", attack_sr)
Dataset Card for "SemEval_traindata_emotions"
How it was obtained

from datasets import load_dataset
import datasets
from torchvision.io import read_video
import json
import torch
import os
from torch.utils.data import Dataset, DataLoader
import tqdm
dataset_path = "./SemEval-2024_Task3/training_data/Subtask_2_train.json"
dataset = json.loads(open(dataset_path).read())
print(len(dataset))
all_conversations = []
for item in dataset:
    all_conversations.extend(item["conversation"])

… See the full description on the dataset page: https://huggingface.co/datasets/dim/SemEval_training_data_emotions.
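The extraction loop above is cut off on the dataset page; a hedged sketch of collecting the gathered conversation turns into a Hugging Face Dataset follows, assuming only that each turn is a flat dict of fields.

import datasets

# Hedged sketch: assumes every conversation turn is a flat dict of fields.
emotions = datasets.Dataset.from_list(all_conversations)
print(emotions)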
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Short-MetaWorld Dataset
Overview
Short-MetaWorld is a curated dataset from Meta-World containing Multi-Task 10 (MT10) and Meta-Learning 10 (ML10) tasks with 100 successful trajectories per task and 20 steps per trajectory. This dataset is specifically designed for multi-task robot learning, imitation learning, and vision-language robotics research.
🚀 Quick Start
from short_metaworld_loader import load_short_metaworld
from torch.utils.data import DataLoader
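Continuing from the Quick Start imports above, a minimal usage sketch is shown below; the argument to load_short_metaworld, its return type, and the structure of each batch are assumptions for illustration, not documented behaviour of the loader.

# Hedged sketch: the split argument and the returned object being a
# torch-style Dataset are assumptions, not documented API of the loader.
dataset = load_short_metaworld("MT10")            # assumed: selects the Multi-Task 10 split
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch in loader:
    # Each trajectory holds 20 steps; the exact keys in `batch` depend on the loader.
    break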
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
IndicVoices_bengali
This dataset has been created from ai4bharat/IndicVoices. Because the Bengali split of IndicVoices could not be loaded directly due to errors in some files, this dataset removes those files. To use this dataset with lazy loading for training speech-to-text models, sample code with wav2vec2 is shown below.

import pandas as pd
import torchaudio
from torch.utils.data import Dataset, DataLoader
from transformers import Wav2Vec2Processor
… See the full description on the dataset page: https://huggingface.co/datasets/subratasarkar32/IndicVoices_bengali.
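The wav2vec2 sample code is truncated above; the following is a minimal hedged sketch of the lazy-loading pattern described. The manifest file name, its column names (path, sentence), the processor checkpoint, and the 16 kHz target rate are assumptions for illustration.

import pandas as pd
import torchaudio
from torch.utils.data import Dataset, DataLoader
from transformers import Wav2Vec2Processor

# Hedged sketch: the manifest path, its column names and the checkpoint are
# assumptions for illustration, not taken from the dataset card.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

class LazyIndicVoices(Dataset):
    def __init__(self, manifest_csv):
        # Only the metadata is read up front; audio files are decoded per item.
        self.df = pd.read_csv(manifest_csv)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        waveform, orig_sr = torchaudio.load(row["path"])                 # assumed column
        waveform = torchaudio.functional.resample(waveform, orig_sr, 16000)
        inputs = processor(waveform.squeeze(0), sampling_rate=16000,
                           return_tensors="pt")
        labels = processor.tokenizer(row["sentence"],                    # assumed column
                                     return_tensors="pt").input_ids
        return inputs.input_values.squeeze(0), labels.squeeze(0)

# Variable-length audio needs a padding collate_fn for batch_size > 1.
loader = DataLoader(LazyIndicVoices("metadata.csv"), batch_size=1)       # hypothetical path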
Dataset summary
Dataset forked from pie/abstrct. Here, all sentences from the AbstRCT dataset abstracts are grouped together with the following labels:
0: Premise
1: Claim
2: MajorClaim
import random
import torch
import numpy as np
from datasets import load_dataset
from transformers import set_seed, AutoTokenizer, DataCollatorWithPadding
from torch.utils.data import DataLoader

def get_dataset(tokenizer, max_length=128):
    dataset = load_dataset("david-inf/am-nlp-abstrct")
def… See the full description on the dataset page: https://huggingface.co/datasets/david-inf/am-nlp-abstrct.
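Since the get_dataset helper above is cut off, here is a hedged sketch of how such a helper might continue into a tokenized DataLoader; the text and label column names, the split name, and the tokenizer checkpoint are assumptions for illustration.

from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding
from torch.utils.data import DataLoader

# Hedged sketch: column names, split name and checkpoint are assumptions.
def get_dataset(tokenizer, max_length=128):
    dataset = load_dataset("david-inf/am-nlp-abstrct")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=max_length)  # assumed "text" column

    return dataset.map(tokenize, batched=True)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")            # hypothetical checkpoint
dataset = get_dataset(tokenizer)
collator = DataCollatorWithPadding(tokenizer=tokenizer)
loader = DataLoader(dataset["train"].remove_columns(["text"]),            # assumed split and columns
                    batch_size=16, shuffle=True, collate_fn=collator)

for batch in loader:
    # batch contains padded input_ids, attention_mask and labels tensors.
    break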
Malaysian Youtube
Malaysian and Singaporean YouTube channels, totalling up to 60k audio files and 18.7k hours of audio. URL data is at https://github.com/mesolitica/malaya-speech/tree/master/data/youtube/data and notebooks are at https://github.com/mesolitica/malaya-speech/tree/master/data/youtube.
How to load the data efficiently?
import pandas as pd
import json
from datasets import Audio
from torch.utils.data import DataLoader, Dataset
chunks = 30
sr = 16000
class Train(Dataset):… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/malaysian-youtube.
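The Train class shown above is truncated on the dataset page; below is a hedged sketch of the chunked, lazy decoding pattern that the chunks and sr constants suggest (30-second chunks at 16 kHz). The audio file list and the chunking scheme are assumptions for illustration, not the card's own code.

from datasets import Audio
from torch.utils.data import Dataset, DataLoader

chunks = 30    # assumed: chunk length in seconds
sr = 16000     # target sampling rate

audio_decoder = Audio(sampling_rate=sr)

# Hedged sketch: the file list and chunking scheme are assumptions.
class Train(Dataset):
    def __init__(self, files):
        self.files = files

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Decode on access so the full 18.7k hours never sit in memory at once.
        audio = audio_decoder.decode_example({"path": self.files[idx], "bytes": None})
        array = audio["array"]
        step = chunks * sr
        # Split the recording into fixed-length chunks of `chunks` seconds.
        return [array[i:i + step] for i in range(0, len(array), step)]

# The number of chunks varies per file, so collate each item as-is.
loader = DataLoader(Train(["audio/example.mp3"]),                         # hypothetical file
                    batch_size=1, collate_fn=lambda batch: batch[0])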