6 datasets found
  1. Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches

    • data.niaid.nih.gov
    Updated Jun 30, 2022
    Cite: Daniele Angioni (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
    Dataset updated: Jun 30, 2022
    Dataset provided by: Daniele Angioni, Battista Biggio, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Maura Pintor, Fabio Roli
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.
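    The description says patches are applied to ImageNet images after affine transformations. As a rough illustration of what "applying" a patch means, here is a minimal NumPy sketch that pastes a patch p into an image x with a binary mask m, computing x' = (1 - m) * x + m * p. It uses only a translation (the simplest affine transformation) and omits rotation and scaling; the function name and arguments are hypothetical, and this is not the authors' generation pipeline.

```python
import numpy as np


def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at (top, left).

    Computes x' = (1 - m) * x + m * p, where m is a binary mask that is 1
    inside the patch region. Only a translation is applied; rotation and
    scaling are left out of this sketch.
    """
    h, w = image.shape[:2]
    ph, pw = patch.shape[:2]
    if top < 0 or left < 0 or top + ph > h or left + pw > w:
        raise ValueError("patch does not fit inside the image at this offset")
    # Full-size canvas carrying the patch (p) and a mask (m) broadcastable to the image
    canvas = np.zeros_like(image)
    mask = np.zeros(image.shape[:2] + (1,) * (image.ndim - 2), dtype=image.dtype)
    canvas[top:top + ph, left:left + pw] = patch
    mask[top:top + ph, left:left + pw] = 1
    return (1 - mask) * image + mask * canvas


image = np.zeros((4, 4, 3))  # dummy all-black H x W x C float image
patch = np.ones((2, 2, 3))   # dummy all-white 2 x 2 patch
patched = apply_patch(image, patch, top=1, left=2)
```

    The input image is left unchanged; only the 2 x 2 region starting at row 1, column 2 of the returned copy takes the patch values.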

    We release our dataset as a set of folders, one per patch target label (e.g., banana), each containing 1000 subfolders corresponding to the ImageNet output classes.
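    Some of these class subfolders can be empty, which is why the usage example subclasses ImageFolder. A small standard-library helper can report, per target folder, how many class subfolders actually contain files. This is a hypothetical helper; the assumption that target folders are named by their numeric label comes from the example's `os.path.join(dataset_folder, str(target_label))`.

```python
import os


def summarize_patch_folders(root):
    """Map each target-label folder in `root` to its count of non-empty
    class subfolders. Assumed layout: root/<target>/<class>/<image files>."""
    summary = {}
    for target in sorted(os.scandir(root), key=lambda entry: entry.name):
        if not target.is_dir():
            continue
        # Count only class folders that contain at least one entry
        summary[target.name] = sum(
            1
            for cls in os.scandir(target.path)
            if cls.is_dir() and len(os.listdir(cls.path)) > 0
        )
    return summary


# Hypothetical location of the extracted archive:
# print(summarize_patch_folders("data/ImageNet-Patch"))
```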

    The following example shows how to use the dataset.

    # Code for testing the robustness of a model
    import os

    import torch
    import torch.utils.data
    from torchvision import datasets, transforms, models


    class ImageFolderWithEmptyDirs(datasets.ImageFolder):
        """
        This is required for handling empty folders with the ImageFolder class.
        """

        def find_classes(self, directory):
            classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
            if not classes:
                raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
            # Empty class folders are skipped, but each kept folder retains the index
            # it has in the full sorted list, so no indices shift
            class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)
                            if len(os.listdir(os.path.join(directory, cls_name))) > 0}
            return classes, class_to_idx

    # Extract and unzip the dataset, then write the top folder here
    dataset_folder = 'data/ImageNet-Patch'

    available_labels = {
        487: 'cellular telephone',
        513: 'cornet',
        546: 'electric guitar',
        585: 'hair spray',
        804: 'soap dispenser',
        806: 'sock',
        878: 'typewriter keyboard',
        923: 'plate',
        954: 'banana',
        968: 'cup',
    }

    # Select the folder with a specific target
    target_label = 954

    dataset_folder = os.path.join(dataset_folder, str(target_label))
    normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])
    # Named `preprocess` so it does not shadow the torchvision `transforms` module
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        normalizer,
    ])

    dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=preprocess)
    # ImageNet-pretrained ResNet-50 (the weights the older pretrained=True flag loads)
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
    model.eval()

    batches = 10
    correct, attack_success, total = 0, 0, 0
    with torch.no_grad():  # inference only, no gradients needed
        for batch_idx, (images, labels) in enumerate(loader):
            if batch_idx == batches:
                break
            pred = model(images).argmax(dim=1)
            correct += (pred == labels).sum().item()
            attack_success += (pred == target_label).sum().item()
            total += pred.shape[0]

    accuracy = correct / total
    attack_sr = attack_success / total

    print("Robust Accuracy: ", accuracy)
    print("Attack Success: ", attack_sr)
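    The two printed metrics follow simple definitions: robust accuracy is the fraction of patched images still classified as their true class, and attack success is the fraction classified as the patch's target class. A framework-free sketch of that bookkeeping, useful for unit-testing the counting logic, is below. The function name is hypothetical, and it assumes predictions and labels share one class-index space (with ImageFolder, labels are folder indices, which match ImageNet indices only if the class folders sort in ImageNet order).

```python
def robustness_metrics(preds, labels, target_label):
    """Return (robust_accuracy, attack_success_rate) for sequences of class ids.

    robust accuracy: share of predictions equal to the true label
    attack success:  share of predictions equal to the patch target label
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not preds:
        raise ValueError("need at least one prediction")
    total = len(preds)
    correct = sum(p == y for p, y in zip(preds, labels))
    hits = sum(p == target_label for p in preds)
    return correct / total, hits / total


print(robustness_metrics([954, 1, 954, 3], [0, 1, 2, 3], target_label=954))
```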

  2. SemEval_training_data_emotions

    • huggingface.co
    Updated Feb 7, 2024
    Cite: web (2024). SemEval_training_data_emotions [Dataset]. https://huggingface.co/datasets/dim/SemEval_training_data_emotions
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Feb 7, 2024
    Authors: web
    Description

    Dataset Card for "SemEval_traindata_emotions"

    How it was obtained:

    from datasets import load_dataset
    import datasets
    from torchvision.io import read_video
    import json
    import torch
    import os
    from torch.utils.data import Dataset, DataLoader
    import tqdm

    dataset_path = "./SemEval-2024_Task3/training_data/Subtask_2_train.json"

    with open(dataset_path) as f:
        dataset = json.load(f)
    print(len(dataset))

    all_conversations = []

    for item in dataset:
        all_conversations.extend(item["conversation"])
    … See the full description on the dataset page: https://huggingface.co/datasets/dim/SemEval_training_data_emotions.
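    The card's loop flattens every conversation into one list of utterances. A minimal sketch of that step plus a per-emotion tally follows. It assumes each utterance is a dict with an "emotion" field; that field name, the function names, and the sample records are illustrative assumptions, not taken from the card.

```python
from collections import Counter


def flatten_conversations(dataset):
    """Concatenate the "conversation" lists of every item, as in the card's loop."""
    utterances = []
    for item in dataset:
        utterances.extend(item["conversation"])
    return utterances


def emotion_counts(utterances, key="emotion"):
    """Count utterances per emotion label (the field name is an assumption)."""
    return Counter(u[key] for u in utterances)


# Hypothetical records shaped like the loaded JSON
sample = [
    {"conversation": [{"text": "Hi!", "emotion": "joy"},
                      {"text": "Oh no.", "emotion": "sadness"}]},
    {"conversation": [{"text": "Great!", "emotion": "joy"}]},
]
utterances = flatten_conversations(sample)
print(len(utterances), emotion_counts(utterances).most_common(1))
```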

  3. short-metaworld-vla

    • huggingface.co
    Updated Jul 2, 2025
    Cite: H.Z (2025). short-metaworld-vla [Dataset]. https://huggingface.co/datasets/hz1919810/short-metaworld-vla
    Dataset updated: Jul 2, 2025
    Authors: H.Z
    License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)

    Description

    Short-MetaWorld Dataset

    Overview

    Short-MetaWorld is a curated dataset from Meta-World containing Multi-Task 10 (MT10) and Meta-Learning 10 (ML10) tasks with 100 successful trajectories per task and 20 steps per trajectory. This dataset is specifically designed for multi-task robot learning, imitation learning, and vision-language robotics research.
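    With 100 trajectories per task and 20 steps per trajectory, each task contributes 100 × 20 = 2,000 steps. A map-style index can translate a flat sample position into a (task, trajectory, step) triple, which is one way a step-level DataLoader could address such fixed-size data. This is a hypothetical helper (`FlatStepIndex` and the task names are placeholders), not the `load_short_metaworld` API, whose signature is not shown here.

```python
class FlatStepIndex:
    """Map a flat index onto (task, trajectory, step) for a fixed-size layout.

    Defaults follow the card's stated sizes: 100 trajectories per task and
    20 steps per trajectory. Negative indices are not supported.
    """

    def __init__(self, tasks, trajectories_per_task=100, steps_per_trajectory=20):
        self.tasks = list(tasks)
        self.trajectories = trajectories_per_task
        self.steps = steps_per_trajectory

    def __len__(self):
        return len(self.tasks) * self.trajectories * self.steps

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        steps_per_task = self.trajectories * self.steps  # 2,000 by default
        task_idx, rest = divmod(idx, steps_per_task)
        trajectory, step = divmod(rest, self.steps)
        return self.tasks[task_idx], trajectory, step


index = FlatStepIndex(["task-a", "task-b"])  # hypothetical task names
print(len(index), index[2020])
```

    For two tasks, `len(index)` is 4,000 and position 2020 resolves to the first step of the second trajectory of "task-b".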

    🚀 Quick Start

    from short_metaworld_loader import load_short_metaworld
    from torch.utils.data import DataLoader

    … See the full description on the dataset page: https://huggingface.co/datasets/hz1919810/short-metaworld-vla.

  4. IndicVoices_bengali

    • huggingface.co
    Cite: Subrata Sarkar, IndicVoices_bengali [Dataset]. https://huggingface.co/datasets/subratasarkar32/IndicVoices_bengali
    Authors: Subrata Sarkar
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0 (license information was derived automatically)

    Description

    IndicVoices_bengali

    This dataset has been created from ai4bharat/IndicVoices. Loading the Bengali subset directly from IndicVoices failed because of errors in some files, so this dataset removes those files. To use this dataset with lazy loading for training speech-to-text models, the following sample code uses wav2vec2:

    import pandas as pd
    import torchaudio
    from torch.utils.data import Dataset, DataLoader
    from transformers import Wav2Vec2Processor
    … See the full description on the dataset page: https://huggingface.co/datasets/subratasarkar32/IndicVoices_bengali.
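    The card's sample relies on torchaudio and Wav2Vec2Processor, but the lazy-loading idea itself is independent of those libraries: keep only file paths in the constructor and decode audio in `__getitem__`. The sketch below swaps in the standard-library `wave` module and assumes mono 16-bit PCM WAV files; it does not resample or run a processor, and the class name and transcripts are hypothetical. Because it implements `__len__` and `__getitem__`, it can be passed to `torch.utils.data.DataLoader` (with a custom `collate_fn` for variable-length audio).

```python
import array
import sys
import wave


class LazyWavDataset:
    """Map-style dataset that keeps only paths and transcripts in memory
    and decodes each audio file when it is indexed."""

    def __init__(self, paths, transcripts):
        if len(paths) != len(transcripts):
            raise ValueError("paths and transcripts must have the same length")
        self.paths = list(paths)
        self.transcripts = list(transcripts)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # The file is opened only here, so construction stays cheap
        with wave.open(self.paths[idx], "rb") as wav:
            if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
                raise ValueError("this sketch expects mono 16-bit PCM WAV")
            sampling_rate = wav.getframerate()
            frames = wav.readframes(wav.getnframes())
        samples = array.array("h", frames)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        # Scale int16 values to floats in [-1.0, 1.0)
        audio = [s / 32768.0 for s in samples]
        return {"audio": audio, "sampling_rate": sampling_rate,
                "text": self.transcripts[idx]}
```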

  5. am-nlp-abstrct

    • huggingface.co
    Updated May 18, 2025
    Cite: David Nardi (2025). am-nlp-abstrct [Dataset]. https://huggingface.co/datasets/david-inf/am-nlp-abstrct
    Dataset updated: May 18, 2025
    Authors: David Nardi
    Description

    Dataset summary

    Dataset forked from pie/asbtrct. Here, all sentences from the AbstRCT dataset abstracts are grouped together with the following labels:

    0: Premise
    1: Claim
    2: MajorClaim

    import random
    import torch
    import numpy as np
    from datasets import load_dataset
    from transformers import set_seed, AutoTokenizer, DataCollatorWithPadding
    from torch.utils.data import DataLoader

    def get_dataset(tokenizer, max_length=128):
        dataset = load_dataset("david-inf/am-nlp-abstrct")

    def… See the full description on the dataset page: https://huggingface.co/datasets/david-inf/am-nlp-abstrct.
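    The three labels map naturally onto the `id2label` and `label2id` dictionaries that Hugging Face Transformers model configs accept (for example as keyword arguments to `AutoModelForSequenceClassification.from_pretrained`). A minimal sketch of those mappings, built from the label list on the card, is below; the helper name is hypothetical.

```python
# Label ids as listed on the dataset card
ID2LABEL = {0: "Premise", 1: "Claim", 2: "MajorClaim"}
# Inverse mapping: label name -> id
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_labels(ids):
    """Convert integer class ids (e.g. argmax predictions) into label names."""
    return [ID2LABEL[i] for i in ids]


print(decode_labels([2, 0, 1]))
```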
    
  6. malaysian-youtube

    • huggingface.co
    Updated Jan 5, 2024
    Cite: Malaysia AI (2024). malaysian-youtube [Dataset]. https://huggingface.co/datasets/malaysia-ai/malaysian-youtube
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated: Jan 5, 2024
    Dataset authored and provided by: Malaysia AI
    Area covered: Malaysia, YouTube
    Description

    Malaysian Youtube

    Audio from Malaysian and Singaporean YouTube channels, totaling up to 60k audio files and 18.7k hours.
    URL data: https://github.com/mesolitica/malaya-speech/tree/master/data/youtube/data
    Notebooks: https://github.com/mesolitica/malaya-speech/tree/master/data/youtube

    How to load the data efficiently?

    import pandas as pd
    import json
    from datasets import Audio
    from torch.utils.data import DataLoader, Dataset

    chunks = 30
    sr = 16000

    class Train(Dataset):… See the full description on the dataset page: https://huggingface.co/datasets/malaysia-ai/malaysian-youtube.
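    The card's loader sets `chunks = 30` and `sr = 16000`, but the class body is truncated. Assuming `chunks` is a chunk length in seconds and `sr` a sample rate in Hz (an interpretation, not confirmed by the card), each chunk spans 30 × 16,000 = 480,000 samples. A minimal sketch of computing chunk boundaries under that assumption follows; the function name is hypothetical.

```python
def chunk_bounds(num_samples, sr=16000, chunk_seconds=30):
    """Split a waveform of num_samples samples into consecutive
    (start, end) ranges of at most chunk_seconds * sr samples each;
    the last range may be shorter."""
    if num_samples < 0 or sr <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sr and chunk_seconds > 0")
    size = int(chunk_seconds * sr)  # 30 s * 16,000 Hz = 480,000 samples
    return [(start, min(start + size, num_samples))
            for start in range(0, num_samples, size)]


# A hypothetical 62.5-second clip at 16 kHz (1,000,000 samples)
print(chunk_bounds(1_000_000))
```

    For the 62.5-second example this yields two full 30-second chunks and a final 2.5-second chunk of 40,000 samples.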
