87 datasets found
  1. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST

    • data.niaid.nih.gov
    Updated Jun 13, 2022
    + more versions
    Cite
    Schürholt, Konstantin; Taskiran, Diyar; Knyazev, Boris; Giró-i-Nieto, Xavier; Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632086
    Explore at:
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Image Processing Group, Universitat Politècnica de Catalunya
    AI Lab Montreal, Samsung Advanced Institute of Technology
    AIML Lab, University of St.Gallen
    Authors
    Schürholt, Konstantin; Taskiran, Diyar; Knyazev, Boris; Giró-i-Nieto, Xavier; Borth, Damian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a "model zoo") would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47,360 unique neural network models, resulting in over 2,415,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned above.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
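
    As an illustration, a preprocessed zoo and its index_dict.json could be opened along the following lines (a sketch only: the file name is hypothetical, following the naming scheme above, and the layout of the loaded object should be checked against the code at www.modelzoos.cc):

    import json
    import torch

    # Hypothetical file name following the "dataset" prefix described above
    zoo = torch.load('dataset_mnist_seed.pt', map_location='cpu')

    # index_dict.json describes how to read the vectorized models
    with open('index_dict.json') as f:
        index_dict = json.load(f)
    print(type(zoo), list(index_dict))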

  2. Pytorch Models

    • kaggle.com
    zip
    Updated May 10, 2025
    Cite
    Sufian Othman (2025). Pytorch Models [Dataset]. https://www.kaggle.com/datasets/mohdsufianbinothman/pytorch-models/data
    Explore at:
    Available download formats: zip (21493 bytes)
    Dataset updated
    May 10, 2025
    Authors
    Sufian Othman
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    ✅ Step 1: Mount to Dataset

    Search for my dataset pytorch-models and add it — this will mount it at:

    /kaggle/input/pytorch-models/

    ✅ Step 2: Check file paths Once mounted, the four files will be available at:

    /kaggle/input/pytorch-models/base_models.py
    /kaggle/input/pytorch-models/ext_base_models.py
    /kaggle/input/pytorch-models/ext_hybrid_models.py
    /kaggle/input/pytorch-models/hybrid_models.py
    

    ✅ Step 3: Copy files to working directory To make them importable, copy the .py files to your notebook’s working directory (/kaggle/working/):

    import shutil
    
    shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
    

    ✅ Step 4: Import your modules Now that they are in the working directory, you can import them like normal:

    import base_models
    import ext_base_models
    import ext_hybrid_models
    import hybrid_models
    

    Or, if you only want to import specific classes or functions:

    from base_models import YourModelClass
    from ext_base_models import AnotherModelClass
    

    ✅ Step 5: Use the models You can now initialize and use the models/classes/functions defined inside each file:

    model = base_models.YourModelClass()
    output = model(input_data)
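
    As an alternative to Step 3, the mounted dataset directory can be added to Python's module search path instead of copying files (a minimal sketch using the same mount path):

    import sys

    # Make the mounted dataset importable without copying the .py files
    sys.path.insert(0, '/kaggle/input/pytorch-models')

    import base_models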
    
  3. Oxford 102 Flower Dataset

    • kaggle.com
    zip
    Updated May 26, 2021
    + more versions
    Cite
    Lalu Erfandi Maula Yusnu (2021). Oxford 102 Flower Dataset [Dataset]. https://www.kaggle.com/nunenuh/pytorch-challange-flower-dataset
    Explore at:
    Available download formats: zip (346507679 bytes)
    Dataset updated
    May 26, 2021
    Authors
    Lalu Erfandi Maula Yusnu
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    We have created a dataset consisting of 102 flower categories. The flowers chosen are those commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on the category statistics page.

    The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.

    Directory Structure

    > dataset
      > train
      > valid
      > test
    - cat_to_name.json
    - README.md
    - sample_submission.csv
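
    Given the directory structure above, the splits can be loaded with torchvision's ImageFolder, and class folders mapped to flower names via cat_to_name.json (a sketch; the transform choices are illustrative):

    import json
    from torchvision import datasets, transforms

    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    train_ds = datasets.ImageFolder('dataset/train', transform=tfm)

    with open('cat_to_name.json') as f:
        cat_to_name = json.load(f)  # maps category id (folder name) to flower name

    print(len(train_ds), cat_to_name[train_ds.classes[0]])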
    

    Visualization of the dataset

    We visualize the categories in the dataset using SIFT features as shape descriptors and HSV as colour descriptor. The images are randomly sampled from the category.

    https://i.imgur.com/Tl6TKUC.png

    Publications

    Nilsback, M-E. and Zisserman, A. Automated flower classification over a large number of classes
    Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008)

  4. bigearthnet

    • huggingface.co
    Updated Jul 13, 2024
    Cite
    Luca Colomba (2024). bigearthnet [Dataset]. https://huggingface.co/datasets/lc-col/bigearthnet
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 13, 2024
    Authors
    Luca Colomba
    Description

    BigEarthNet - HDF5 version

    This repository contains an export of the existing BigEarthNet dataset in HDF5 format. All Sentinel-2 acquisitions are exported according to TorchGeo's dataset (120x120 pixels resolution). Sentinel-1 is not contained in this repository for the moment. CSV files contain for each satellite acquisition the corresponding HDF5 file and the index. A PyTorch dataset class which can be used to iterate over this dataset can be found here, as well as the script used… See the full description on the dataset page: https://huggingface.co/datasets/lc-col/bigearthnet.
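
    For illustration, a CSV-plus-HDF5 layout like the one described can be wrapped in a PyTorch dataset roughly as follows (a sketch: the CSV column names and the HDF5 dataset key used here are assumptions; the actual class and export script are linked on the dataset page):

    import h5py
    import pandas as pd
    import torch
    from torch.utils.data import Dataset

    class BigEarthNetH5(Dataset):
        def __init__(self, csv_path):
            # Assumed columns: "file" (HDF5 file path) and "index" (position inside it)
            self.entries = pd.read_csv(csv_path)

        def __len__(self):
            return len(self.entries)

        def __getitem__(self, i):
            row = self.entries.iloc[i]
            with h5py.File(row['file'], 'r') as f:
                patch = f['images'][row['index']]  # the "images" key is an assumption
            return torch.from_numpy(patch)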

  5. cifar-100-python

    • kaggle.com
    zip
    Updated Dec 26, 2024
    Cite
    ThanhTan (2024). cifar-100-python [Dataset]. https://www.kaggle.com/datasets/duongthanhtan/cifar-100-python
    Explore at:
    Available download formats: zip (168517675 bytes)
    Dataset updated
    Dec 26, 2024
    Authors
    ThanhTan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    CIFAR-100 Dataset

    1. Overview

    • CIFAR-100 is an extension of the CIFAR-10 dataset, with more classes and finer-grained categorization.
    • It contains 100 classes, making it more challenging than CIFAR-10, which has only 10 classes.
    • Each image in CIFAR-100 is labeled with both a fine label (specific category) and a coarse label (broader category, such as animals or vehicles).

    2. Dataset Details

    • Number of Images: 60,000 color images in total.
      • 50,000 for training.
      • 10,000 for testing.
    • Image Size: Each image is a small 32x32 pixel RGB (color) image.
    • Classes: 100 classes, grouped into 20 superclasses.
      • Each superclass contains 5 related classes.

    3. Fine and Coarse Labels

    • Fine Labels: The dataset has specific categories, such as 'apple', 'bicycle', 'rose', etc.
    • Coarse Labels: These are broader categories, like 'fruit', 'flower', 'vehicle', etc.

    4. Applications

    • Image Classification: Used for training models to classify images into their respective categories.
    • Feature Extraction: Useful for benchmarking feature extraction techniques in computer vision.
    • Transfer Learning: Often used to pre-train models for other similar tasks.
    • Deep Learning Research: Commonly used to test architectures like CNNs (Convolutional Neural Networks).

    5. Challenges

    • The images are very small (32x32 pixels), making it harder for models to learn intricate details.
    • High class count (100) increases classification complexity.
    • Intra-class variability and inter-class similarity make it a challenging dataset for classification.

    6. File Format

    • The dataset is usually available in Python-friendly formats like .pkl or .npz.
    • It can also be downloaded and loaded using frameworks like TensorFlow or PyTorch (see the example below).
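
    For example, loading the training split through torchvision takes a single call (download=True fetches the archive on first use):

    import torchvision
    import torchvision.transforms as T

    train_set = torchvision.datasets.CIFAR100(
        root='./data', train=True, download=True, transform=T.ToTensor()
    )
    image, fine_label = train_set[0]
    print(image.shape, fine_label)  # torch.Size([3, 32, 32]) and a class index in 0-99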

    7. Example Classes

    Some example classes include:

    • Animals: beaver, dolphin, otter, elephant, snake
    • Plants: apple, orange, mushroom, palm tree, pine tree
    • Vehicles: bicycle, bus, motorcycle, train, rocket
    • Everyday Objects: clock, keyboard, lamp, table, chair

  6. BIRD: Big Impulse Response Dataset

    • data.niaid.nih.gov
    • kaggle.com
    Updated Oct 29, 2020
    Cite
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François (2020). BIRD: Big Impulse Response Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4139415
    Explore at:
    Dataset updated
    Oct 29, 2020
    Dataset provided by
    Mila - Université de Montréal
    Université de Sherbrooke
    Authors
    Grondin, François; Lauzon, Jean-Samuel; Michaud, Simon; Ravanelli, Mirco; Michaud, François
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration in existing deep learning projects based on this framework.
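
    As a rough illustration of this kind of augmentation (not the dataset's own loader, which ships with the release): reverberation is applied by convolving a dry signal with one channel of a room impulse response:

    import numpy as np

    def apply_rir(dry, rir):
        # Convolve a dry mono signal with an impulse response and trim to length
        wet = np.convolve(dry, rir)[:len(dry)]
        return wet / (np.abs(wet).max() + 1e-8)  # normalize to avoid clipping

    # Synthetic stand-ins: 1 s of noise and an exponentially decaying response
    dry = np.random.randn(16000).astype(np.float32)
    rir = (np.exp(-np.linspace(0, 8, 4000)) * np.random.randn(4000)).astype(np.float32)
    wet = apply_rir(dry, rir)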

  7. Dataset for class comment analysis

    • data.niaid.nih.gov
    Updated Feb 22, 2022
    Cite
    Pooja Rani (2022). Dataset for class comment analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4311838
    Explore at:
    Dataset updated
    Feb 22, 2022
    Dataset provided by
    University of Bern
    Authors
    Pooja Rani
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.

    Structure

    Projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
    
      Pharo_projects/
        images/
          GToolkit.zip
          Moose.zip
          PetitParser.zip
          Pillar.zip
          PolyMath.zip
          Roassal2.zip
          Seaside.zip
    
        vm/
          70-x64/Pharo
    
        Scripts/
          ClassCommentExtraction.st
          SampleSelectionScript.st    
    
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip 
      
    

    Contents of the Replication Package

    Projects/ contains the raw projects of each language that are used to analyze class comments.

    • Java_projects/
      • eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub Eclipse.
      • guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub Guava.
      • guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub Guice.
      • hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub Apache Hadoop.
      • spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub Apache Spark.
      • vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub Vaadin.

    • Pharo_projects/

      • images/ -

        • GToolkit.zip - Gtoolkit project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Moose.zip - Moose project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PetitParser.zip - Petit Parser project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Pillar.zip - Pillar project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • PolyMath.zip - PolyMath project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Roassal2.zip - Roassal2 project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
        • Seaside.zip - Seaside project is imported into the Pharo image. We can run this image with the virtual machine given in the vm/ folder. The script to extract the comments is already provided in the image.
      • vm/ -

      • 70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo images.

      • Scripts/ - Contains sample Smalltalk scripts to extract class comments from various projects.

      • ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.

      • SampleSelectionScript.st - A Smalltalk script to show how sample class comments of Pharo projects are selected. This script can be run in any of the Pharo images given in the images/ folder.

    • Python_projects/

      • django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub Django.
      • ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub IPython.
      • Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub Mailpile.
      • pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub pandas.
      • pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub Pipenv.
      • pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub PyTorch.
      • requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub Requests.
  8. feral-cat-segmentation_dataset

    • kaggle.com
    • universe.roboflow.com
    zip
    Updated Mar 18, 2025
    Cite
    lu hou yang (2025). feral-cat-segmentation_dataset [Dataset]. https://www.kaggle.com/datasets/luhouyang/feral-cat-segmentation-dataset
    Explore at:
    Available download formats: zip (971125684 bytes)
    Dataset updated
    Mar 18, 2025
    Authors
    lu hou yang
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Feral Cat Segmentation Dataset

    Overview

    This dataset provides image segmentation data for feral cats, designed for computer vision and machine learning tasks. It builds upon the original public domain dataset by Paul Cashman from Roboflow, with additional preprocessing and multiple data formats for easier consumption.

    Dataset Contents

    The dataset is organized into three standard splits:

    • Train set
    • Validation set
    • Test set

    Each split contains data in multiple formats:

    1. Original JPG images
    2. Segmentation mask JPG images
    3. Parquet files containing flattened image and mask data
    4. Pickle files containing serialized image and mask data

    Data Formats

    1. Image Files

    • Format: JPG
    • Resolution: 224×224 pixels
    • Directory Structure:
      • train/: Original training images
      • valid/: Original validation images
      • test/: Original test images
      • train_mask/: Corresponding segmentation masks for training
      • valid_mask/: Corresponding segmentation masks for validation
      • test_mask/: Corresponding segmentation masks for testing

    2. Parquet Files

    • Files: train_dataset.parquet, valid_dataset.parquet, test_dataset.parquet
    • Content: Flattened image data and corresponding masks combined in a single table
    • Structure: Each row contains the flattened pixel values of an image followed by the flattened pixel values of its mask
    • Data Division: Image and mask data are split at index split_at = image_size[0] * image_size[1] * image_channels (see the sketch below)
      • Data before this index: image pixel values (reshaped to [-1, 224, 224, 3])
      • Data after this index: mask pixel values (reshaped to [-1, 224, 224, 1])
    • Benefits: Efficient storage and faster loading compared to individual image files
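
    A minimal sketch of that division, reading one split directly with pandas (file name as documented above):

    import pandas as pd

    image_size, image_channels = [224, 224], 3
    split_at = image_size[0] * image_size[1] * image_channels  # 150528

    table = pd.read_parquet('train_dataset.parquet').to_numpy()
    images = table[:, :split_at].reshape(-1, 224, 224, 3)
    masks = table[:, split_at:].reshape(-1, 224, 224, 1)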

    3. Pickle Files

    • Files: train_dataset.pkl, valid_dataset.pkl, test_dataset.pkl
    • Content: Serialized Python objects containing images and their corresponding masks
    • Structure: List of [image, mask] pairs, where each image and mask is serialized using Python's pickle
    • Data Access: Similar to parquet files, when loaded through the provided dataset class, data is split at the same index: split_at = image_size[0] * image_size[1] * image_channels
    • Benefits: Preserves original data structure and enables quick loading in Python

    4. CSV Files

    • Files: train_dataset.csv, valid_dataset.csv, test_dataset.csv
    • Content: Same data as parquet files but in CSV format
    • Structure: No headers, raw flattened pixel values
    • Data Division: Same split point as parquet files

    Image Preprocessing

    All images were preprocessed with the following operations:

    • Resized to 224×224 pixels using bilinear interpolation
    • Segmentation masks resized to match the images using nearest-neighbor interpolation
    • Original RLE (Run-Length Encoding) segmentation data converted to binary masks

    Data Normalization

    When used with the provided PyTorch dataset class, images are normalized with:

    • Mean: [0.48235, 0.45882, 0.40784]
    • Standard Deviation: [0.00392156862745098, 0.00392156862745098, 0.00392156862745098]

    PyTorch Integration

    A custom CatDataset class is included for easy integration with PyTorch:

    from cat_dataset import CatDataset
    
    # Load from parquet format
    dataset = CatDataset(
      root="path/to/dataset",
      split="train", # Options: "train", "valid", "test"
      format="parquet", # Options: "parquet", "pkl"
      image_size=[224, 224],
      image_channels=3,
      mask_channels=1
    )
    
    # Use with PyTorch DataLoader
    from torch.utils.data import DataLoader
    dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    

    Performance Comparison

    Loading time benchmarks from the original implementation:

    • Parquet format: ~1.29 seconds per iteration
    • Pickle format: ~0.71 seconds per iteration

    The pickle format provides the fastest loading times and is recommended for most use cases.

    Citation

    If you use this dataset in your research or projects, please cite:

    @misc{feral-cat-segmentation_dataset,
     title = {feral-cat-segmentation Dataset},
     type = {Open Source Dataset},
     author = {Paul Cashman},
     howpublished = {\url{https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}},
     url = {https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation},
     journal = {Roboflow Universe},
     publisher = {Roboflow},
     year = {2025},
     month = {mar},
     note = {visited on 2025-03-19},
    }
    

    Sample Usage Code

    Basic Dataset Loading

    from ca...
    
  9. Data from: Deep learning neural network derivation and testing to distinguish acute poisonings

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    png
    Updated Aug 8, 2023
    Cite
    Omid Mehrpour; Christopher Hoyte; Abdullah Al Masud; Ashis Biswas; Jonathan Schimmel; Samaneh Nakhaee; Mohammad Sadegh Nasr; Heather Delva-Clark; Foster Goss (2023). Deep learning neural network derivation and testing to distinguish acute poisonings [Dataset]. http://doi.org/10.6084/m9.figshare.23694504.v1
    Explore at:
    Available download formats: png
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Omid Mehrpour; Christopher Hoyte; Abdullah Al Masud; Ashis Biswas; Jonathan Schimmel; Samaneh Nakhaee; Mohammad Sadegh Nasr; Heather Delva-Clark; Foster Goss
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two deep neural networks (PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing single-agent poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and then acetaminophen, in PyTorch (F1-score = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-score = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.

  10. Fundus Glaucoma Detection Data [PyTorch format]

    • kaggle.com
    Updated Nov 20, 2023
    Cite
    sabari (2023). Fundus Glaucoma Detection Data [PyTorch format] [Dataset]. https://www.kaggle.com/datasets/sabari50312/fundus-pytorch
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    sabari
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Subset of the SMDG-19 for Glaucoma dataset in PyTorch Format

    SMDG-19: https://www.kaggle.com/datasets/deathtrooper/multichannel-glaucoma-benchmark-dataset

    Contains Train, Val and Test set of Fundus images for Glaucoma Detection

    2 classes (0|1):

    • 1: Glaucoma present
    • 0: Glaucoma not present

  11. CIFAR10

    • huggingface.co
    Updated Dec 8, 2024
    Cite
    P2PFL (2024). CIFAR10 [Dataset]. https://huggingface.co/datasets/p2pfl/CIFAR10
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 8, 2024
    Authors
    P2PFL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🖼️ CIFAR10 (Extracted from PyTorch Vision)

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are completely mutually exclusive. There is no… See the full description on the dataset page: https://huggingface.co/datasets/p2pfl/CIFAR10.
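
    The repository can be pulled with the Hugging Face datasets library (a sketch; split and column names are whatever the dataset page defines):

    from datasets import load_dataset

    ds = load_dataset('p2pfl/CIFAR10')  # dataset id from the citation above
    print(ds)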

  12. SimCATS_GaAs_v1_random_variations_v2

    • resodate.org
    Updated Oct 9, 2024
    Cite
    Fabian Hader; Fabian Fuchs; Sarah Fleitmann (2024). SimCATS_GaAs_v1_random_variations_v2 [Dataset]. http://doi.org/10.26165/JUELICH-DATA/5PB3GT
    Explore at:
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Forschungszentrum Jülich: http://www.fz-juelich.de/
    Peter Grünberg Institute - Integrated Computing Architectures (ICA/PGI-4)
    Authors
    Fabian Hader; Fabian Fuchs; Sarah Fleitmann
    Description

    Dataset: SimCATS_GaAs_v1_random_variations_v2

    Simulated data from the geometric SimCATS model (GitHub Repository, Paper) for benchmarking of semiconductor quantum dot tuning algorithms. Generated using this Jupyter Notebook and used for the final evaluation in Automated Charge Transition Detection in Quantum Dot Charge Stability Diagrams.

    Key Facts

    • Contains pink, white & random telegraph noise, transition blurring, and dot jumps
    • Random variations of charge transitions, sensor, and distortions
    • 1,000 randomly sampled configurations with 100 CSDs each (in total: 100,000 CSDs)

    Usage

    To load the data, e.g. for calculating metrics, please have a look at SimCATS-Datasets (GitHub Repository, ReadTheDocs). The dataset can be loaded as numpy arrays using the function load_dataset or as a PyTorch Dataset (for machine learning purposes) using the class SimcatsDataset.
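
    A usage sketch based on the names mentioned above; the import paths are assumptions, so consult the SimCATS-Datasets documentation for the actual module layout:

    # Import paths below are hypothetical; see the SimCATS-Datasets docs.
    from simcats_datasets.loading import load_dataset
    from simcats_datasets.loading.pytorch import SimcatsDataset

    path = '/path/to/SimCATS_GaAs_v1_random_variations_v2'
    arrays = load_dataset(path)      # numpy arrays, e.g. for metric calculation
    torch_ds = SimcatsDataset(path)  # PyTorch Dataset for ML training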

  13. CrashCar

    • huggingface.co
    Updated Jul 10, 2024
    Cite
    Jens Parslov (2024). CrashCar [Dataset]. https://huggingface.co/datasets/JensParslov/CrashCar
    Explore at:
    Dataset updated
    Jul 10, 2024
    Authors
    Jens Parslov
    Description

    Dataset Card for Dataset CrashCar

    This is the dataset proposed in 'CrashCar101: Procedural Generation for Damage Assessment' [WACV24]

    Project Page: https://crashcar.compute.dtu.dk
    Repository: https://github.com/JensPars/CrashCar_procedural_generation
    Paper: https://openaccess.thecvf.com/content/WACV2024/papers/Parslov_CrashCar101_Procedural_Generation_for_Damage_Assessment_WACV_2024_paper.pdf

    An example dataset class in PyTorch is truncated here (it begins: import os, import torch, from glob import glob, from…); see the full description on the dataset page: https://huggingface.co/datasets/JensParslov/CrashCar.

  14. Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Oct 29, 2021
    Cite
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun; Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun (2021). Sentence/Table Pair Data from Wikipedia for Pre-training with Distant-Supervision [Dataset]. http://doi.org/10.5281/zenodo.5612316
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun; Xiang Deng; Yu Su; Alyssa Lees; You Wu; Cong Yu; Huan Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.

    There are two files:

    sentence_pairs_for_pretrain_no_tokenization.tar.gz -> contains only sentences as evidence (Text-only)

    table_pairs_for_pretrain_no_tokenization.tar.gz -> at least one piece of evidence is a table (Hybrid)

    The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.

    For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT

    Below is a sample code snippet to load the data

    import webdataset as wds
    
    # path to the uncompressed files, should be a directory with a set of tar files
    url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
    dataset = (
      wds.Dataset(url)
      .shuffle(1000) # cache 1000 samples and shuffle
      .decode()
      .to_tuple("json")
      .batched(20) # group every 20 examples into a batch
    )
    
    # Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
    # You can also iterate through all examples and dump them with your preferred data format

    Below we show how the data is organized with two examples.

    Text-only

    {'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
     's1_all_links': {
      'Sils,_Girona': [[0, 4]],
      'municipality': [[10, 22]],
      'Comarques_of_Catalonia': [[30, 37]],
      'Selva': [[41, 46]],
      'Catalonia': [[51, 60]]
     }, # list of entities and their mentions in the sentence (start, end location)
     'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
      {
        'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
        's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
        's2s': [ # list of other sentences that contain the common entity pair, or evidence
         {
           'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
           'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
           's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
           'pair_locs': [ # mentions of the entity pair in the evidence
            [[19, 27]], # mentions of entity 1
            [[0, 5], [288, 293]] # mentions of entity 2
           ],
           'all_links': {
            'Selva': [[0, 5], [288, 293]],
            'Comarques_of_Catalonia': [[19, 27]],
            'Catalonia': [[40, 49]]
           }
          }
        ,...] # there are multiple evidence sentences
       },
     ,...] # there are multiple entity pairs in the query
    }

    Hybrid

    {'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
     's1_all_links': {...}, # same as text-only
     'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
     'table_pairs': [
      'tid': 'Major_League_Baseball-1',
      'text':[
        ['World Series Records', 'World Series Records', ...],
        ['Team', 'Number of Series won', ...],
        ['St. Louis Cardinals (NL)', '11', ...],
      ...] # table content, list of rows
      'index':[
        [[0, 0], [0, 1], ...],
        [[1, 0], [1, 1], ...],
      ...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
      'value_ranks':[
        [0, 0, ...],
        [0, 0, ...],
        [0, 10, ...],
      ...] # if the cell contain numeric value/date, this is its rank ordered from small to large, follow TAPAS
      'value_inv_ranks': [], # inverse rank
      'all_links':{
        'St._Louis_Cardinals': {
         '2': [
          [[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
         ] # list of mentions in the second row, the key is row_id
        },
        'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
      }
      'name': '', # table name, if exists
      'pairs': {
        'pair': ['American_League', 'National_League'],
        's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
        'table_pair_locs': {
         '17': [ # mention of entity pair in row 17
           [
            [[17, 0], [3, 18]],
            [[17, 1], [3, 18]],
            [[17, 2], [3, 18]],
            [[17, 3], [3, 18]]
           ], # mention of the first entity
           [
            [[17, 0], [21, 36]],
            [[17, 1], [21, 36]],
           ] # mention of the second entity
         ]
        }
       }
     ]
    }

  15. FiN-2: Large-Scale Powerline Communication Dataset (Pt.1)

    • zenodo.org
    bin, png, zip
    Updated Jul 11, 2024
    Cite
    Christoph Balada; Christoph Balada; Max Bondorf; Sheraz Ahmed; Andreas Dengel; Andreas Dengel; Markus Zdrallek; Max Bondorf; Sheraz Ahmed; Markus Zdrallek (2024). FiN-2: Larg-Scale Powerline Communication Dataset (Pt.1) [Dataset]. http://doi.org/10.5281/zenodo.8328113
    Explore at:
    Available download formats: bin, zip, png
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Christoph Balada; Christoph Balada; Max Bondorf; Sheraz Ahmed; Andreas Dengel; Andreas Dengel; Markus Zdrallek; Max Bondorf; Sheraz Ahmed; Markus Zdrallek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # FiN-2 Large-Scale Real-World PLC-Dataset

    ## About
    #### FiN-2 dataset in a nutshell:
    FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication infrastructure. Since the electricity grid is inherently a graph, our dataset could be interpreted as a graph dataset. Therefore, we use the word node to describe points (cable distribution cabinets) of measurement within the low-voltage electricity grid and the word edge to describe connections (cables) in between them. However, since these are PLC connections, an edge does not necessarily have to correspond to a real cable; more on this in our paper.
    FiN-2 shows measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.
    We propose this dataset to foster research in the domain of grid automation and smart grids. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection. For more detailed information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).

    * * *
    ## Content
    The FiN-2 dataset splits up into two compressed CSV files: *nodes.csv* and *edges.csv*.

    All files are provided as a compressed ZIP file and are divided into four parts. The first part can be found in this repo, while the remaining parts can be found in the following:
    - https://zenodo.org/record/8328105
    - https://zenodo.org/record/8328108
    - https://zenodo.org/record/8328111

    ### Node data

    | id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
    |----|----|----|----|----|----|----|----|----|----|----|----|
    |112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
    |112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
    |112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|

    - id / ts: Unique identifier of the node that is measured and timestamp of the measurement
    - v1/v2/v3: Voltage measurements of all three phases
    - thd1/thd2/thd3: Total harmonic distortion of all three phases
    - phase_angle1/2/3: Phase angle of all three phases
    - temp: Temperature in-circuit of the sensor inside a cable distribution unit (in °C)

    ### Edge data
    | src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
    |----|----|----|----|----|----|----|----|
    |62|94|1605528900|70|72|45|...|-53|
    |62|32|1605529800|16|24|13|...|-51|
    |17|94|1605530700|37|25|24|...|-55|

    - src & dst & ts: Unique identifier of the source and target nodes where the spectrum is measured and time of measurement
    - snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5dB).

    ### Metadata
    Metadata that is provided along with the data covers:

    - Number of cable joints
    - Cable properties (length, type, number of sections)
    - Relative position of the nodes (location, zero-centered gps)
    - Adjacent PV or wallbox installations
    - Year of installation w.r.t. the nodes and cables

    Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.

    * * *
    ## Usage
    Simple data access using pandas:

    ```
    import pandas as pd

    nodes_file = "nodes.csv.gz" # /path/to/nodes.csv.gz
    edges_file = "edges.csv.gz" # /path/to/edges.csv.gz

    # read the first 10 rows
    data = pd.read_csv(nodes_file, nrows=10, compression='gzip')

    # read the row number 5 to 15
    data = pd.read_csv(nodes_file, nrows=10, skiprows=[i for i in range(1,6)], compression='gzip')

    # ... same for the edges
    ```

    The compressed CSV format was used to make sharing as easy as possible; however, it comes with significant drawbacks for machine learning. Due to the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. But due to timeouts, noise, and other disturbances, nodes sometimes fail to collect data, so the number of measurements for a specific timestamp differs. This, plus the high sparsity of the graph, leads to high inefficiency when using the CSV format for ML training.
    To utilize the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).


    ### Example use case (voltage forecasting)

    Forecasting the voltage is one potential use case. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed and used for ML training. MinMax scaling is used as a simple preprocessing step, and a PyTorch dataset class is created to handle the data. Furthermore, a vanilla autoencoder is utilized to process and forecast the voltage into the future.
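
    A minimal sketch of such a dataset class, assuming the node CSV layout documented above (window and horizon sizes are illustrative, not the notebook's exact values):

    ```
    import pandas as pd
    import torch
    from torch.utils.data import Dataset

    class VoltageWindows(Dataset):
        """Sliding windows over one node's voltages for forecasting (a sketch)."""
        def __init__(self, nodes_csv, node_id, window=60, horizon=10):
            df = pd.read_csv(nodes_csv, compression='gzip')
            v = df.loc[df['id'] == node_id, ['v1', 'v2', 'v3']].to_numpy(dtype='float32')
            # MinMax scaling to [0, 1], as in the example notebook
            self.v = (v - v.min(0)) / (v.max(0) - v.min(0) + 1e-8)
            self.window, self.horizon = window, horizon

        def __len__(self):
            return max(0, len(self.v) - self.window - self.horizon + 1)

        def __getitem__(self, i):
            x = torch.from_numpy(self.v[i:i + self.window])
            y = torch.from_numpy(self.v[i + self.window:i + self.window + self.horizon])
            return x, y
    ```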

  16. Data from: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction - Datasets

    • data.niaid.nih.gov
    Updated Nov 13, 2021
    Cite
    Schürholt, Konstantin; Kostadinov, Dimche; Borth, Damian (2021). Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction - Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5645137
    Explore at:
    Dataset updated
    Nov 13, 2021
    Dataset provided by
    University of St.Gallen
    Authors
    Schürholt, Konstantin; Kostadinov, Dimche; Borth, Damian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets to NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".

    Datasets are PyTorch files containing a dictionary with training, validation and test sets. Train, validation and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
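
    A minimal loading sketch consistent with that description (the file name and dictionary keys are assumptions; the linked repository contains the canonical loading code):

    import torch

    # Requires the repository's custom dataset classes to be importable;
    # the file name and keys below are assumptions.
    data = torch.load('dataset.pt', map_location='cpu')
    print(data.keys())
    trainset = data['trainset']  # a custom torch Dataset
    print(len(trainset))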

    Datasets 41, 42, 43 and 44 are our dataset format wrapped around the zoos from Unterthiner et al, 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy)

    Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings.

  17. GISE-51

    • zenodo.org
    application/gzip, txt
    Updated Apr 13, 2021
    Cite
    Sarthak Yadav; Sarthak Yadav; Mary Ellen Foster; Mary Ellen Foster (2021). GISE-51 [Dataset]. http://doi.org/10.5281/zenodo.4593514
    Explore at:
    Available download formats: application/gzip, txt
    Dataset updated
    Apr 13, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Sarthak Yadav; Sarthak Yadav; Mary Ellen Foster; Mary Ellen Foster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible using existing large-scale weakly labelled datasets. The GISE-51 release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.

    Citation

    If you use the GISE-51 dataset and/or the released code, please cite our paper:

    Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021

    Since GISE-51 is based on FSD50K, if you use GISE-51 kindly also cite the FSD50K paper:

    Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.

    About GISE-51 and GISE-51-Mixtures

    The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.

    GISE-51

    • Three subsets: train, val and eval, with 12465, 1716, and 2176 utterances, respectively. Subsets are consistent with the FSD50K release.
    • Encompasses 51 sound classes from the FSD50K release
    • View meta/lbl_map.csv for the complete vocabulary.
    • The dataset was obtained from FSD50K using the following steps:
      • Unsmearing annotations to obtain single instances with a single label using the provided metadata and ground truth in FSD50K.
      • Manual inspection to qualitatively evaluate shortlisted utterances.
      • Volume-threshold based automated silence filtering using sox. Different volume thresholds are selected for various sound event class bins using trial-and-error. silence_thresholds.txt lists class bins and their corresponding volume threshold. Files that were determined by sox to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
      • Re-evaluate sound event classes, removing ones with too few samples and merging those with high inter-class ambiguity.

    GISE-51-Mixtures

    • Synthetic 5-second soundscapes with up to 3 events created using Scaper.
    • Weighted sampling with replacement for sound event selection, effectively oversampling events with very few samples. Synthetic soundscapes generated thus have a near equal number of annotations per sound event.
    • The number of soundscapes in val and eval set is 10000 each.
    • The number of soundscapes in the final train set is 60000; we also provide training sets ranging from 5k to 100k soundscapes.
    • GISE-51-Mixtures is our proposed subset that can be used to benchmark the performance of future works.

    LICENSE

    All audio clips (i.e., found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed as Creative Commons Attribution 4.0 International (CC BY 4.0) License.

    GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.

    Baselines

    Several sound event recognition experiments were conducted, establishing baseline performance on several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.

    Files

    GISE-51 is available as a collection of several tar archives. All audio files are PCM 16-bit, 22050 Hz. The following lists the contents of these files in detail:

    • isolated_events.tar.gz: The core GISE-51 isolated events dataset containing train, val and eval subfolders.
    • meta.tar.gz: contains lbl_map.json
    • noises.tar.gz: contains background noises used for GISE-51-Mixtures soundscape generation
    • mixtures_jams.tar.gz: This file contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz can be reused to generate exact GISE-51-Mixtures soundscapes. (Optional, we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
    • train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
    • val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
    • eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
    • train_*.tar.gz: These are tar archives containing training mixtures of a various number of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance v/s number of training soundscapes. A helper script is provided in the code release, prepare_mixtures_lmdb.sh, to prepare data for experiments in Section 4.1.
    • pretrained-models.tar.gz: Contains model checkpoints for all experiments conducted in the paper. More information on these checkpoints can be found in the code release README.
      • experiments_60k_mixtures: model checkpoints from section 4.2 of the paper.
      • exported_weights_60k: ResNet-18 and EfficientNet-B1 exported as plain state_dicts for use with transfer learning experiments.
      • experiments_audioset: checkpoints from AudioSet Balanced (Sec 4.3.1) experiments
      • experiments_vggsound: checkpoints from Section 4.3.2 of the paper
      • experiments_esc50: ESC-50 dataset checkpoints, from Section 4.3.3
    • license.tar.gz: contains dataset license info.
    • silence_thresholds.txt: contains volume thresholds for various sound event bins used for silence filtering.

    Contact

    In case of queries and clarifications, feel free to contact Sarthak at s.yadav.2@research.gla.ac.uk. (Adding [GISE-51] to the subject of the email would be appreciated!)

  18. Replication package for the paper "Do Comments follow Commenting Conventions? A case study in Java and Python"

    • zenodo.org
    zip
    Updated Aug 28, 2021
    + more versions
    Cite
    Pooja; Pooja (2021). Replication package for the paper "Do Comments follow Commenting Conventions? A case study in Java and Python" [Dataset]. http://doi.org/10.5281/zenodo.5296443
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Pooja; Pooja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    # RP-comment-convention-adherence-Java-Python
    Replication Package for the paper "Do Comments follow Commenting Conventions? A case study in Java and Python".
    It uses the dataset provided by Rani et al.'s work [How to identify class comment types? A multi-language approach for class comment classification](https://github.com/poojaruhal/RP-class-comment-classification).
    
    
    ## Structure
    ```
    RQ1/
      RQ1_Java_Rules.xlsx
      RQ1_Python_Rules.xlsx
    
    RQ2/
      RQ2_Java_Comments_Validated.xlsx
      RQ2_Python_Comments_Validated.xlsx
      Raw-projects/
      Java_projects/
        eclipse.zip
        guava.zip
        guice.zip
        hadoop.zip
        spark.zip
        vaadin.zip
      Python_projects/
        django.zip
        ipython.zip
        Mailpile.zip
        pandas.zip
        pipenv.zip
        pytorch.zip   
        requests.zip
    Style-guides
    ```
    
    ## Contents of the Replication Package
    ---
    
    - **RQ1/** - contains the data used to answer RQ1
      - `RQ1_Java_Rules.xlsx` - contains comment-related rules extracted from various Java style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines.
      Oracle and Google are the standard guidelines, and the remaining are specific to the projects.
      - `RQ1_Python_Rules.xlsx` - contains comment-related rules extracted from various Python style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines. PEP, Numpy, and Google are the standard guidelines and the remaining are specific to the projects.
    
    - **RQ2/** - contains the data used to answer RQ2
      - `RQ2_Java_Comments_Validated.xlsx` - contains Java comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent various Java projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
      - `RQ2_Python_Comments_Validated.xlsx` - contains Python comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent various Python projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
      - **Raw-projects/** contains the raw projects of each language that are used to analyze class comments.
        - **Java_projects/**
          - `eclipse.zip` - Eclipse project downloaded from GitHub. More detail about the project is on https://github.com/eclipse
          - `guava.zip` - Guava project downloaded from GitHub. More detail about the project is on https://github.com/google/guava
          - `guice.zip` - Guice project downloaded from GitHub. More detail about the project is on https://github.com/google/guice
          - `hadoop.zip` - Apache Hadoop project downloaded from GitHub. More detail about the project is on https://github.com/apache/hadoop
          - `spark.zip` - Apache Spark project downloaded from GitHub. More detail about the project is on https://github.com/apache/spark
          - `vaadin.zip` - Vaadin project downloaded from GitHub. More detail about the project is on https://github.com/vaadin/framework
    
        - **Python_projects/**
          - `django.zip` - Django project downloaded from GitHub. More details about the project are at https://github.com/django
          - `ipython.zip` - IPython project downloaded from GitHub. More details about the project are at https://github.com/ipython/ipython
          - `Mailpile.zip` - Mailpile project downloaded from GitHub. More details about the project are at https://github.com/mailpile/Mailpile
          - `pandas.zip` - pandas project downloaded from GitHub. More details about the project are at https://github.com/pandas-dev/pandas
          - `pipenv.zip` - Pipenv project downloaded from GitHub. More details about the project are at https://github.com/pypa/pipenv
          - `pytorch.zip` - PyTorch project downloaded from GitHub. More details about the project are at https://github.com/pytorch/pytorch
          - `requests.zip` - Requests project downloaded from GitHub. More details about the project are at https://github.com/psf/requests/
    - **Style-guides/** - contains the style guidelines used for the selected projects.
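
    The validated comment sheets can be loaded tab by tab. Below is a minimal sketch, assuming pandas with the openpyxl backend installed; the file path follows the structure above, and the tab names are the project names described in the contents list.

    ```python
    # Minimal sketch: load every project tab of the validated Java comments.
    # Assumes pandas + openpyxl; paths follow the replication-package layout.
    import pandas as pd

    # sheet_name=None returns {tab_name: DataFrame}, one tab per project.
    sheets = pd.read_excel("RQ2/RQ2_Java_Comments_Validated.xlsx", sheet_name=None)

    for project, df in sheets.items():
        # Rows: sample class comments; columns: guideline rules.
        print(project, df.shape)
    ```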
    ---

  19. h

    Changen2-S1-15k

    • huggingface.co
    Updated Oct 16, 2024
    + more versions
    Cite
    Zhuo Zheng (2024). Changen2-S1-15k [Dataset]. https://huggingface.co/datasets/EVER-Z/Changen2-S1-15k
    Explore at:
    Dataset updated
    Oct 16, 2024
    Authors
    Zhuo Zheng
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Changen2-S1-15k

    Changen2-S1-15k is a building change dataset with 15k pairs and 2 change types, at 0.3-1 m spatial resolution, with RGB bands.

      Dataset Sources

    Repository: https://github.com/Z-Zheng/pytorch-change-models
    Paper: https://ieeexplore.ieee.org/document/10713915

      Citation
    

    BibTeX: @article{zheng_changen2, author={Zheng, Zhuo and Ermon, Stefano and Kim, Dongjun and Zhang, Liangpei and Zhong, Yanfei}, journal={IEEE Transactions on Pattern… See the full description on the dataset page: https://huggingface.co/datasets/EVER-Z/Changen2-S1-15k.
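
    As a usage note (not part of the dataset card): the dataset should be loadable through the standard Hugging Face `datasets` API. A minimal sketch follows; the split and feature names are assumptions to be checked against the dataset page.

    ```python
    # Minimal sketch, assuming the standard Hugging Face `datasets` API.
    # Split and feature names are assumptions; inspect the printed output.
    from datasets import load_dataset

    ds = load_dataset("EVER-Z/Changen2-S1-15k")
    print(ds)  # shows the available splits and their features

    first_split = next(iter(ds))
    example = ds[first_split][0]
    print(example.keys())  # e.g., pre/post-change images and a change mask (assumed)
    ```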

  20. Z

    3DO Dataset | On the Generalization of WiFi-based Person-centric Sensing in...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Dec 5, 2024
    Cite
    Strohmayer, Julian (2024). 3DO Dataset | On the Generalization of WiFi-based Person-centric Sensing in Through-Wall Scenarios [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_10925350
    Explore at:
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Strohmayer, Julian
    Kampel, Martin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    On the Generalization of WiFi-based Person-centric Sensing in Through-Wall Scenarios

    This repository contains the 3DO dataset proposed in [1].

    PyTorch Dataloader

    A minimal PyTorch dataloader for the 3DO dataset is provided at: https://github.com/StrohmayerJ/3DO

    Dataset Description

    The 3DO dataset comprises 42 five-minute recordings (~1.25M WiFi packets) of three human activities performed by a single person, captured in a WiFi through-wall sensing scenario over three consecutive days. Each WiFi packet is annotated with a 3D trajectory label and a class label for the activities: no person/background (0), walking (1), sitting (2), and lying (3). (Note: The labels returned in our dataloader example are walking (0), sitting (1), and lying (2), because background sequences are not used.)

    The directories 3DO/d1/, 3DO/d2/, and 3DO/d3/ contain the sequences from days 1, 2, and 3, respectively. Furthermore, each sequence directory (e.g., 3DO/d1/w1/) contains a csiposreg.csv file storing the raw WiFi packet time series and a csiposreg_complex.npy cache file, which stores the complex Channel State Information (CSI) of the WiFi packet time series. (If missing, csiposreg_complex.npy is automatically generated by the provided dataloader.)
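
    A minimal sketch of how one such sequence could be wrapped as a PyTorch dataset follows. The official dataloader is at the repository above; the CSV column name ("class") and the use of CSI amplitude as the model input are assumptions here, not taken from the paper.

    ```python
    # Minimal sketch of a per-sequence 3DO dataset wrapper.
    # Assumptions: csiposreg.csv has a "class" column with the activity label,
    # and the .npy cache holds one complex CSI entry per WiFi packet.
    import numpy as np
    import pandas as pd
    import torch
    from torch.utils.data import Dataset

    class Sequence3DO(Dataset):
        def __init__(self, seq_dir):
            self.csi = np.load(f"{seq_dir}/csiposreg_complex.npy")          # complex CSI cache
            self.labels = pd.read_csv(f"{seq_dir}/csiposreg.csv")["class"]  # assumed column name

        def __len__(self):
            return len(self.csi)

        def __getitem__(self, idx):
            # Use the CSI amplitude as the feature (one common choice).
            amp = torch.from_numpy(np.abs(self.csi[idx])).float()
            return amp, int(self.labels.iloc[idx])

    # Usage: dataset = Sequence3DO("3DO/d1/w1")
    ```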

    Dataset Structure:

    /3DO
    ├── d1/                           <-- day 1 subdirectory
    │   └── w1/                       <-- sequence subdirectory
    │       ├── csiposreg.csv         <-- raw WiFi packet time series
    │       └── csiposreg_complex.npy <-- CSI time series cache
    ├── d2/                           <-- day 2 subdirectory
    └── d3/                           <-- day 3 subdirectory

    In [1], we use the following training, validation, and test split:

    Subset   Day   Sequences
    Train    1     w1, w2, w3, s1, s2, s3, l1, l2, l3
    Val      1     w4, s4, l4
    Test     1     w5, s5, l5
    Test     2     w1, w2, w3, w4, w5, s1, s2, s3, s4, s5, l1, l2, l3, l4, l5
    Test     3     w1, w2, w4, w5, s1, s2, s3, s4, s5, l1, l2, l4

    w = walking, s = sitting, l = lying
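
    Expressed as code, the split can be written down directly; a convenience sketch, with sequence directories following the structure above:

    ```python
    # The train/val/test split from [1] as (day, sequence) pairs.
    SPLITS = {
        "train":   [("d1", s) for s in "w1 w2 w3 s1 s2 s3 l1 l2 l3".split()],
        "val":     [("d1", s) for s in "w4 s4 l4".split()],
        "test_d1": [("d1", s) for s in "w5 s5 l5".split()],
        "test_d2": [("d2", s) for s in "w1 w2 w3 w4 w5 s1 s2 s3 s4 s5 l1 l2 l3 l4 l5".split()],
        "test_d3": [("d3", s) for s in "w1 w2 w4 w5 s1 s2 s3 s4 s5 l1 l2 l4".split()],
    }

    # Sequence directories for the training split:
    train_dirs = [f"3DO/{day}/{seq}" for day, seq in SPLITS["train"]]
    ```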

    Note: On each day, we additionally recorded three ten-minute background sequences (b1, b2, b3), which are provided as well.

    Download and Use

    This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].

    [1] Strohmayer, J., Kampel, M. (2025). On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios. In: Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_13

    BibTeX citation:

    @inproceedings{strohmayerOn2025, author="Strohmayer, Julian and Kampel, Martin", title="On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios", booktitle="Pattern Recognition", year="2025", publisher="Springer Nature Switzerland", address="Cham", pages="194--211", isbn="978-3-031-78354-8" }
