Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. The proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47,360 unique neural network models, resulting in over 2,415,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), as well as preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
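A hedged loading sketch (the file names below are hypothetical, and unpickling the preprocessed zoos requires the dataset class from the code at www.modelzoos.cc to be importable):

```
import json
import torch

# Hypothetical file names -- check the repository listing for the actual ones.
zoo = torch.load("dataset_mnist_seed.pt")  # preprocessed zoo wrapped in a custom dataset class
with open("index_dict.json") as f:
    index_dict = json.load(f)              # describes how to read the vectorized models
```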
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
✅ Step 1: Mount to Dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths
Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory
To make them importable, copy the .py files to your notebook's working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
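Alternatively (a minimal sketch, not part of the original instructions), you can skip the copying entirely by putting the input directory on Python's import path:

```
import sys

# Make the .py files importable in place, without copying them.
sys.path.insert(0, '/kaggle/input/pytorch-models')
```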
✅ Step 4: Import your modules
Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models
You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
We have created a 102-category dataset consisting of 102 flower categories. The flowers chosen are commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images. The details of the categories and the number of images for each class can be found on the category statistics page.
The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.
> dataset
> train
> valid
> test
- cat_to_name.json
- README.md
- sample_submission.csv
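Given this layout, a minimal loading sketch (assuming ImageFolder-style class subdirectories inside each split, which is not confirmed here) could look like:

```
import json
from torchvision import datasets, transforms

tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder('dataset/train', transform=tfm)

with open('dataset/cat_to_name.json') as f:
    cat_to_name = json.load(f)  # maps category ids to flower names
```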
We visualize the categories in the dataset using SIFT features as shape descriptors and HSV values as colour descriptors. The images are randomly sampled from each category.
(Visualization image: https://i.imgur.com/Tl6TKUC.png)
Nilsback, M-E. and Zisserman, A., "Automated flower classification over a large number of classes," Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing (2008).
BigEarthNet - HDF5 version
This repository contains an export of the existing BigEarthNet dataset in HDF5 format. All Sentinel-2 acquisitions are exported according to TorchGeo's dataset (120x120 pixel resolution). Sentinel-1 is not contained in this repository for the moment. For each satellite acquisition, the CSV files give the corresponding HDF5 file and the index within it. A PyTorch dataset class which can be used to iterate over this dataset can be found here, as well as the script used… See the full description on the dataset page: https://huggingface.co/datasets/lc-col/bigearthnet.
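A rough reading sketch (CSV column names and HDF5 layout are assumptions; prefer the provided dataset class):

```
import h5py
import pandas as pd

meta = pd.read_csv('train.csv')          # hypothetical split file: one row per acquisition
row = meta.iloc[0]
with h5py.File(row['file'], 'r') as f:   # 'file' and 'index' are assumed column names
    patch = f['images'][row['index']]    # one 120x120 patch; dataset key 'images' is likewise assumed
```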
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Files are provided in .pkl or .npz format. Some example classes include:
- Animals: beaver, dolphin, otter, elephant, snake.
- Plants: apple, orange, mushroom, palm tree, pine tree.
- Vehicles: bicycle, bus, motorcycle, train, rocket.
- Everyday Objects: clock, keyboard, lamp, table, chair.
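A minimal loading sketch for either format (the file names are hypothetical; the page does not list them):

```
import pickle
import numpy as np

with open('data.pkl', 'rb') as f:  # pickle variant
    data = pickle.load(f)

arrays = np.load('data.npz')       # npz variant
print(arrays.files)                # list the stored array names
```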
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BIRD is an open dataset that consists of 100,000 multichannel room impulse responses generated using the image method. This makes it the largest multichannel open dataset currently available. We provide some Python code that shows how to download and use this dataset to perform online data augmentation. The code is compatible with the PyTorch dataset class, which eases integration into existing deep learning projects based on this framework.
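A rough sketch of the kind of online augmentation this enables (not the dataset's own code; file names and shapes are hypothetical):

```
import numpy as np
from scipy.signal import fftconvolve

rir = np.load('bird_rir.npy')      # one multichannel RIR, shape (channels, taps)
speech = np.load('speech.npy')     # a dry single-channel signal
# Convolve the dry signal with each channel's impulse response.
reverberant = np.stack([fftconvolve(speech, ch)[:len(speech)] for ch in rir])
```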
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A list of different projects selected to analyze class comments (available in the source code) of various languages such as Java, Python, and Pharo. The projects vary in terms of size, contributors, and domain.
Projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Pharo_projects/
images/
GToolkit.zip
Moose.zip
PetitParser.zip
Pillar.zip
PolyMath.zip
Roassal2.zip
Seaside.zip
vm/
70-x64/Pharo
Scripts/
ClassCommentExtraction.st
SampleSelectionScript.st
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
Projects/ contains the raw projects of each language that are used to analyze class comments.
- Java_projects/
- eclipse.zip - Eclipse project downloaded from GitHub. More detail about the project is available on GitHub Eclipse.
- guava.zip - Guava project downloaded from GitHub. More detail about the project is available on GitHub Guava.
- guice.zip - Guice project downloaded from GitHub. More detail about the project is available on GitHub Guice.
- hadoop.zip - Apache Hadoop project downloaded from GitHub. More detail about the project is available on GitHub Apache Hadoop.
- spark.zip - Apache Spark project downloaded from GitHub. More detail about the project is available on GitHub Apache Spark.
- vaadin.zip - Vaadin project downloaded from GitHub. More detail about the project is available on GitHub Vaadin.
Pharo_projects/
images/
- GToolkit.zip - GToolkit project imported into a Pharo image. The image can be run with the virtual machine given in the vm/ folder; the script to extract the comments is already provided in the image.
- Moose.zip - Moose project imported into a Pharo image (same setup as above).
- PetitParser.zip - Petit Parser project imported into a Pharo image (same setup as above).
- Pillar.zip - Pillar project imported into a Pharo image (same setup as above).
- PolyMath.zip - PolyMath project imported into a Pharo image (same setup as above).
- Roassal2.zip - Roassal2 project imported into a Pharo image (same setup as above).
- Seaside.zip - Seaside project imported into a Pharo image (same setup as above).
vm/
70-x64/Pharo - Pharo7 (version 7 of Pharo) virtual machine to instantiate the Pharo images given in the images/ folder. The user can run the vm on macOS and select any of the Pharo images.
Scripts/ - It contains the sample Smalltalk scripts to extract class comments from various projects.
ClassCommentExtraction.st - A Smalltalk script to show how class comments are extracted from various Pharo projects. This script is already provided in the respective project image.
SampleSelectionScript.st - A Smalltalk script showing how sample class comments of Pharo projects were selected. This script can be run in any of the Pharo images given in the images/ folder.
Python_projects/
- django.zip - Django project downloaded from GitHub. More detail about the project is available on GitHub Django.
- ipython.zip - IPython project downloaded from GitHub. More detail about the project is available on GitHub IPython.
- Mailpile.zip - Mailpile project downloaded from GitHub. More detail about the project is available on GitHub Mailpile.
- pandas.zip - pandas project downloaded from GitHub. More detail about the project is available on GitHub pandas.
- pipenv.zip - Pipenv project downloaded from GitHub. More detail about the project is available on GitHub Pipenv.
- pytorch.zip - PyTorch project downloaded from GitHub. More detail about the project is available on GitHub PyTorch.
- requests.zip - Requests project downloaded from GitHub. More detail about the project is available on GitHub Requests.
Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides image segmentation data for feral cats, designed for computer vision and machine learning tasks. It builds upon the original public domain dataset by Paul Cashman from Roboflow, with additional preprocessing and multiple data formats for easier consumption.
The dataset is organized into three standard splits:
- Train set
- Validation set
- Test set

Each split contains data in multiple formats:
1. Original JPG images
2. Segmentation mask JPG images
3. Parquet files containing flattened image and mask data
4. Pickle files containing serialized image and mask data
Directory and file layout:
- train/: Original training images
- valid/: Original validation images
- test/: Original test images
- train_mask/: Corresponding segmentation masks for training
- valid_mask/: Corresponding segmentation masks for validation
- test_mask/: Corresponding segmentation masks for testing
- train_dataset.parquet, valid_dataset.parquet, test_dataset.parquet: flattened image and mask data
- train_dataset.pkl, valid_dataset.pkl, test_dataset.pkl: serialized image and mask data
- train_dataset.csv, valid_dataset.csv, test_dataset.csv

In the flattened files, each row concatenates an image and its mask; split a row at split_at = image_size[0] * image_size[1] * image_channels, then reshape the image part to [-1, 224, 224, 3] and the mask part to [-1, 224, 224, 1].

All images were preprocessed with the following operations:
- Resized to 224×224 pixels using bilinear interpolation
- Segmentation masks were resized to match the images using nearest-neighbor interpolation
- Original RLE (Run-Length Encoding) segmentation data converted to binary masks
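A minimal sketch of that unpacking (row layout as described above; the bundled CatDataset below handles this for you):

```
import numpy as np
import pandas as pd

image_size, image_channels = [224, 224], 3
split_at = image_size[0] * image_size[1] * image_channels

flat = pd.read_parquet('train_dataset.parquet').to_numpy()
images = flat[:, :split_at].reshape([-1, 224, 224, 3])
masks = flat[:, split_at:].reshape([-1, 224, 224, 1])
```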
When used with the provided PyTorch dataset class, images are normalized with:
- Mean: [0.48235, 0.45882, 0.40784]
- Standard Deviation: [0.00392156862745098, 0.00392156862745098, 0.00392156862745098] (i.e., 1/255)
A custom CatDataset class is included for easy integration with PyTorch:
from cat_dataset import CatDataset
# Load from parquet format
dataset = CatDataset(
root="path/to/dataset",
split="train", # Options: "train", "valid", "test"
format="parquet", # Options: "parquet", "pkl"
image_size=[224, 224],
image_channels=3,
mask_channels=1
)
# Use with PyTorch DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
Loading time benchmarks from the original implementation:
- Parquet format: ~1.29 seconds per iteration
- Pickle format: ~0.71 seconds per iteration
The pickle format provides the fastest loading times and is recommended for most use cases.
If you use this dataset in your research or projects, please cite:
@misc{feral-cat-segmentation_dataset,
title = {feral-cat-segmentation Dataset},
type = {Open Source Dataset},
author = {Paul Cashman},
howpublished = {\url{https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation}},
url = {https://universe.roboflow.com/paul-cashman-mxgwb/feral-cat-segmentation},
journal = {Roboflow Universe},
publisher = {Roboflow},
year = {2025},
month = {mar},
note = {visited on 2025-03-19},
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two deep neural networks (PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing single-agent poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and then acetaminophen, in PyTorch (F1-score = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-score = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Subset of the SMDG-19 for Glaucoma dataset in PyTorch Format
SMDG-19: https://www.kaggle.com/datasets/deathtrooper/multichannel-glaucoma-benchmark-dataset
Contains train, val, and test sets of fundus images for glaucoma detection.
2 classes (0|1):
1: Glaucoma present; 0: Glaucoma not present
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🖼️ CIFAR10 (Extracted from PyTorch Vision)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
ℹ️ Dataset Details
📖 Dataset Description
The classes are completely mutually exclusive. There is no… See the full description on the dataset page: https://huggingface.co/datasets/p2pfl/CIFAR10.
Dataset: SimCATS_GaAs_v1_random_variations_v2
Simulated data from the geometric SimCATS model (GitHub Repository, Paper) for benchmarking of semiconductor quantum dot tuning algorithms. Generated using this Jupyter Notebook and used for the final evaluation in "Automated Charge Transition Detection in Quantum Dot Charge Stability Diagrams".
Key Facts
- Contains pink, white & random telegraph noise, transition blurring, and dot jumps
- Random variations of charge transitions, sensor, and distortions
- 1,000 randomly sampled configurations with 100 CSDs each (100,000 CSDs in total)
Usage
To load the data, e.g. for calculating metrics, please have a look at SimCATS-Datasets (GitHub Repository, ReadTheDocs). The dataset can be loaded as numpy arrays using the function load_dataset or as a PyTorch Dataset (for machine learning purposes) using the class SimcatsDataset.
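A hedged usage sketch (the import paths and signatures below are assumptions; only the names load_dataset and SimcatsDataset come from this page, so check the SimCATS-Datasets documentation):

```
# Import paths below are assumptions, not confirmed by this page.
from simcats_datasets.loading import load_dataset
from simcats_datasets.loading.pytorch import SimcatsDataset

data = load_dataset("SimCATS_GaAs_v1_random_variations_v2")       # as numpy arrays
dataset = SimcatsDataset("SimCATS_GaAs_v1_random_variations_v2")  # as a PyTorch Dataset
```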
Dataset Card for Dataset CrashCar
This is the dataset proposed in 'CrashCar101: Procedural Generation for Damage Assessment' [WACV24]
Project Page: https://crashcar.compute.dtu.dk Repository: https://github.com/JensPars/CrashCar_procedural_generation Paper: https://openaccess.thecvf.com/content/WACV2024/papers/Parslov_CrashCar101_Procedural_Generation_for_Damage_Assessment_WACV_2024_paper.pdf
Example dataset class in PyTorch (snippet truncated on the source page):
import os
import torch
from glob import glob
from… See the full description on the dataset page: https://huggingface.co/datasets/JensParslov/CrashCar.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for pre-training in "ReasonBERT: Pre-trained to Reason with Distant Supervision", EMNLP'21.
There are two files:
sentence_pairs_for_pretrain_no_tokenization.tar.gz -> Contain only sentences as evidence, Text-only
table_pairs_for_pretrain_no_tokenization.tar.gz -> At least one piece of evidence is a table, Hybrid
The data is chunked into multiple tar files for easy loading. We use WebDataset, a PyTorch Dataset (IterableDataset) implementation providing efficient sequential/streaming data access.
For pre-training code, or if you have any questions, please check our GitHub repo https://github.com/sunlab-osu/ReasonBERT
Below is a sample code snippet to load the data
import webdataset as wds
# path to the uncompressed files, should be a directory with a set of tar files
url = './sentence_multi_pairs_for_pretrain_no_tokenization/{000000...000763}.tar'
dataset = (
wds.Dataset(url)  # in newer webdataset releases this is wds.WebDataset
.shuffle(1000) # cache 1000 samples and shuffle
.decode()
.to_tuple("json")
.batched(20) # group every 20 examples into a batch
)
# Please see the documentation for WebDataset for more details about how to use it as dataloader for Pytorch
# You can also iterate through all examples and dump them with your preferred data format
Below we show how the data is organized with two examples.
Text-only
{'s1_text': 'Sils is a municipality in the comarca of Selva, in Catalonia, Spain.', # query sentence
's1_all_links': {
'Sils,_Girona': [[0, 4]],
'municipality': [[10, 22]],
'Comarques_of_Catalonia': [[30, 37]],
'Selva': [[41, 46]],
'Catalonia': [[51, 60]]
}, # list of entities and their mentions in the sentence (start, end location)
'pairs': [ # other sentences that share common entity pair with the query, group by shared entity pairs
{
'pair': ['Comarques_of_Catalonia', 'Selva'], # the common entity pair
's1_pair_locs': [[[30, 37]], [[41, 46]]], # mention of the entity pair in the query
's2s': [ # list of other sentences that contain the common entity pair, or evidence
{
'md5': '2777e32bddd6ec414f0bc7a0b7fea331',
'text': 'Selva is a coastal comarque (county) in Catalonia, Spain, located between the mountain range known as the Serralada Transversal or Puigsacalm and the Costa Brava (part of the Mediterranean coast). Unusually, it is divided between the provinces of Girona and Barcelona, with Fogars de la Selva being part of Barcelona province and all other municipalities falling inside Girona province. Also unusually, its capital, Santa Coloma de Farners, is no longer among its larger municipalities, with the coastal towns of Blanes and Lloret de Mar having far surpassed it in size.',
's_loc': [0, 27], # in addition to the sentence containing the common entity pair, we also keep its surrounding context. 's_loc' is the start/end location of the actual evidence sentence
'pair_locs': [ # mentions of the entity pair in the evidence
[[19, 27]], # mentions of entity 1
[[0, 5], [288, 293]] # mentions of entity 2
],
'all_links': {
'Selva': [[0, 5], [288, 293]],
'Comarques_of_Catalonia': [[19, 27]],
'Catalonia': [[40, 49]]
}
}
,...] # there are multiple evidence sentences
},
,...] # there are multiple entity pairs in the query
}
Hybrid
{'s1_text': 'The 2006 Major League Baseball All-Star Game was the 77th playing of the midseason exhibition baseball game between the all-stars of the American League (AL) and National League (NL), the two leagues comprising Major League Baseball.',
's1_all_links': {...}, # same as text-only
'sentence_pairs': [{'pair': ..., 's1_pair_locs': ..., 's2s': [...]}], # same as text-only
'table_pairs': [
'tid': 'Major_League_Baseball-1',
'text':[
['World Series Records', 'World Series Records', ...],
['Team', 'Number of Series won', ...],
['St. Louis Cardinals (NL)', '11', ...],
...] # table content, list of rows
'index':[
[[0, 0], [0, 1], ...],
[[1, 0], [1, 1], ...],
...] # index of each cell [row_id, col_id]. we keep only a table snippet, but the index here is from the original table.
'value_ranks':[
[0, 0, ...],
[0, 0, ...],
[0, 10, ...],
...] # if the cell contains a numeric value/date, this is its rank ordered from small to large, following TAPAS
'value_inv_ranks': [], # inverse rank
'all_links':{
'St._Louis_Cardinals': {
'2': [
[[2, 0], [0, 19]], # [[row_id, col_id], [start, end]]
] # list of mentions in the second row, the key is row_id
},
'CARDINAL:11': {'2': [[[2, 1], [0, 2]]], '8': [[[8, 3], [0, 2]]]},
}
'name': '', # table name, if exists
'pairs': {
'pair': ['American_League', 'National_League'],
's1_pair_locs': [[[137, 152]], [[162, 177]]], # mention in the query
'table_pair_locs': {
'17': [ # mention of entity pair in row 17
[
[[17, 0], [3, 18]],
[[17, 1], [3, 18]],
[[17, 2], [3, 18]],
[[17, 3], [3, 18]]
], # mention of the first entity
[
[[17, 0], [21, 36]],
[[17, 1], [21, 36]],
] # mention of the second entity
]
}
}
]
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# FiN-2 Large-Scale Real-World PLC-Dataset
## About
#### FiN-2 dataset in a nutshell:
FiN-2 is the first large-scale real-world dataset on data collected in a powerline communication infrastructure. Since the electricity grid is inherently a graph, our dataset could be interpreted as a graph dataset. Therefore, we use the word node to describe points (cable distribution cabinets) of measurement within the low-voltage electricity grid and the word edge to describe connections (cables) in between them. However, since these are PLC connections, an edge does not necessarily have to correspond to a real cable; more on this in our paper.
FiN-2 shows measurements that relate to the nodes (voltage, total harmonic distortion) as well as to the edges (signal-to-noise ratio spectrum, tonemap). In total, FiN-2 is distributed across three different sites with a total of 1,930,762,116 node measurements each for the individual features and 638,394,025 edge measurements each for all 917 PLC channels. All data was collected over a 25-month period from mid-2020 to the end of 2022.
We propose this dataset to foster research in the domain of grid automation and smart grids. Therefore, we provide different example use cases in asset management, grid state visualization, forecasting, predictive maintenance, and novelty detection. For more detailed information on this dataset, please see our [paper](https://arxiv.org/abs/2209.12693).
* * *
## Content
The FiN-2 dataset splits into two compressed `csv` files: *nodes.csv* and *edges.csv*.
All files are provided as a compressed ZIP file and are divided into four parts. The first part can be found in this repo, while the remaining parts can be found in the following:
- https://zenodo.org/record/8328105
- https://zenodo.org/record/8328108
- https://zenodo.org/record/8328111
### Node data
| id | ts | v1 | v2 | v3 | thd1 | thd2 | thd3 | phase_angle1 | phase_angle2 | phase_angle3 | temp |
|----|----|----|----|----|----|----|----|----|----|----|----|
|112|1605530460|236.5|236.4|236.0|2.9|2.5|2.4|120.0|119.8|120.0|35.3|
|112|1605530520|236.9|236.6|236.6|3.1|2.7|2.5|120.1|119.8|120.0|35.3|
|112|1605530580|236.2|236.4|236.0|3.1|2.7|2.5|120.0|120.0|119.9|35.5|
- id / ts: Unique identifier of the node that is measured and timestamp of the measurement
- v1/v2/v3: Voltage measurements of all three phases
- thd1/thd2/thd3: Total harmonic distortion of all three phases
- phase_angle1/2/3: Phase angle of all three phases
- temp: Temperature in-circuit of the sensor inside a cable distribution unit (in °C)
### Edge data
| src | dst | ts | snr0 | snr1 | snr2 | ... | snr916 |
|----|----|----|----|----|----|----|----|
|62|94|1605528900|70|72|45|...|-53|
|62|32|1605529800|16|24|13|...|-51|
|17|94|1605530700|37|25|24|...|-55|
- src & dst & ts: Unique identifier of the source and target nodes where the spectrum is measured and time of measurement
- snr0/snr1/.../snr916: 917 SNR measurements in tenths of a decibel (e.g. 50 --> 5dB).
### Metadata
Metadata that is provided along with the data covers:
- Number of cable joints
- Cable properties (length, type, number of sections)
- Relative position of the nodes (location, zero-centered gps)
- Adjacent PV or wallbox installations
- Year of installation w.r.t. the nodes and cables
Since the electricity grid is part of the critical infrastructure, it is not possible to provide exact GPS locations.
* * *
## Usage
Simple data access using pandas:
```
import pandas as pd
nodes_file = "nodes.csv.gz" # /path/to/nodes.csv.gz
edges_file = "edges.csv.gz" # /path/to/edges.csv.gz
# read the first 10 rows
data = pd.read_csv(nodes_file, nrows=10, compression='gzip')
# skip the first 5 data rows, then read the next 10 (data rows 6 to 15)
data = pd.read_csv(nodes_file, nrows=10, skiprows=[i for i in range(1,6)], compression='gzip')
# ... same for the edges
```
The compressed csv data format was used to make sharing as easy as possible; however, it comes with significant drawbacks for machine learning. Due to the inherent graph structure, a single snapshot of the whole graph consists of a set of node and edge measurements. But due to timeouts, noise, and other disturbances, nodes sometimes fail to collect data, so the number of measurements for a specific timestamp differs. This, plus the high sparsity of the graph, makes the csv format highly inefficient for ML training.
To utilize the data in an ML pipeline, we recommend other data formats like [datadings](https://datadings.readthedocs.io/en/latest/) or specialized database solutions like [VictoriaMetrics](https://victoriametrics.com/).
### Example use case (voltage forecasting)
Forecasting the voltage is one potential use case. The Jupyter notebook provided in the repository gives an overview of how the dataset can be loaded, preprocessed, and used for ML training. MinMax scaling is used as a simple preprocessing step, and a PyTorch dataset class handles the data. Furthermore, a vanilla autoencoder is used to process and forecast the voltage into the future.
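A minimal sketch of such a dataset class (a simplification in the spirit of the notebook, not its actual code):

```
import torch
from torch.utils.data import Dataset

class VoltageWindows(Dataset):
    """Sliding windows over one node's voltage series, MinMax-scaled to [0, 1]."""
    def __init__(self, series, window=60, horizon=10):
        series = torch.as_tensor(series, dtype=torch.float32)
        self.series = (series - series.min()) / (series.max() - series.min())
        self.window, self.horizon = window, horizon

    def __len__(self):
        return len(self.series) - self.window - self.horizon + 1

    def __getitem__(self, i):
        x = self.series[i : i + self.window]                              # input window
        y = self.series[i + self.window : i + self.window + self.horizon] # forecast target
        return x, y
```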
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".
Datasets are PyTorch files containing a dictionary with training, validation, and test sets. The train, validation, and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
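A hedged loading sketch (the file name and dictionary keys are assumptions; the custom classes from the linked repository must be importable for unpickling):

```
import torch

data = torch.load('dataset.pt')  # hypothetical file name
# Dictionary keys below are assumed, not confirmed by this page.
trainset, valset, testset = data['trainset'], data['valset'], data['testset']
```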
Datasets 41, 42, 43 and 44 are our dataset format wrapped around the zoos from Unterthiner et al., 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy).
Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible with existing large-scale weakly labelled datasets. The release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.
Citation
If you use the GISE-51 dataset and/or the released code, please cite our paper:
Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021
Since GISE-51 is based on FSD50K, if you use GISE-51 kindly also cite the FSD50K paper:
Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.
About GISE-51 and GISE-51-Mixtures
The following sections summarize key characteristics of the GISE-51 and the GISE-51-Mixtures datasets, including details left out from the paper.
GISE-51
- See meta/lbl_map.csv for the complete vocabulary.
- silence_thresholds.txt lists class bins and their corresponding volume thresholds. Files that were determined by sox to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
GISE-51-Mixtures
LICENSE
All audio clips (i.e., found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed as Creative Commons Attribution 4.0 International (CC BY 4.0) License.
GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.
Baselines
Several sound event recognition experiments were conducted, establishing baseline performance on several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.
Files
GISE-51 is available as a collection of several tar archives. All audio files are PCM 16-bit, 22050 Hz. The following lists the contents of these files in detail:
- isolated_events.tar.gz: The core GISE-51 isolated events dataset containing train, val and eval subfolders.
- meta.tar.gz: contains lbl_map.json
- noises.tar.gz: contains background noises used for GISE-51-Mixtures soundscape generation
- mixtures_jams.tar.gz: contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz, can be reused to generate the exact GISE-51-Mixtures soundscapes. (Optional; we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
- train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
- val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
- eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
- train_*.tar.gz: tar archives containing training mixtures with various numbers of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance vs. the number of training soundscapes. A helper script, prepare_mixtures_lmdb.sh, is provided in the code release to prepare data for the experiments in Section 4.1.
- pretrained-models.tar.gz: contains model checkpoints for all experiments conducted in the paper, including state_dicts for use with transfer learning experiments. More information on these checkpoints can be found in the code release README.
- license.tar.gz: contains dataset license info.
- silence_thresholds.txt: contains volume thresholds for various sound event bins used for silence filtering.

Contact
In case of queries and clarifications, feel free to contact Sarthak at s.yadav.2@research.gla.ac.uk. (Adding [GISE-51] to the subject of the email would be appreciated!)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# RP-comment-convention-adherence-Java-Python
Replication Package for the paper "Do Comments follow Commenting Conventions? A case study in Java and Python".
It uses the dataset provided by Rani et al.'s work [How to identify class comment types? A multi-language approach for class comment classification](https://github.com/poojaruhal/RP-class-comment-classification).
## Structure
```
RQ1/
RQ1_Java_Rules.xlsx
RQ1_Python_Rules.xlsx
RQ2/
RQ2_Java_Comments_Validated.xlsx
RQ2_Python_Comments_Validated.xlsx
Raw-projects/
Java_projects/
eclipse.zip
guava.zip
guice.zip
hadoop.zip
spark.zip
vaadin.zip
Python_projects/
django.zip
ipython.zip
Mailpile.zip
pandas.zip
pipenv.zip
pytorch.zip
requests.zip
Style-guides
```
## Contents of the Replication Package
---
- **RQ1/** - contains the data used to answer RQ1
- `RQ1_Java_Rules.xlsx` - contains comment-related rules extracted from various Java style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines.
Oracle and Google are the standard guidelines, and the remaining ones are project-specific.
- `RQ1_Python_Rules.xlsx` - contains comment-related rules extracted from various Python style guidelines. Various tabs in the sheet represent the rules extracted from standard or project-specific guidelines. PEP, Numpy, and Google are the standard guidelines, and the remaining ones are project-specific.
- **RQ2/** - contains the data used to answer RQ2
- `RQ2_Java_Comments_Validated.xlsx` - contains the Java comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent the various Java projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
- `RQ2_Python_Comments_Validated.xlsx` - contains the Python comment dataset used from the previous work and validated against the rules from their corresponding guidelines. Various tabs in the sheet represent the various Python projects used in the work. The rows in each tab show the sample class comments used to validate against the rules. The rules are shown in the columns.
- **Raw-projects/** contains the raw projects of each language that are used to analyze class comments.
- **Java_projects/**
- `eclipse.zip` - Eclipse project downloaded from GitHub. More detail about the project is on https://github.com/eclipse
- `guava.zip` - Guava project downloaded from GitHub. More detail about the project is on https://github.com/google/guava
- `guice.zip` - Guice project downloaded from GitHub. More detail about the project is on https://github.com/google/guice
- `hadoop.zip` - Apache Hadoop project downloaded from GitHub. More detail about the project is on https://github.com/apache/hadoop
- `spark.zip` - Apache Spark project downloaded from GitHub. More detail about the project is on https://github.com/apache/spark
- `vaadin.zip` - Vaadin project downloaded from GitHub. More detail about the project is on https://github.com/vaadin/framework
- **Python_projects/**
- `django.zip` - Django project downloaded from GitHub. More detail about the project is on https://github.com/django
- `ipython.zip` - IPython project downloaded from GitHub. More detail about the project is on https://github.com/ipython/ipython
- `Mailpile.zip` - Mailpile project downloaded from GitHub. More detail about the project is on https://github.com/mailpile/Mailpile
- `pandas.zip` - pandas project downloaded from GitHub. More detail about the project is on https://github.com/pandas-dev/pandas
- `pipenv.zip` - Pipenv project downloaded from GitHub. More detail about the project is on https://github.com/pypa/pipenv
- `pytorch.zip` - PyTorch project downloaded from GitHub. More detail about the project is on https://github.com/pytorch/pytorch
- `requests.zip` - Requests project downloaded from GitHub. More detail about the project is on https://github.com/psf/requests/
- **Style-guides/**- contains the style guidelines used for the selected projects.
---
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Changen2-S1-15k
Changen2-S1-15k (a building change dataset with 15k pairs and 2 change types), 0.3-1m spatial resolution, RGB bands
Dataset Sources
Repository: https://github.com/Z-Zheng/pytorch-change-models Paper: https://ieeexplore.ieee.org/document/10713915
Citation
BibTeX: @article{zheng_changen2, author={Zheng, Zhuo and Ermon, Stefano and Kim, Dongjun and Zhang, Liangpei and Zhong, Yanfei}, journal={IEEE Transactions on Pattern… See the full description on the dataset page: https://huggingface.co/datasets/EVER-Z/Changen2-S1-15k.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
On the Generalization of WiFi-based Person-centric Sensing in Through-Wall Scenarios
This repository contains the 3DO dataset proposed in [1].
PyTorch Dataloader
A minimal PyTorch dataloader for the 3DO dataset is provided at: https://github.com/StrohmayerJ/3DO
Dataset Description
The 3DO dataset comprises 42 five-minute recordings (~1.25M WiFi packets) of three human activities performed by a single person, captured in a WiFi through-wall sensing scenario over three consecutive days. Each WiFi packet is annotated with a 3D trajectory label and a class label for the activities: no person/background (0), walking (1), sitting (2), and lying (3). (Note: The labels returned in our dataloader example are walking (0), sitting (1), and lying (2), because background sequences are not used.)
The directories 3DO/d1/, 3DO/d2/, and 3DO/d3/ contain the sequences from days 1, 2, and 3, respectively. Furthermore, each sequence directory (e.g., 3DO/d1/w1/) contains a csiposreg.csv file storing the raw WiFi packet time series and a csiposreg_complex.npy cache file, which stores the complex Channel State Information (CSI) of the WiFi packet time series. (If missing, csiposreg_complex.npy is automatically generated by the provided dataloader.)
Dataset Structure:
/3DO
├── d1 <-- day 1 subdirectory
└── w1 <-- sequence subdirectory
└── csiposreg.csv <-- raw WiFi packet time series
└── csiposreg_complex.npy <-- CSI time series cache
├── d2 <-- day 2 subdirectory
├── d3 <-- day 3 subdirectory
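A minimal sketch for peeking at one sequence (the official dataloader linked above handles the CSI cache and class labels properly):

```
import numpy as np
import pandas as pd

seq = '3DO/d1/w1'
packets = pd.read_csv(f'{seq}/csiposreg.csv')    # raw WiFi packet time series
csi = np.load(f'{seq}/csiposreg_complex.npy')    # complex CSI cache (generated by the dataloader if missing)
print(packets.shape, csi.shape, csi.dtype)       # CSI entries are complex-valued
```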
In [1], we use the following training, validation, and test split:
| Subset | Day | Sequences |
|--------|-----|-----------|
| Train | 1 | w1, w2, w3, s1, s2, s3, l1, l2, l3 |
| Val | 1 | w4, s4, l4 |
| Test | 1 | w5, s5, l5 |
| Test | 2 | w1, w2, w3, w4, w5, s1, s2, s3, s4, s5, l1, l2, l3, l4, l5 |
| Test | 3 | w1, w2, w4, w5, s1, s2, s3, s4, s5, l1, l2, l4 |

w = walking, s = sitting, and l = lying
Note: On each day, we additionally recorded three ten-minute background sequences (b1, b2, b3), which are provided as well.
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, J., Kampel, M. (2025). On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios. In: Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15315. Springer, Cham. https://doi.org/10.1007/978-3-031-78354-8_13
BibTeX citation:
@inproceedings{strohmayerOn2025, author="Strohmayer, Julian and Kampel, Martin", title="On the Generalization of WiFi-Based Person-Centric Sensing in Through-Wall Scenarios", booktitle="Pattern Recognition", year="2025", publisher="Springer Nature Switzerland", address="Cham", pages="194--211", isbn="978-3-031-78354-8" }